* [PATCH v5 00/12] x86: Support Key Locker
@ 2022-01-12 21:12 ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Changes from v4 [1]:
* Drop CBC mode support (PATCH10). (Eric Biggers)
* Update the changelog (PATCH8). (Rafael Wysocki)

A couple of other things outside of these patches are still in progress:
* Support DM-crypt/cryptsetup for Key Locker usage (Andy Lutomirski)
  [2].
* Understand decryption under-performance (Eric Biggers and Milan Broz)
  [3][4].

The threat model and intended usage of this feature are described in the
previous cover letter [1]. This version is based on v5.16.

Thanks,
Chang

[1] V4: https://lore.kernel.org/lkml/20211214005212.20588-1-chang.seok.bae@intel.com/
[2] https://lore.kernel.org/lkml/75ec3ad1-6234-ae1f-1b83-482793e4fd23@kernel.org/
[3] https://lore.kernel.org/lkml/YbqRseO+TtuGQk5x@sol.localdomain/
[4] https://lore.kernel.org/lkml/120368dc-e337-9176-936c-4db2a8bf710e@gmail.com/

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker internal wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load an internal wrapping key at boot-time
  x86/PM/keylocker: Restore internal wrapping key on resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aes - Prepare for a new AES implementation
  crypto: x86/aes-kl - Support AES algorithm using Key Locker
    instructions
  crypto: x86/aes-kl - Support XTS mode

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/keylocker.rst               |  98 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Makefile                      |   5 +-
 arch/x86/crypto/aes-intel_asm.S               |  26 +
 arch/x86/crypto/aes-intel_glue.c              | 125 ++++
 arch/x86/crypto/aes-intel_glue.h              |  48 ++
 arch/x86/crypto/aeskl-intel_asm.S             | 633 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 216 ++++++
 arch/x86/crypto/aesni-intel_asm.S             |  58 +-
 arch/x86/crypto/aesni-intel_glue.c            | 239 ++-----
 arch/x86/crypto/aesni-intel_glue.h            |  17 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 199 ++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 crypto/Kconfig                                |  36 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 28 files changed, 1633 insertions(+), 216 deletions(-)
 create mode 100644 Documentation/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: df0cc57e057f18e44dac8e6c18aba47ab53202f9
--
2.17.1



* [PATCH v5 01/12] Documentation/x86: Document Key Locker
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae, linux-doc

Document an overview of the feature along with relevant considerations when
provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Add as a new patch.
---
 Documentation/x86/index.rst     |  1 +
 Documentation/x86/keylocker.rst | 98 +++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 Documentation/x86/keylocker.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index f498f1d36cd3..bbea47ea10f6 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -38,3 +38,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/x86/keylocker.rst b/Documentation/x86/keylocker.rst
new file mode 100644
index 000000000000..e65d936ef199
--- /dev/null
+++ b/Documentation/x86/keylocker.rst
@@ -0,0 +1,98 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration
+opportunities while maintaining a programming interface similar to AES-NI.
+It converts the AES key into an encoded form, called the 'key handle'. The
+key handle is a wrapped version of the clear-text key where the wrapping
+key has limited exposure. Once converted, all subsequent data encryption
+using new AES instructions (AES-KL) uses this key handle, reducing the
+exposure of private key material in memory.
+
+Internal Wrapping Key (IWKey)
+=============================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So a key handle encoded
+with the old wrapping key becomes unusable across a system shutdown or
+reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+IWKey Restore Failure
+---------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of an IWKey restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results for their pending requests. In the likely scenario
+that dm-crypt is hosting the root filesystem, the recovery is the same as
+for a storage controller that failed to resume from suspend: reboot. If the
+volume impacted by an IWKey restore failure is a data volume, then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new IWKey at initial boot, not at any later point such as
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel cannot prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private IWKey state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM IWKeys in memory, defeats the purpose of Key Locker. The
+backup / restore facility is also not performant enough to be suitable for
+guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature is accompanied by a new AES instruction set. This instruction
+set is analogous to AES-NI. A sequence of AES-NI instructions can be mapped
+to a single AES-KL instruction. For example, AESENC128KL performs the ten
+rounds of transformation that are equivalent to nine AESENC operations plus
+one AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails and sets RFLAGS.ZF. The
+  crypto library implementation checks the flag and returns an error
+  code. Note that the flag is also set when the internal wrapping key has
+  changed because of a missing backup.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message when given a 192-bit key and then
+  falls back to AES-NI. So, this 192-bit key-size limitation is only
+  documented, not enforced. It means the key will remain in clear text in
+  memory. This is to meet the Linux crypto-cipher expectation that each
+  implementation must support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead compared with the AES-NI instructions.
+
-- 
2.17.1
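
A minimal sketch of the 192-bit fallback policy documented above, for
illustration only (not code from this series): crypto_aes_ctx,
AES_KEYSIZE_192 and pr_warn_once() are existing kernel definitions, while
aeskl_setkey_hw() and aesni_setkey_path() are hypothetical helper names.

  #include <crypto/aes.h>
  #include <linux/printk.h>

  static int aeskl_setkey_sketch(struct crypto_aes_ctx *ctx,
                                 const u8 *in_key, unsigned int key_len)
  {
          if (key_len == AES_KEYSIZE_192) {
                  /*
                   * No AES-KL instruction handles 192-bit keys: warn once
                   * and fall back, leaving the key in clear text in memory.
                   */
                  pr_warn_once("aes-kl: 192-bit key, falling back to AES-NI\n");
                  return aesni_setkey_path(ctx, in_key, key_len); /* hypothetical */
          }

          return aeskl_setkey_hw(ctx, in_key, key_len);           /* hypothetical */
  }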



* [PATCH v5 02/12] x86/cpufeature: Enumerate Key Locker feature
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used to
transform a user key into a key handle.

It supports the Advanced Encryption Standard (AES) cipher algorithm with a
new SIMD instruction set, like its predecessor (AES-NI). So a new AES
implementation will follow in the kernel's crypto library.

Here, add the feature flag to enumerate the hardware capability, but do not
show it in /proc/cpuinfo as userspace usage is not supported.

Make the feature depend on XMM2 as it comes with AES SIMD instructions.

Add X86_FEATURE_KEYLOCKER to the disabled features mask. It will be
enabled under a new config option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d5b5f2ab87a0..e1964446bbe5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -361,6 +361,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..75e1e87640d4 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -44,6 +44,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -85,7 +91,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index bcba3c643e63..b958a95a0908 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -124,6 +124,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index c881bcafba7d..abe7e04b27d9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -78,6 +78,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.17.1
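
Once the flag is enumerated and not masked off by the disabled-features
mechanism above, other kernel code can gate its Key Locker paths on it. A
minimal sketch, not part of this patch:

  #include <asm/cpufeature.h>

  static bool keylocker_available_sketch(void)
  {
          /*
           * cpu_feature_enabled() compiles to false when
           * CONFIG_X86_KEYLOCKER is off, via DISABLE_KEYLOCKER in
           * DISABLED_MASK16 above; otherwise it checks the runtime
           * CPUID enumeration.
           */
          return cpu_feature_enabled(X86_FEATURE_KEYLOCKER);
  }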



* [PATCH v5 03/12] x86/insn: Add Key Locker instructions to the opcode map
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Add the following Key Locker instructions to the opcode map:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

Details of these instructions can be found in the Intel Software
Developer's Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..eb702fc6572e 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -794,11 +794,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -808,6 +809,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..eb702fc6572e 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -794,11 +794,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -808,6 +809,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.17.1
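
With the opcode map updated, the in-kernel instruction decoder recognizes
these encodings, which avoids decoder warnings on the AES-KL objects added
later in this series. A small sketch of exercising the existing
insn_decode() API on the LOADIWKEY byte sequence (illustration only):

  #include <asm/insn.h>

  static int decode_loadiwkey_sketch(void)
  {
          /* f3 0f 38 dc d1 == loadiwkey %xmm1,%xmm2 (see PATCH 04) */
          const u8 buf[] = { 0xf3, 0x0f, 0x38, 0xdc, 0xd1 };
          struct insn insn;
          int ret;

          ret = insn_decode(&insn, buf, sizeof(buf), INSN_MODE_64);
          if (ret < 0)
                  return ret;

          return insn.length;     /* expected: 5 */
  }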



* [PATCH v5 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Key Locker introduces a CPU-internal wrapping key to encode a user key into
a key handle. The key handle is then referenced instead of the plain-text
key.

The new instruction loads an internal wrapping key into the
software-inaccessible CPU state. It operates only in kernel mode.

Define struct iwkey to pass the key value.

The kernel will use this function to load a new key at boot time when the
feature is enabled.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
Williams)

Note, Dan wondered if:
  WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
---
 arch/x86/include/asm/keylocker.h     | 25 ++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..df84c83228a1
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary internal wrapping key storage.
+ * @integrity_key:	A 128-bit key to check that key handles have not
+ *			been tampered with.
+ * @encryption_key:	A 256-bit encryption key used in
+ *			wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 68c257a3de0d..3e50d8396ee1 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <asm/processor-flags.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -294,6 +295,37 @@ static inline int enqcmds(void __iomem *dst, const void *src)
 	return 0;
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key:	A struct iwkey pointer.
+ *
+ * Load @key into XMMs then do LOADIWKEY. After this, flush the XMM
+ * registers. The caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * LOADIWKEY %xmm1,%xmm2
+	 *
+	 * EAX and XMM0 are implicit operands. Load a key value
+	 * from XMM0-2 to a software-invisible CPU state. With zero
+	 * in EAX, CPU does not do hardware randomization and the key
+	 * backup is allowed.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.17.1
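
A usage sketch for the wrapper, for illustration only (this is not the
boot-time code added later in the series): generate random key material,
load it with the XMM registers made usable, then scrub the in-memory copy.
get_random_bytes(), kernel_fpu_begin()/kernel_fpu_end() and
memzero_explicit() are existing kernel APIs.

  #include <linux/random.h>
  #include <linux/string.h>
  #include <asm/fpu/api.h>
  #include <asm/special_insns.h>

  static void load_random_iwkey_sketch(void)
  {
          struct iwkey key;

          get_random_bytes(&key, sizeof(key));

          kernel_fpu_begin();             /* make the XMM registers usable */
          load_xmm_iwkey(&key);
          kernel_fpu_end();

          /* The clear-text key material must not linger in memory. */
          memzero_explicit(&key, sizeof(key));
  }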



* [PATCH v5 05/12] x86/msr-index: Add MSRs for Key Locker internal wrapping key
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

The CPU state that contains the internal wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the cache
(like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and restore
the wrapping key, and to check the key status.

The wrapping key is saved in a platform-scoped state of non-volatile media.
The backup itself and its path from the CPU are encrypted and integrity
protected.

But this storage's non-volatility is not architecturally guaranteed across
off states, such as S5 and G3.

The MSRs will be used to back up the key for the S3/4 sleep states. Then
the kernel writes one of them to restore the key to each CPU's state.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 01e2650b9585..7f11a3b3a75b 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -941,4 +941,10 @@
 #define MSR_VM_IGNNE                    0xc0010115
 #define MSR_VM_HSAVE_PA                 0xc0010117
 
+/* MSRs for managing an internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.17.1
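
A heavily hedged sketch of how these MSRs could be exercised, for
illustration only (not code from this series): the bit-0 "valid/success"
interpretation in the comments is an assumption here, and real code would
poll for the backup to complete before checking status.

  #include <linux/bits.h>
  #include <linux/printk.h>
  #include <asm/msr.h>

  static void iwkey_backup_sketch(void)
  {
          u64 status;

          /* Ask the platform to store a copy of this CPU's wrapping key. */
          wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

          rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, status);
          if (!(status & BIT(0)))         /* assumed: bit 0 == backup valid */
                  pr_err("keylocker: wrapping-key backup is not valid\n");
  }

  static void iwkey_restore_sketch(void)
  {
          u64 status;

          /* Copy the platform backup back into this CPU after resume. */
          wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);

          rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
          if (!(status & BIT(0)))         /* assumed: bit 0 == copy succeeded */
                  pr_err("keylocker: wrapping-key restore failed\n");
  }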



* [PATCH v5 06/12] x86/keylocker: Define Key Locker CPUID leaf
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Define the feature-specific CPUID leaf and bits. Both Key Locker enabling
code in the x86 core and AES Key Locker code in the crypto library will
reference them.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index df84c83228a1..e85dfb6c1524 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.17.1
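
A minimal sketch of probing leaf 0x19 with these defines, for illustration
only (not code in this patch; requiring both AESKLE and BACKUP is just an
example policy):

  #include <asm/keylocker.h>
  #include <asm/processor.h>

  static bool keylocker_cpuid_sketch(void)
  {
          u32 eax, ebx, ecx, edx;

          cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);

          /*
           * AESKLE: the AES-KL instructions are enabled.
           * BACKUP: the platform can back up the wrapping key.
           */
          return (ebx & KEYLOCKER_CPUID_EBX_AESKLE) &&
                 (ebx & KEYLOCKER_CPUID_EBX_BACKUP);
  }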



* [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

The Internal Wrapping Key (IWKey) is the entity that Key Locker uses to
encode a clear-text key into a key handle. This key is the pivot for
protecting user keys, so its value has to be randomized before it is
loaded into the software-invisible CPU state.

IWKey needs to be established before the first user. Given that the only
proposed Linux use case for Key Locker is dm-crypt, the feature could be
lazily enabled when the first dm-crypt user arrives, but there is no
precedent for late enabling of CPU features and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.

The kernel generates random bytes and loads them at boot time. These bytes
are flushed from memory immediately afterwards.

Setting the CR4.KL bit does not always enable the feature, so ensure the
dynamic CPU bit (CPUID.AESKLE) is set before loading the key.

Given that the Linux Key Locker support is only intended for bare metal
dm-crypt consumption, and that switching IWKey per VM is untenable,
explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.
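
In outline, the boot-time sequence this patch adds boils down to the
condensed sketch below (not the literal diff that follows; the function
name is a placeholder, load_xmm_iwkey() is the LOADIWKEY wrapper
introduced earlier in the series, and includes are abbreviated):

#include <linux/random.h>
#include <linux/string.h>
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/tlbflush.h>       /* cr4_set_bits() */

static struct iwkey kl_boot_key __initdata;

static void __init keylocker_boot_sketch(void)
{
        /* Bare metal only: an IWKey cannot sanely be switched per VM. */
        if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
                return;

        /* CR4.KL must be set before CPUID reports AESKLE. */
        cr4_set_bits(X86_CR4_KEYLOCKER);

        /* Randomize the wrapping key before it enters CPU state. */
        get_random_bytes(&kl_boot_key, sizeof(kl_boot_key));

        /* LOADIWKEY takes its input from XMM registers. */
        kernel_fpu_begin();
        load_xmm_iwkey(&kl_boot_key);
        kernel_fpu_end();

        /*
         * Scrub the clear-text copy. The real code defers this until all
         * CPUs have loaded the key and uses a poison pattern instead of 0.
         */
        memset(&kl_boot_key, 0, sizeof(kl_boot_key));
}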

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives, but as Dave notes there is no precedent
for late enabling of CPU features and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.
---
 arch/x86/include/asm/keylocker.h |  9 ++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/common.c     |  5 +-
 arch/x86/kernel/keylocker.c      | 79 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c        |  2 +
 5 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index e85dfb6c1524..820ac29c06d9 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <linux/bits.h>
 #include <asm/fpu/types.h>
 
@@ -28,5 +29,13 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+#define setup_keylocker(c) do { } while (0)
+#define destroy_keylocker_data() do { } while (0)
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..e15efa238497 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -144,6 +144,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0083464de5e3..23b4aa437c1e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -57,6 +57,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
 
@@ -1595,10 +1597,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..87d775a65716
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support internal wrapping key
+ * management.
+ */
+
+#include <linux/random.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+	struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
+	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+	kernel_fpu_begin();
+	load_xmm_iwkey(&kl_setup.key);
+	kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c:		A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+	/* This feature is not compatible with a hypervisor. */
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) ||
+	    cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		goto out;
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (c == &boot_cpu_data) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+		/*
+		 * Check the feature readiness via CPUID. Note that the
+		 * CPUID AESKLE bit is conditionally set only when CR4.KL
+		 * is set.
+		 */
+		if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+		    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+			pr_debug("x86/keylocker: Not fully supported.\n");
+			goto disable;
+		}
+
+		generate_keylocker_data();
+	}
+
+	load_keylocker();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return;
+
+disable:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+out:
+	/* Make sure the feature disabled for kexec-reboot. */
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 617012f4619f..00cfa64948f5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -82,6 +82,7 @@
 #include <asm/spec-ctrl.h>
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_ACPI_CPPC_LIB
 #include <acpi/cppc_acpi.h>
@@ -1489,6 +1490,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	nmi_selftest();
 	impress_friends();
 	mtrr_aps_init();
+	destroy_keylocker_data();
 }
 
 static int __initdata setup_possible_cpus = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v5 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae, linux-pm

When the system enters the ACPI S3 or S4 sleep states, the internal
wrapping key is discarded.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, as dm-crypt does not prompt for the key
on resume from suspend. Even for the prompt it does perform to unlock the
volume where the hibernation image is stored, it still expects to reuse the
key handles within the hibernation image once it is loaded. This motivates
meeting dm-crypt's expectation that the key handles in the suspend-image
remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across the S5 and G3 "off"
states, this is not architecturally guaranteed, nor is it relied upon by
dm-crypt, which prompts for the key each time the volume is started.

Unless CONFIG_SUSPEND=n, the entirety of Key Locker needs to be disabled
when the backup mechanism is not available, because dm-crypt otherwise
requires the backup to survive suspend.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES instructions are no longer used
after a key restore failure, without turning the feature off entirely.
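
The intended contract with the crypto side looks roughly like this (the
AES-KL glue itself arrives later in the series, so treat this as a sketch
of how valid_keylocker() is meant to be consumed; the function name below
is a placeholder):

#include <linux/errno.h>
#include <crypto/internal/skcipher.h>
#include <asm/keylocker.h>

static int aeskl_setkey_sketch(struct crypto_skcipher *tfm, const u8 *key,
                               unsigned int keylen)
{
        /*
         * If the wrapping key could not be restored, any key handle
         * created before suspend is useless; refuse new ones as well.
         */
        if (!valid_keylocker())
                return -ENODEV;

        /* ... wrap the key into a handle (ENCODEKEY128/256) ... */
        return 0;
}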

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 124 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 128 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..c1d27fb5a1c3 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 #define setup_keylocker(c) do { } while (0)
 #define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 87d775a65716..967c535974ab 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,11 +11,26 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set with IWKey load. When the key restore fails, it is
+ * reset. This restore state is exported to the crypto library, then AES-KL
+ * will not be used there. So, the feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
@@ -25,6 +40,8 @@ static void __init generate_keylocker_data(void)
 void __init destroy_keylocker_data(void)
 {
 	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +51,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -49,6 +87,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -62,10 +101,49 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The internal wrapping key in CPU state is volatile in
+		 * S3/4 states. So ensure the backup capability along with
+		 * S-states.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
-	}
+		load_keylocker();
 
-	load_keylocker();
+		/* Backup an internal wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+	} else {
+		int rc;
+
+		/*
+		 * Load the internal wrapping key directly when available
+		 * in memory, which is only possible at boot-time.
+		 *
+		 * NB: When system wakes up, this path also recovers the
+		 * internal wrapping key.
+		 */
+		if (!kl_setup.initialized) {
+			load_keylocker();
+		} else if (valid_kl) {
+			rc = copy_keylocker();
+			/*
+			 * The boot CPU was successful but the key copy
+			 * fails here. Then, the subsequent feature use
+			 * will have inconsistent keys and failures. So,
+			 * invalidate the feature via the flag.
+			 */
+			if (rc) {
+				valid_kl = false;
+				pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+			}
+		}
+	}
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return;
@@ -77,3 +155,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+	 * an invalid backup if bit 0 is set and a read (or write) error if
+	 * bit 2 is set.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature via
+	 * the flag to avoid any subsequent use. But keep the feature with
+	 * zero IWKeys instead of disabling it. The current users will see
+	 * key handle integrity failure but that's because of the internal
+	 * key change.
+	 */
+	pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 9f2b251e83c5..1a290f529c73 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -25,6 +25,7 @@
 #include <asm/cpu.h>
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -262,6 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	mtrr_bp_restore();
 	perf_restore_debug_store();
 	msr_restore_context(ctxt);
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v5 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae, linux-doc

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at boot.
The option is selected by the Key Locker cipher module CRYPTO_AES_KL (to be
added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.
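
Booting with the option appended to the kernel command line then disables
the feature even when the Kconfig option is selected, e.g. (illustrative
boot entry; only "nokeylocker" is specific to this patch):

    linux /boot/vmlinuz root=/dev/mapper/cryptroot ro nokeylocker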

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/Kconfig                                |  3 +++
 arch/x86/kernel/cpu/common.c                    | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 2fba82431efb..30ed18fbdea3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3405,6 +3405,8 @@
 
 	nohugevmalloc	[PPC] Disable kernel huge vmalloc mappings.
 
+	nokeylocker	[X86] Disable Key Locker hardware feature.
+
 	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2ef..191bd4a941eb 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1864,6 +1864,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 23b4aa437c1e..db1fc9ff0fe3 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -364,6 +364,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 /* These bits should not change their value after CPU init is finished. */
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP | X86_CR4_FSGSBASE;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+	/* Expect an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info("x86/keylocker: Disabled by kernel command line.\n");
+	return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v5 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI's. The internal ABI in the assembly code will have the
same prototypes as AES-NI, so the glue code can be largely the same as
AES-NI's.

Refactor the common C code to avoid code duplication. The AES-NI code uses
it with a function pointer argument to call back the AES-NI assembly code.
So will the AES-KL code.

Introduce wrappers for the data transformation functions that return an
error code, since AES-KL may report an error during data transformation.

Export those refactored functions and new wrappers that the AES-KL code
will reference.

Also move some constant values to a common assembly file.

No functional change intended.
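
To make the intended reuse concrete, the AES-KL glue added later in the
series is expected to plug its own helpers into these common routines
along these lines (a sketch only; the aeskl_* names are placeholders
standing in for the real callbacks):

#include <crypto/internal/skcipher.h>
#include "aes-intel_glue.h"

/* Placeholder AES-KL callbacks; the real ones come with the AES-KL patch. */
static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx,
                        const u8 *in_key, unsigned int key_len);
static int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
                             const u8 *in, unsigned int len, u8 *iv);
static int aeskl_enc(const void *ctx, u8 *out, const u8 *in);

static int xts_aeskl_setkey(struct crypto_skcipher *tfm, const u8 *key,
                            unsigned int keylen)
{
        /* Reuse the common XTS key split; only the expansion callback differs. */
        return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
}

static int xts_aeskl_encrypt(struct skcipher_request *req)
{
        /*
         * The callbacks return int because AES-KL can fail at runtime,
         * which is why the refactored helpers take int-returning
         * function pointers.
         */
        return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
}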

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   2 +-
 arch/x86/crypto/aes-intel_asm.S    |  26 ++++
 arch/x86/crypto/aes-intel_glue.c   | 125 +++++++++++++++
 arch/x86/crypto/aes-intel_glue.h   |  48 ++++++
 arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
 arch/x86/crypto/aesni-intel_glue.c | 239 +++++++++--------------------
 arch/x86/crypto/aesni-intel_glue.h |  17 ++
 7 files changed, 309 insertions(+), 206 deletions(-)
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..ef6c0b9f69c6 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -47,7 +47,7 @@ chacha-x86_64-y := chacha-avx2-x86_64.o chacha-ssse3-x86_64.o chacha_glue.o
 chacha-x86_64-$(CONFIG_AS_AVX512) += chacha-avx512vl-x86_64.o
 
 obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
-aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
+aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.c b/arch/x86/crypto/aes-intel_glue.c
new file mode 100644
index 000000000000..2fca747d78f2
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+#include "aes-intel_glue.h"
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx, key + keylen, keylen);
+}
+EXPORT_SYMBOL_GPL(xts_setkey_common);
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+	if (err)
+		return err;
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+EXPORT_SYMBOL_GPL(xts_crypt_common);
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..c8bd783fa787
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+
+#define AES_ALIGN			16
+#define AES_ALIGN_ATTR			__attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK			(~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA			((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE		(sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE		(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+	unsigned long addr = (unsigned long)raw_ctx;
+	unsigned long align = AES_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+
+	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+int cbc_crypt_common(struct skcipher_request *req,
+		     int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			       unsigned int len, u8 *iv));
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len));
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in));
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 4e3972570916..4a3447ab6145 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1822,10 +1823,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(_aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1934,12 +1935,12 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(_aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1958,7 +1959,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2126,9 +2127,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2148,7 +2149,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2691,21 +2692,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	ret
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
 #ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
@@ -2821,12 +2807,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2846,10 +2826,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2998,13 +2978,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3160,4 +3140,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e09f4672dd38..75cb23f6cb0e 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,24 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,10 +71,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +88,51 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return _aesni_set_key(ctx, in_key, key_len);
+}
+EXPORT_SYMBOL_GPL(aesni_set_key);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_enc(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_enc);
+
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_dec(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_dec);
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
 
 #ifdef CONFIG_X86_64
 
@@ -187,7 +215,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -197,7 +225,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -205,16 +233,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -229,7 +247,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = _aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
@@ -250,7 +268,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		_aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -263,7 +281,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		_aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -477,7 +495,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 
 #ifdef CONFIG_X86_64
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
-			      const u8 *in, unsigned int len, u8 *iv)
+				 const u8 *in, unsigned int len, u8 *iv)
 {
 	/*
 	 * based on key length, override with the by8 version
@@ -514,7 +532,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			_aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -606,8 +624,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -762,8 +780,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -790,8 +808,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -816,128 +834,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1066,8 +973,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1083,8 +990,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
@@ -1107,7 +1014,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct aesni_rfc4106_gcm_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= AES_ALIGN - 1,
 		.cra_module		= THIS_MODULE,
 	},
 }, {
@@ -1124,7 +1031,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct generic_gcmaes_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= AES_ALIGN - 1,
 		.cra_module		= THIS_MODULE,
 	},
 } };
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v5 10/12] crypto: x86/aes - Prepare for a new AES implementation
@ 2022-01-12 21:12   ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: ravi.v.shankar, chang.seok.bae, linux-kernel,
	kumar.n.dwarakanath, dan.j.williams, charishma1.gairuboyina

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI. The internal ABI in the assembly code will have the same
prototypes as AES-NI's, and the glue code will then be the same as AES-NI's.

Refactor the common C code to avoid code duplication. The AES-NI glue code
calls it with function pointer arguments that call back into the AES-NI
assembly code; the AES-KL code will do the same.

Introduce wrappers around the data-transformation functions so that they
return an error code, because AES-KL may report an error during data
transformation.
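
As an illustration, the resulting split looks roughly like this (a
simplified sketch of the pattern; the real code is xts_crypt_common() in
aes-intel_glue.c and the wrappers in aesni-intel_glue.c below):

  /* The wrapper gives the AES-NI asm routine an int-returning prototype. */
  int aesni_enc(const void *ctx, u8 *out, const u8 *in)
  {
          _aesni_enc(ctx, out, in);   /* AES-NI itself cannot fail here */
          return 0;                   /* AES-KL's equivalent may return nonzero */
  }

  /* The mode-level glue calls back into the implementation via pointers. */
  static int xts_encrypt(struct skcipher_request *req)
  {
          return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
  }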

Export those refactored functions and new wrappers that the AES-KL code
will reference.

Also move some constant values to a common assembly file.

No functional change intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   2 +-
 arch/x86/crypto/aes-intel_asm.S    |  26 ++++
 arch/x86/crypto/aes-intel_glue.c   | 125 +++++++++++++++
 arch/x86/crypto/aes-intel_glue.h   |  48 ++++++
 arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
 arch/x86/crypto/aesni-intel_glue.c | 239 +++++++++--------------------
 arch/x86/crypto/aesni-intel_glue.h |  17 ++
 7 files changed, 309 insertions(+), 206 deletions(-)
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..ef6c0b9f69c6 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -47,7 +47,7 @@ chacha-x86_64-y := chacha-avx2-x86_64.o chacha-ssse3-x86_64.o chacha_glue.o
 chacha-x86_64-$(CONFIG_AS_AVX512) += chacha-avx512vl-x86_64.o
 
 obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
-aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
+aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.c b/arch/x86/crypto/aes-intel_glue.c
new file mode 100644
index 000000000000..2fca747d78f2
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+#include "aes-intel_glue.h"
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx, key + keylen, keylen);
+}
+EXPORT_SYMBOL_GPL(xts_setkey_common);
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+EXPORT_SYMBOL_GPL(xts_crypt_common);
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..c8bd783fa787
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI code.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+
+#define AES_ALIGN			16
+#define AES_ALIGN_ATTR			__attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK			(~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA			((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE		(sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE		(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+	unsigned long addr = (unsigned long)raw_ctx;
+	unsigned long align = AES_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+
+	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+int cbc_crypt_common(struct skcipher_request *req,
+		     int (*fn)(struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			       unsigned int len, u8 *iv));
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len));
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in));
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 4e3972570916..4a3447ab6145 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1822,10 +1823,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(_aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1934,12 +1935,12 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(_aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1958,7 +1959,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2126,9 +2127,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2148,7 +2149,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	ret
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2691,21 +2692,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	ret
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
 #ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
@@ -2821,12 +2807,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2846,10 +2826,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2998,13 +2978,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3160,4 +3140,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e09f4672dd38..75cb23f6cb0e 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,24 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,10 +71,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +88,51 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return _aesni_set_key(ctx, in_key, key_len);
+}
+EXPORT_SYMBOL_GPL(aesni_set_key);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_enc(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_enc);
+
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_dec(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_dec);
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
 
 #ifdef CONFIG_X86_64
 
@@ -187,7 +215,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -197,7 +225,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -205,16 +233,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -229,7 +247,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = _aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
@@ -250,7 +268,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		_aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -263,7 +281,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		_aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -477,7 +495,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 
 #ifdef CONFIG_X86_64
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
-			      const u8 *in, unsigned int len, u8 *iv)
+				 const u8 *in, unsigned int len, u8 *iv)
 {
 	/*
 	 * based on key length, override with the by8 version
@@ -514,7 +532,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			_aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -606,8 +624,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -762,8 +780,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -790,8 +808,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -816,128 +834,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1066,8 +973,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1083,8 +990,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
@@ -1107,7 +1014,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct aesni_rfc4106_gcm_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= AES_ALIGN - 1,
 		.cra_module		= THIS_MODULE,
 	},
 }, {
@@ -1124,7 +1031,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct generic_gcmaes_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= AES_ALIGN - 1,
 		.cra_module		= THIS_MODULE,
 	},
 } };
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v5 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Key Locker is a CPU feature to reduce key exfiltration opportunities while
maintaining a programming interface similar to AES-NI. It converts the AES
key into an encoded form, called the 'key handle'.

The key handle is a wrapped version of the clear-text key where the
wrapping key has limited exposure. Once converted via setkey(), all
subsequent data encryption using new AES instructions ('AES-KL') uses this
key handle, reducing the exposure of private key material in memory.

The usage of AES-KL is analogous to that of AES-NI. Most of the assembly
code is translated from the AES-NI code, and like AES-NI it is operational
in both 32-bit and 64-bit modes. However, users need to be aware of the
following differences:

== Key Handle ==

AES-KL may fail when given an invalid key handle, for example one that has
been corrupted or that violates a handle restriction. A key handle may be
encoded with usage restrictions; this implementation restricts every handle
created via setkey() to kernel-mode use only.
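
For reference, the handle layout that setkey() leaves in the context is
sketched below (derived from aeskl_setkey() in this patch; the layout is
not spelled out as a struct anywhere in the code):

  /*
   * Context contents after ENCODEKEY128/ENCODEKEY256 in aeskl_setkey():
   *
   *   bytes   0..15    handle[0]   (STATE1)
   *   bytes  16..31    handle[1]   (STATE2)
   *   bytes  32..47    handle[2]   (STATE3)
   *   bytes  48..63    handle[3]   (STATE4, 256-bit keys only)
   *   offset 480       original key_len, so the glue code can tell the
   *                    128-bit and 256-bit handles apart
   */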

== AES Compliance ==

Key Locker is not AES compliant in that it lacks support for 192-bit keys.
However, per the expectations of Linux crypto-cipher implementations, the
software cipher implementation must support all of the AES-compliant key
sizes. The AES-KL cipher implementation satisfies this constraint by logging
a warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation on what can be converted into a Key Locker key handle is only
documented, not enforced. This, along with the performance and failure-mode
implications below, is an end-user consideration when selecting AES-KL vs
AES-NI.
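
Concretely, the fallback decision is made in setkey(); simplified from
aeskl_setkey_common() in this patch:

  kernel_fpu_begin();
  if (unlikely(key_len == AES_KEYSIZE_192)) {
          pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
          err = aesni_set_key(ctx, in_key, key_len);  /* documented fallback */
  } else {
          if (!valid_keylocker())
                  err = -ENODEV;                      /* wrapping key unavailable */
          else
                  err = aeskl_setkey(ctx, in_key, key_len);
  }
  kernel_fpu_end();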

== API Limitation ==

The setkey() function transforms an AES key into a handle, whereas an
expanded key is the usual outcome of setkey() in other AES cipher
implementations. Because the outcomes differ, a setkey() failure cannot
transparently fall back to the other implementation. So, AES-KL will be
exposed via synchronous interfaces only.

== Wrapping-key Restore Failure ==

A setkey() failure is also possible when the wrapping key cannot be
restored. In the event of a hardware failure, the wrapping key is lost
across deep sleep states. setkey() will then fail with the ENODEV error,
and a suspended data-transforming task may resume but will fail due to the
wrapping-key change.

== Performance ==

This feature comes with some performance penalties vs AES-NI. The
cryptsetup benchmark indicates Key Locker raw throughput can be ~5x slower
than AES-NI. For disk encryption, storage bandwidth may be the bottleneck
before encryption bandwidth, but the potential performance difference is
why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
the kernel transparently replacing AES-NI usage with AES-KL.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 184 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 115 ++++++++++++++++++
 crypto/Kconfig                     |  36 ++++++
 4 files changed, 338 insertions(+)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index ef6c0b9f69c6..3a64779af974 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..d56ec8dd6644
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using Intel AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-intel_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define BSWAP_MASK %xmm11
+#define CTR	%xmm12
+#define INC	%xmm13
+
+#ifdef __x86_64__
+#define IN1	%xmm8
+#define IN2	%xmm9
+#define IN3	%xmm10
+#define IN4	%xmm11
+#define IN5	%xmm12
+#define IN6	%xmm13
+#define IN7	%xmm14
+#define IN8	%xmm15
+#define IN	IN1
+#define TCTR_LOW %r11
+#else
+#define IN	%xmm1
+#endif
+
+#ifdef __x86_64__
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+#else
+#define AREG	%eax
+#define HANDLEP	%edi
+#define OUTP	AREG
+#define KLEN	%ebx
+#define INP	%edx
+#define T1	%ecx
+#define LEN	%esi
+#define IVP	%ebp
+#endif
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	push HANDLEP
+	movl (FRAME_OFFSET+8)(%esp),  HANDLEP	# ctx
+	movl (FRAME_OFFSET+12)(%esp), UKEYP	# in_key
+	movl (FRAME_OFFSET+16)(%esp), %edx	# key_len
+#endif
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+#ifndef __x86_64__
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int _aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_enc)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $1, AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(_aeskl_enc)
+
+/*
+ * int _aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_dec)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	mov 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Ldec_128
+	aesdec256kl (HANDLEP), STATE
+	jz .Ldec_err
+	jmp .Ldec_noerr
+.Ldec_128:
+	aesdec128kl (HANDLEP), STATE
+	jz .Ldec_err
+
+.Ldec_noerr:
+	xor AREG, AREG
+	jmp .Ldec_end
+.Ldec_err:
+	mov $1, AREG
+.Ldec_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(_aeskl_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..0062baaaf7b2
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+static int __maybe_unused aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx,
+					      const u8 *in_key, unsigned int key_len)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
+	    key_len != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(key_len == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, key_len);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = aeskl_setkey(ctx, in_key, key_len);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+static inline u32 keylength(const void *raw_ctx)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx((void *)raw_ctx);
+
+	return ctx->key_length;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_enc(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_dec(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int __init aeskl_init(void)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+	return;
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 285f82647d2b..b8b5d0353f58 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1113,6 +1113,42 @@ config CRYPTO_AES_NI_INTEL
 	  ECB, CBC, LRW, XTS. The 64 bit version has additional
 	  acceleration for CTR.
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
+config CRYPTO_AES_KL
+	tristate "AES cipher algorithms (AES-KL)"
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Key Locker provides AES SIMD instructions (AES-KL) for secure
+	  data encryption and decryption. While this new instruction
+	  set is analogous to AES-NI, AES-KL encodes an AES key into a
+	  'key handle' and uses that handle to transform data instead of
+	  accessing the raw AES key.
+
+	  setkey() transforms an AES key into a key handle; after that, the
+	  AES key is no longer needed for data transformation, so a user can
+	  keep the clear-text key out of potential exposure.
+
+	  This key encoding is done with the CPU-internal wrapping key.
+	  Support for that wrapping key is provided by X86_KEYLOCKER.
+
+	  AES-KL supports 128-bit and 256-bit keys only. Supplying a 192-bit
+	  key does not return an error, because the AES-NI driver is chosen
+	  to process it, but the security property claimed above is not
+	  available in that case.
+
+	  Bare metal disk encryption is the preferred use case. Key Locker
+	  usage requires an explicit opt-in at setup time, so it is safe to
+	  select this option if unsure.
+
+	  See Documentation/x86/keylocker.rst for more details.
+
 config CRYPTO_AES_SPARC64
 	tristate "AES cipher algorithms (SPARC64)"
 	depends on SPARC64
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v5 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
@ 2022-01-12 21:12   ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: ravi.v.shankar, chang.seok.bae, linux-kernel,
	kumar.n.dwarakanath, dan.j.williams, charishma1.gairuboyina

Key Locker is a CPU feature to reduce key exfiltration opportunities while
maintaining a programming interface similar to AES-NI. It converts the AES
key into an encoded form, called the 'key handle'.

The key handle is a wrapped version of the clear-text key where the
wrapping key has limited exposure. Once converted via setkey(), all
subsequent data encryption using new AES instructions ('AES-KL') uses this
key handle, reducing the exposure of private key material in memory.

The usage of AES-KL is analogous to that of AES-NI. Most of the assembly
code is translated from the AES-NI code, and like AES-NI it is operational
in both 32-bit and 64-bit modes. However, users need to be aware of the
following differences:

== Key Handle ==

AES-KL may fail when given an invalid key handle, for example one that has
been corrupted or that violates a handle restriction. A key handle may be
encoded with usage restrictions; this implementation restricts every handle
created via setkey() to kernel-mode use only.

== AES Compliance ==

Key Locker is not AES compliant in that it lacks support for 192-bit keys.
However, per the expectations of Linux crypto-cipher implementations, the
software cipher implementation must support all of the AES-compliant key
sizes. The AES-KL cipher implementation satisfies this constraint by logging
a warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation on what can be converted into a Key Locker key handle is only
documented, not enforced. This, along with the performance and failure-mode
implications below, is an end-user consideration when selecting AES-KL vs
AES-NI.

== API Limitation ==

The setkey() function transforms an AES key into a handle, whereas an
expanded key is the usual outcome of setkey() in other AES cipher
implementations. Because the outcomes differ, a setkey() failure cannot
transparently fall back to the other implementation. So, AES-KL will be
exposed via synchronous interfaces only.

== Wrapping-key Restore Failure ==

A setkey() failure is also possible when the wrapping key cannot be
restored. In the event of a hardware failure, the wrapping key is lost
across deep sleep states. setkey() will then fail with the ENODEV error,
and a suspended data-transforming task may resume but will fail due to the
wrapping-key change.

== Performance ==

This feature comes with some performance penalties vs AES-NI. The
cryptsetup benchmark indicates Key Locker raw throughput can be ~5x slower
than AES-NI. For disk encryption, storage bandwidth may be the bottleneck
before encryption bandwidth, but the potential performance difference is
why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
the kernel transparently replacing AES-NI usage with AES-KL.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 184 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 115 ++++++++++++++++++
 crypto/Kconfig                     |  36 ++++++
 4 files changed, 338 insertions(+)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index ef6c0b9f69c6..3a64779af974 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..d56ec8dd6644
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using Intel AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-intel_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define BSWAP_MASK %xmm11
+#define CTR	%xmm12
+#define INC	%xmm13
+
+#ifdef __x86_64__
+#define IN1	%xmm8
+#define IN2	%xmm9
+#define IN3	%xmm10
+#define IN4	%xmm11
+#define IN5	%xmm12
+#define IN6	%xmm13
+#define IN7	%xmm14
+#define IN8	%xmm15
+#define IN	IN1
+#define TCTR_LOW %r11
+#else
+#define IN	%xmm1
+#endif
+
+#ifdef __x86_64__
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+#else
+#define AREG	%eax
+#define HANDLEP	%edi
+#define OUTP	AREG
+#define KLEN	%ebx
+#define INP	%edx
+#define T1	%ecx
+#define LEN	%esi
+#define IVP	%ebp
+#endif
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	push HANDLEP
+	movl (FRAME_OFFSET+8)(%esp),  HANDLEP	# ctx
+	movl (FRAME_OFFSET+12)(%esp), UKEYP	# in_key
+	movl (FRAME_OFFSET+16)(%esp), %edx	# key_len
+#endif
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+#ifndef __x86_64__
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int _aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_enc)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $1, AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(_aeskl_enc)
+
+/*
+ * int _aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_dec)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	mov 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Ldec_128
+	aesdec256kl (HANDLEP), STATE
+	jz .Ldec_err
+	jmp .Ldec_noerr
+.Ldec_128:
+	aesdec128kl (HANDLEP), STATE
+	jz .Ldec_err
+
+.Ldec_noerr:
+	xor AREG, AREG
+	jmp .Ldec_end
+.Ldec_err:
+	mov $1, AREG
+.Ldec_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	ret
+SYM_FUNC_END(_aeskl_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..0062baaaf7b2
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+static int __maybe_unused aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx,
+					      const u8 *in_key, unsigned int key_len)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
+	    key_len != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(key_len == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, key_len);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = aeskl_setkey(ctx, in_key, key_len);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+static inline u32 keylength(const void *raw_ctx)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx((void *)raw_ctx);
+
+	return ctx->key_length;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_enc(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_dec(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int __init aeskl_init(void)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+	return;
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 285f82647d2b..b8b5d0353f58 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1113,6 +1113,42 @@ config CRYPTO_AES_NI_INTEL
 	  ECB, CBC, LRW, XTS. The 64 bit version has additional
 	  acceleration for CTR.
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
+config CRYPTO_AES_KL
+	tristate "AES cipher algorithms (AES-KL)"
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Key Locker provides AES SIMD instructions (AES-KL) for secure
+	  data encryption and decryption. While this new instruction
+	  set is analogous to AES-NI, AES-KL converts an AES key into
+	  an encoded form ('key handle') and uses that handle to
+	  transform data instead of accessing the raw AES key.
+
+	  setkey() transforms an AES key into a key handle; afterwards
+	  the AES key is no longer needed for data transformation, so a
+	  user may discard the key to limit its exposure.
+
+	  This key encryption is done with a CPU-internal wrapping key,
+	  and support for that wrapping key is provided by X86_KEYLOCKER.
+
+	  AES-KL supports 128-bit and 256-bit keys only. Supplying a
+	  192-bit key does not return an error, because the AES-NI code
+	  is chosen to process it, but the claimed security property is
+	  not available in that case.
+
+	  Bare metal disk encryption is the preferred use case. Key
+	  Locker usage requires an explicit opt-in at setup time, so it
+	  is safe to select this option if unsure.
+
+	  See Documentation/x86/keylocker.rst for more details.
+
 config CRYPTO_AES_SPARC64
 	tristate "AES cipher algorithms (SPARC64)"
 	depends on SPARC64
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v5 12/12] crypto: x86/aes-kl - Support XTS mode
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-12 21:12   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-12 21:12 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, chang.seok.bae

Implement XTS mode using AES-KL. Register the methods with a lower
priority than AES-NI so that they are not selected by default.

The assembly code clobbers more than eight 128-bit registers to make use
of the performant wide instructions that process eight 128-bit blocks at
once. That many 128-bit registers are not available in 32-bit mode, so
only 64-bit mode is supported.
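
For reference, a minimal C sketch (illustrative only; xts_gf128mul_x() is
a hypothetical name, not the kernel's gf128mul helper) of the GF(2^128)
tweak doubling that the new _aeskl_gf128mul_x_ble() assembly macro performs
between blocks:

	#include <linux/types.h>

	/*
	 * Double the 128-bit XTS tweak in GF(2^128): lo holds bytes 0-7 of
	 * the tweak, hi holds bytes 8-15 (the block is little-endian).
	 */
	static void xts_gf128mul_x(u64 *hi, u64 *lo)
	{
		u64 carry = *hi >> 63;			/* bit 127 about to shift out */

		*hi = (*hi << 1) | (*lo >> 63);		/* shift the whole 128-bit value left */
		*lo = (*lo << 1) ^ (carry * 0x87);	/* reduce by x^128 + x^7 + x^2 + x + 1 */
	}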

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/crypto/aeskl-intel_asm.S  | 449 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 107 ++++++-
 2 files changed, 553 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index d56ec8dd6644..cb4680461f25 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -182,3 +182,452 @@ SYM_FUNC_START(_aeskl_dec)
 	ret
 SYM_FUNC_END(_aeskl_dec)
 
+#ifdef __x86_64__
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	KEY:	== temporary value
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+/*
+ * int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			  const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $1, AREG
+.Lxts_enc_ret:
+	FRAME_END
+	ret
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(_aeskl_xts_encrypt)
+
+/*
+ * int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			  const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $1, AREG
+.Lxts_dec_ret:
+	FRAME_END
+	ret
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(_aeskl_xts_decrypt)
+
+#endif
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index 0062baaaf7b2..6616a2fc6f04 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -27,8 +27,15 @@ asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsign
 asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
 
-static int __maybe_unused aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx,
-					      const u8 *in_key, unsigned int key_len)
+#ifdef CONFIG_X86_64
+asmlinkage int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				  unsigned int len, u8 *iv);
+asmlinkage int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				  unsigned int len, u8 *iv);
+#endif
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			       unsigned int key_len)
 {
 	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
 	int err;
@@ -86,11 +93,99 @@ static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
 		return 0;
 }
 
+#ifdef CONFIG_X86_64
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_xts_encrypt(ctx, out, in, len, iv))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_xts_decrypt(ctx, out, in, len, iv))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+	if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+	if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+#endif /* CONFIG_X86_64 */
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+#ifdef CONFIG_X86_64
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= aeskl_xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+#endif
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
 static int __init aeskl_init(void)
 {
+	u32 eax, ebx, ecx, edx;
+	int err;
+
 	if (!valid_keylocker())
 		return -ENODEV;
 
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
 	/*
 	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
 	 * support 192-bit keys. To make itself AES-compliant, it falls
@@ -99,12 +194,18 @@ static int __init aeskl_init(void)
 	if (!boot_cpu_has(X86_FEATURE_AES))
 		return -ENODEV;
 
+	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					     aeskl_simd_skciphers);
+	if (err)
+		return err;
+
 	return 0;
 }
 
 static void __exit aeskl_exit(void)
 {
-	return;
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
 }
 
 late_initcall(aeskl_init);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 00/12] x86: Support Key Locker
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2022-01-13 22:16   ` Dave Hansen
  -1 siblings, 0 replies; 247+ messages in thread
From: Dave Hansen @ 2022-01-13 22:16 UTC (permalink / raw)
  To: Chang S. Bae, linux-crypto, dm-devel, herbert, ebiggers, ardb,
	x86, luto, tglx, bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar

On 1/12/22 1:12 PM, Chang S. Bae wrote:
> A couple of other things outside of these patches are still in progress:
> * Support DM-crypt/cryptsetup for Key Locker usage (Andy Lutomirski)
>   [2].
> * Understand decryption under-performance (Eric Biggers and Milan Broz)
>   [3][4].

I really like when contributors are clear about why they are posting
their series and what their expectations are.  This posting leaves me a
bit confused as to what you expect the maintainers to do.

Should the maintainers ignore this series until those in-progress things
are done?  Or, do you expect that this could be merged as-is before
those are resolved?

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 00/12] x86: Support Key Locker
  2022-01-13 22:16   ` [dm-devel] " Dave Hansen
@ 2022-01-13 22:34     ` Bae, Chang Seok
  -1 siblings, 0 replies; 247+ messages in thread
From: Bae, Chang Seok @ 2022-01-13 22:34 UTC (permalink / raw)
  To: Hansen, Dave
  Cc: Linux Crypto Mailing List, dm-devel, Herbert Xu, Eric Biggers,
	Ard Biesheuvel, the arch/x86 maintainers, Lutomirski, Andy, tglx,
	bp, dave.hansen, mingo, linux-kernel, Williams, Dan J,
	Gairuboyina, Charishma1, Dwarakanath, Kumar N, Shankar, Ravi V

On Jan 13, 2022, at 14:16, Hansen, Dave <dave.hansen@intel.com> wrote:
> On 1/12/22 1:12 PM, Chang S. Bae wrote:
>> A couple of other things outside of these patches are still in progress:
>> * Support DM-crypt/cryptsetup for Key Locker usage (Andy Lutomirski)
>>  [2].
>> * Understand decryption under-performance (Eric Biggers and Milan Broz)
>>  [3][4].
> 
> I really like when contributors are clear about why they are posting
> their series and what their expectations are.  This posting leaves me a
> bit confused as to what you expect the maintainers to do.
> 
> Should the maintainers ignore this series until those in-progress things
> are done?  Or, do you expect that this could be merged as-is before
> those are resolved?

Ah, right. Yeah, this is not super clear about that.

I think it makes sense to clarify those two points -- performance implication
and user interaction in the usage case, before considering this feature
support in the mainline.

But I wanted to address feedback on the patches with this posting. 

Hopefully, this clarifies the status of this series.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v5-fix 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
  2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
@ 2022-01-29 17:31     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-29 17:31 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, marvin.hsu, ravi.v.shankar, chang.seok.bae,
	linux-pm

When the system enters the ACPI S3 or S4 sleep states, the internal
wrapping key is discarded.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, because dm-crypt does not prompt for the
key on resume from suspend. Even though it does prompt to unlock the volume
where the hibernation image is stored, it still expects to reuse the key
handles within the hibernation image once it is loaded. This change is
therefore motivated by dm-crypt's expectation that the key handles in the
suspend image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.
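
A condensed sketch of that flow, using the MSR names this patch relies on
(the function names here are illustrative; this only summarizes what the
code below does, it is not an additional implementation):

	#include <linux/bits.h>
	#include <linux/errno.h>
	#include <asm/msr.h>

	/* Boot: right after LOADIWKEY, ask the platform to save a backup. */
	static void backup_iwkey(void)
	{
		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
	}

	/* Wakeup: copy the backup into this CPU's state and verify it took. */
	static int copy_backup_to_cpu(void)
	{
		u64 status;

		wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
		rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
		return (status & BIT(0)) ? 0 : -EBUSY;
	}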

While the backup may be maintained in NVM across the S5 and G3 "off"
states, this is not architecturally guaranteed, nor is it relied upon by
dm-crypt, which expects to prompt for the key each time the volume is
started.

Key Locker needs to be disabled entirely if the backup mechanism is not
available, unless CONFIG_SUSPEND=n; otherwise dm-crypt requires the backup
to be available.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES-KL instructions are no longer
used after a key restore failure, without turning the feature off entirely.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v5:
* Fix the valid_kl flag setting when the feature is disabled.
  (Reported by Marvin Hsu <marvin.hsu@intel.com>)

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 128 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..c1d27fb5a1c3 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 #define setup_keylocker(c) do { } while (0)
 #define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 87d775a65716..981c1f5517e7 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,11 +11,26 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set when the IWKey is loaded, and cleared when a key
+ * restore fails. The state is exported to the crypto code so that AES-KL
+ * is no longer used after a failure, soft-disabling the feature.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
@@ -24,7 +39,13 @@ static void __init generate_keylocker_data(void)
 
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) ||
+	    cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return;
+
 	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +55,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -49,6 +91,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -62,10 +105,49 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The internal wrapping key in the CPU state is volatile
+		 * across S3/4 states, so require the backup capability
+		 * whenever those S-states are possible.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
-	}
+		load_keylocker();
+
+		/* Backup an internal wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+	} else {
+		int rc;
 
-	load_keylocker();
+		/*
+		 * Load the internal wrapping key directly when available
+		 * in memory, which is only possible at boot-time.
+		 *
+		 * NB: When the system wakes up, this path also recovers the
+		 * internal wrapping key.
+		 */
+		if (!kl_setup.initialized) {
+			load_keylocker();
+		} else if (valid_kl) {
+			rc = copy_keylocker();
+			/*
+			 * The boot CPU succeeded but the key copy has
+			 * failed here, so subsequent use of the feature
+			 * would see inconsistent keys and failures.
+			 * Invalidate the feature via the flag.
+			 */
+			if (rc) {
+				valid_kl = false;
+				pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+			}
+		}
+	}
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return;
@@ -77,3 +159,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+	 * an invalid backup if bit 0 is set and a read (or write) error if
+	 * bit 2 is set.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature via
+	 * the flag to avoid any subsequent use. But keep the feature with
+	 * zero IWKeys instead of disabling it. The current users will see
+	 * key handle integrity failure but that's because of the internal
+	 * key change.
+	 */
+	pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 9f2b251e83c5..1a290f529c73 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -25,6 +25,7 @@
 #include <asm/cpu.h>
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -262,6 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	mtrr_bp_restore();
 	perf_restore_debug_store();
 	msr_restore_context(ctxt);
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v5-fix 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
@ 2022-01-29 17:31     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-01-29 17:31 UTC (permalink / raw)
  To: linux-crypto, dm-devel, herbert, ebiggers, ardb, x86, luto, tglx,
	bp, dave.hansen, mingo
  Cc: ravi.v.shankar, marvin.hsu, linux-pm, chang.seok.bae,
	linux-kernel, kumar.n.dwarakanath, dan.j.williams,
	charishma1.gairuboyina

When the system enters the ACPI S3 or S4 sleep states, the internal
wrapping key is discarded.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, because dm-crypt does not prompt for the
key on resume from suspend. Even though it does prompt to unlock the volume
where the hibernation image is stored, it still expects to reuse the key
handles within the hibernation image once it is loaded. This change is
therefore motivated by dm-crypt's expectation that the key handles in the
suspend image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across the S5 and G3 "off"
states, this is not architecturally guaranteed, nor is it relied upon by
dm-crypt, which expects to prompt for the key each time the volume is
started.

Key Locker needs to be disabled entirely if the backup mechanism is not
available, unless CONFIG_SUSPEND=n; otherwise dm-crypt requires the backup
to be available.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES-KL instructions are no longer
used after a key restore failure, without turning the feature off entirely.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v5:
* Fix the valid_kl flag setting when the feature is disabled.
  (Reported by Marvin Hsu <marvin.hsu@intel.com>)

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 128 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..c1d27fb5a1c3 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 #define setup_keylocker(c) do { } while (0)
 #define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 87d775a65716..981c1f5517e7 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,11 +11,26 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set when the IWKey is loaded and cleared when the key
+ * restore fails. The state is exported to the crypto library so that
+ * AES-KL is no longer used there; the feature is thus soft-disabled.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
@@ -24,7 +39,13 @@ static void __init generate_keylocker_data(void)
 
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) ||
+	    cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+		return;
+
 	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +55,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -49,6 +91,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -62,10 +105,49 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The internal wrapping key in CPU state is volatile across
+		 * S3/4 states. So ensure that the backup capability is
+		 * available whenever S-states are possible.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
-	}
+		load_keylocker();
+
+		/* Backup an internal wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+	} else {
+		int rc;
 
-	load_keylocker();
+		/*
+		 * Load the internal wrapping key directly when available
+		 * in memory, which is only possible at boot-time.
+		 *
+		 * NB: When the system wakes up, this path also recovers the
+		 * internal wrapping key.
+		 */
+		if (!kl_setup.initialized) {
+			load_keylocker();
+		} else if (valid_kl) {
+			rc = copy_keylocker();
+			/*
+			 * The boot CPU succeeded, but the key copy failed
+			 * here. Subsequent use of the feature would then
+			 * see inconsistent keys and failures. So,
+			 * invalidate the feature via the flag.
+			 */
+			if (rc) {
+				valid_kl = false;
+				pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+			}
+		}
+	}
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return;
@@ -77,3 +159,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+	 * an invalid backup if bit 0 is set and a read (or write) error if
+	 * bit 2 is set.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature via
+	 * the flag to avoid any subsequent use. But keep the feature with
+	 * zero IWKeys instead of disabling it. The current users will see
+	 * key handle integrity failure but that's because of the internal
+	 * key change.
+	 */
+	pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 9f2b251e83c5..1a290f529c73 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -25,6 +25,7 @@
 #include <asm/cpu.h>
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -262,6 +263,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	mtrr_bp_restore();
 	perf_restore_debug_store();
 	msr_restore_context(ctxt);
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
@ 2022-08-23 15:49     ` Evan Green
  -1 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-23 15:49 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> The Internal Wrapping Key (IWKey) is an entity of Key Locker to encode a
> clear text key into a key handle. This key is a pivot in protecting user
> keys. So the value has to be randomized before being loaded in the
> software-invisible CPU state.
>
> IWKey needs to be established before the first user. Given that the only
> proposed Linux use case for Key Locker is dm-crypt, the feature could be
> lazily enabled when the first dm-crypt user arrives, but there is no
> precedent for late enabling of CPU features and it adds maintenance burden
> without demonstrative benefit outside of minimizing the visibility of
> Key Locker to userspace.
>
> The kernel generates random bytes and load them at boot time. These bytes
> are flushed out immediately.
>
> Setting the CR4.KL bit does not always enable the feature so ensure the
> dynamic CPU bit (CPUID.AESKLE) is set before loading the key.
>
> Given that the Linux Key Locker support is only intended for bare metal
> dm-crypt consumption, and that switching IWKey per VM is untenable,
> explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from RFC v2:
> * Make bare metal only.
> * Clean up the code (e.g. dynamically allocate the key cache).
>   (Dan Williams)
> * Massage the changelog.
> * Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.
>
> Note, Dan wonders that given that the only proposed Linux use case for
> Key Locker is dm-crypt, the feature could be lazily enabled when the
> first dm-crypt user arrives, but as Dave notes there is no precedent
> for late enabling of CPU features and it adds maintenance burden
> without demonstrative benefit outside of minimizing the visibility of
> Key Locker to userspace.
> ---
>  arch/x86/include/asm/keylocker.h |  9 ++++
>  arch/x86/kernel/Makefile         |  1 +
>  arch/x86/kernel/cpu/common.c     |  5 +-
>  arch/x86/kernel/keylocker.c      | 79 ++++++++++++++++++++++++++++++++
>  arch/x86/kernel/smpboot.c        |  2 +
>  5 files changed, 95 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/keylocker.c
>
> diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
> index e85dfb6c1524..820ac29c06d9 100644
> --- a/arch/x86/include/asm/keylocker.h
> +++ b/arch/x86/include/asm/keylocker.h
> @@ -5,6 +5,7 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include <asm/processor.h>
>  #include <linux/bits.h>
>  #include <asm/fpu/types.h>
>
> @@ -28,5 +29,13 @@ struct iwkey {
>  #define KEYLOCKER_CPUID_EBX_WIDE       BIT(2)
>  #define KEYLOCKER_CPUID_EBX_BACKUP     BIT(4)
>
> +#ifdef CONFIG_X86_KEYLOCKER
> +void setup_keylocker(struct cpuinfo_x86 *c);
> +void destroy_keylocker_data(void);
> +#else
> +#define setup_keylocker(c) do { } while (0)
> +#define destroy_keylocker_data() do { } while (0)
> +#endif
> +
>  #endif /*__ASSEMBLY__ */
>  #endif /* _ASM_KEYLOCKER_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 2ff3e600f426..e15efa238497 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -144,6 +144,7 @@ obj-$(CONFIG_PERF_EVENTS)           += perf_regs.o
>  obj-$(CONFIG_TRACING)                  += tracepoint.o
>  obj-$(CONFIG_SCHED_MC_PRIO)            += itmt.o
>  obj-$(CONFIG_X86_UMIP)                 += umip.o
> +obj-$(CONFIG_X86_KEYLOCKER)            += keylocker.o
>
>  obj-$(CONFIG_UNWINDER_ORC)             += unwind_orc.o
>  obj-$(CONFIG_UNWINDER_FRAME_POINTER)   += unwind_frame.o
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 0083464de5e3..23b4aa437c1e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -57,6 +57,8 @@
>  #include <asm/microcode_intel.h>
>  #include <asm/intel-family.h>
>  #include <asm/cpu_device_id.h>
> +#include <asm/keylocker.h>
> +
>  #include <asm/uv/uv.h>
>  #include <asm/sigframe.h>
>
> @@ -1595,10 +1597,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>         /* Disable the PN if appropriate */
>         squash_the_stupid_serial_number(c);
>
> -       /* Set up SMEP/SMAP/UMIP */
> +       /* Setup various Intel-specific CPU security features */
>         setup_smep(c);
>         setup_smap(c);
>         setup_umip(c);
> +       setup_keylocker(c);
>
>         /* Enable FSGSBASE instructions if available. */
>         if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> new file mode 100644
> index 000000000000..87d775a65716
> --- /dev/null
> +++ b/arch/x86/kernel/keylocker.c
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Setup Key Locker feature and support internal wrapping key
> + * management.
> + */
> +
> +#include <linux/random.h>
> +#include <linux/poison.h>
> +
> +#include <asm/fpu/api.h>
> +#include <asm/keylocker.h>
> +#include <asm/tlbflush.h>
> +
> +static __initdata struct keylocker_setup_data {
> +       struct iwkey key;
> +} kl_setup;
> +
> +static void __init generate_keylocker_data(void)
> +{
> +       get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
> +       get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
> +}
> +
> +void __init destroy_keylocker_data(void)
> +{
> +       memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
> +}
> +
> +static void __init load_keylocker(void)

I am late to this party by 6 months, but:
load_keylocker() cannot be __init, as it gets called during SMP core onlining.
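
For reference, a minimal sketch of the kind of change implied here; the
LOADIWKEY wrapper call is elided since its name is not visible in this hunk,
and this is an illustration rather than the series' actual code:

	#include <asm/fpu/api.h>

	/*
	 * Sketch of the implied fix: without __init the function stays in
	 * regular .text, so calling it from the CPU-onlining path after
	 * .init.text has been freed no longer jumps into freed,
	 * NX-protected memory.
	 */
	static void load_keylocker(void)
	{
		kernel_fpu_begin();
		/* LOADIWKEY wrapper call goes here (name omitted). */
		kernel_fpu_end();
	}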

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-23 15:49     ` Evan Green
  0 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-23 15:49 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> The Internal Wrapping Key (IWKey) is an entity of Key Locker to encode a
> clear text key into a key handle. This key is a pivot in protecting user
> keys. So the value has to be randomized before being loaded in the
> software-invisible CPU state.
>
> IWKey needs to be established before the first user. Given that the only
> proposed Linux use case for Key Locker is dm-crypt, the feature could be
> lazily enabled when the first dm-crypt user arrives, but there is no
> precedent for late enabling of CPU features and it adds maintenance burden
> without demonstrative benefit outside of minimizing the visibility of
> Key Locker to userspace.
>
> The kernel generates random bytes and load them at boot time. These bytes
> are flushed out immediately.
>
> Setting the CR4.KL bit does not always enable the feature so ensure the
> dynamic CPU bit (CPUID.AESKLE) is set before loading the key.
>
> Given that the Linux Key Locker support is only intended for bare metal
> dm-crypt consumption, and that switching IWKey per VM is untenable,
> explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from RFC v2:
> * Make bare metal only.
> * Clean up the code (e.g. dynamically allocate the key cache).
>   (Dan Williams)
> * Massage the changelog.
> * Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.
>
> Note, Dan wonders that given that the only proposed Linux use case for
> Key Locker is dm-crypt, the feature could be lazily enabled when the
> first dm-crypt user arrives, but as Dave notes there is no precedent
> for late enabling of CPU features and it adds maintenance burden
> without demonstrative benefit outside of minimizing the visibility of
> Key Locker to userspace.
> ---
>  arch/x86/include/asm/keylocker.h |  9 ++++
>  arch/x86/kernel/Makefile         |  1 +
>  arch/x86/kernel/cpu/common.c     |  5 +-
>  arch/x86/kernel/keylocker.c      | 79 ++++++++++++++++++++++++++++++++
>  arch/x86/kernel/smpboot.c        |  2 +
>  5 files changed, 95 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/keylocker.c
>
> diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
> index e85dfb6c1524..820ac29c06d9 100644
> --- a/arch/x86/include/asm/keylocker.h
> +++ b/arch/x86/include/asm/keylocker.h
> @@ -5,6 +5,7 @@
>
>  #ifndef __ASSEMBLY__
>
> +#include <asm/processor.h>
>  #include <linux/bits.h>
>  #include <asm/fpu/types.h>
>
> @@ -28,5 +29,13 @@ struct iwkey {
>  #define KEYLOCKER_CPUID_EBX_WIDE       BIT(2)
>  #define KEYLOCKER_CPUID_EBX_BACKUP     BIT(4)
>
> +#ifdef CONFIG_X86_KEYLOCKER
> +void setup_keylocker(struct cpuinfo_x86 *c);
> +void destroy_keylocker_data(void);
> +#else
> +#define setup_keylocker(c) do { } while (0)
> +#define destroy_keylocker_data() do { } while (0)
> +#endif
> +
>  #endif /*__ASSEMBLY__ */
>  #endif /* _ASM_KEYLOCKER_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 2ff3e600f426..e15efa238497 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -144,6 +144,7 @@ obj-$(CONFIG_PERF_EVENTS)           += perf_regs.o
>  obj-$(CONFIG_TRACING)                  += tracepoint.o
>  obj-$(CONFIG_SCHED_MC_PRIO)            += itmt.o
>  obj-$(CONFIG_X86_UMIP)                 += umip.o
> +obj-$(CONFIG_X86_KEYLOCKER)            += keylocker.o
>
>  obj-$(CONFIG_UNWINDER_ORC)             += unwind_orc.o
>  obj-$(CONFIG_UNWINDER_FRAME_POINTER)   += unwind_frame.o
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 0083464de5e3..23b4aa437c1e 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -57,6 +57,8 @@
>  #include <asm/microcode_intel.h>
>  #include <asm/intel-family.h>
>  #include <asm/cpu_device_id.h>
> +#include <asm/keylocker.h>
> +
>  #include <asm/uv/uv.h>
>  #include <asm/sigframe.h>
>
> @@ -1595,10 +1597,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>         /* Disable the PN if appropriate */
>         squash_the_stupid_serial_number(c);
>
> -       /* Set up SMEP/SMAP/UMIP */
> +       /* Setup various Intel-specific CPU security features */
>         setup_smep(c);
>         setup_smap(c);
>         setup_umip(c);
> +       setup_keylocker(c);
>
>         /* Enable FSGSBASE instructions if available. */
>         if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> new file mode 100644
> index 000000000000..87d775a65716
> --- /dev/null
> +++ b/arch/x86/kernel/keylocker.c
> @@ -0,0 +1,79 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +/*
> + * Setup Key Locker feature and support internal wrapping key
> + * management.
> + */
> +
> +#include <linux/random.h>
> +#include <linux/poison.h>
> +
> +#include <asm/fpu/api.h>
> +#include <asm/keylocker.h>
> +#include <asm/tlbflush.h>
> +
> +static __initdata struct keylocker_setup_data {
> +       struct iwkey key;
> +} kl_setup;
> +
> +static void __init generate_keylocker_data(void)
> +{
> +       get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
> +       get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
> +}
> +
> +void __init destroy_keylocker_data(void)
> +{
> +       memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
> +}
> +
> +static void __init load_keylocker(void)

I am late to this party by 6 months, but:
load_keylocker() cannot be __init, as it gets called during SMP core onlining.

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-23 15:49     ` [dm-devel] " Evan Green
@ 2022-08-24 22:20       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-24 22:20 UTC (permalink / raw)
  To: Evan Green
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On 8/23/2022 8:49 AM, Evan Green wrote:
> On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>
<snip>
>> +
>> +static void __init load_keylocker(void)
> 
> I am late to this party by 6 months, but:
> load_keylocker() cannot be __init, as it gets called during SMP core onlining.

Yeah, it looks like that's the case with this patch alone.

But the next patch [1] limits the call to boot time only:

	if (c == &boot_cpu_data) {
		...
		load_keylocker();
		...
	} else {
		...
		if (!kl_setup.initialized) {
			load_keylocker();
		} else if (valid_kl) {
			rc = copy_keylocker();
			...
		}
	}

kl_setup.initialized is set by native_smp_cpus_done() ->
destroy_keylocker_data() once all CPUs are booted. After that,
load_keylocker() is not called because the root key (aka IWKey) is no
longer available in memory.

Now this 'valid_kl' flag should always be on unless the root key backup
is corrupted, and copy_keylocker() loads the root key from the backup
held in the platform state.

So I think the onlining CPU won't call it.
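
If it helps, here is a minimal sketch of what such a clarification could
look like, reusing kl_setup.initialized from the patch above; this is an
illustration only, not code from the series, and the LOADIWKEY wrapper call
is elided:

	#include <linux/bug.h>
	#include <asm/fpu/api.h>

	/*
	 * Illustration only: make the boot-time-only assumption explicit
	 * inside the loader, so a late call is caught as a warning instead
	 * of relying solely on the caller's branch.
	 */
	static void __init load_keylocker(void)
	{
		/* The in-memory IWKey is gone once all CPUs have booted. */
		WARN_ON_ONCE(kl_setup.initialized);

		kernel_fpu_begin();
		/* LOADIWKEY wrapper call, as in the original function. */
		kernel_fpu_end();
	}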

Maybe this bit can be clarified further in a separate (new) patch,
instead of being folded into another one like [1].

Thanks,
Chang

[1]: 
https://lore.kernel.org/lkml/20220112211258.21115-9-chang.seok.bae@intel.com/

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-24 22:20       ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-24 22:20 UTC (permalink / raw)
  To: Evan Green
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On 8/23/2022 8:49 AM, Evan Green wrote:
> On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>
<snip>
>> +
>> +static void __init load_keylocker(void)
> 
> I am late to this party by 6 months, but:
> load_keylocker() cannot be __init, as it gets called during SMP core onlining.

Yeah, it looks like that's the case with this patch alone.

But the next patch [1] limits the call to boot time only:

	if (c == &boot_cpu_data) {
		...
		load_keylocker();
		...
	} else {
		...
		if (!kl_setup.initialized) {
			load_keylocker();
		} else if (valid_kl) {
			rc = copy_keylocker();
			...
		}
	}

kl_setup.initialized is set by native_smp_cpus_done() ->
destroy_keylocker_data() once all CPUs are booted. After that,
load_keylocker() is not called because the root key (aka IWKey) is no
longer available in memory.

Now this 'valid_kl' flag should always be on unless the root key backup
is corrupted, and copy_keylocker() loads the root key from the backup
held in the platform state.

So I think the onlining CPU won't call it.

Maybe this bit can be clarified further in a separate (new) patch,
instead of being folded into another one like [1].

Thanks,
Chang

[1]: 
https://lore.kernel.org/lkml/20220112211258.21115-9-chang.seok.bae@intel.com/

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-24 22:20       ` [dm-devel] " Chang S. Bae
@ 2022-08-24 22:52         ` Evan Green
  -1 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-24 22:52 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On Wed, Aug 24, 2022 at 3:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/23/2022 8:49 AM, Evan Green wrote:
> > On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >>
> <snip>
> >> +
> >> +static void __init load_keylocker(void)
> >
> > I am late to this party by 6 months, but:
> > load_keylocker() cannot be __init, as it gets called during SMP core onlining.
>
> Yeah, it looks like the case with this patch only.
>
> But the next patch [1] limits the call during boot time only:
>
>         if (c == &boot_cpu_data) {
>                 ...
>                 load_keylocker();
>                 ...
>         } else {
>                 ...
>                 if (!kl_setup.initialized) {
>                         load_keylocker();
>                 } else if (valid_kl) {
>                         rc = copy_keylocker();
>                         ...
>                 }
>         }
>
> kl_setup.initialized is set by native_smp_cpus_done() ->
> destroy_keylocker_data() when CPUs are booted. Then load_keylocker() is
> not called because the root key (aka IWKey) is no longer available in
> memory.
>
> Now this 'valid_kl' flag should be always on unless the root key backup
> is corrupted. Then copy_keylocker() loads the root key from the backup
> in the platform state.
>
> So I think the onlining CPU won't call it.
>
> Maybe this bit can be much clarified in a separate (new) patch, instead
> of being part of another like [1].

Whatever we ended up landing in the ChromeOS tree (which I think was
v4 of this series) actively hit this bug in hibernation, which is how
I found it. I couldn't get a full backtrace because the backtracing
code tripped over itself as well for some reason. If the next patch in
this series is different from what we landed in ChromeOS, then maybe
your description is correct, but I haven't dug in to understand the
delta.

-Evan

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-24 22:52         ` Evan Green
  0 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-24 22:52 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On Wed, Aug 24, 2022 at 3:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/23/2022 8:49 AM, Evan Green wrote:
> > On Wed, Jan 12, 2022 at 1:21 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >>
> <snip>
> >> +
> >> +static void __init load_keylocker(void)
> >
> > I am late to this party by 6 months, but:
> > load_keylocker() cannot be __init, as it gets called during SMP core onlining.
>
> Yeah, it looks like the case with this patch only.
>
> But the next patch [1] limits the call during boot time only:
>
>         if (c == &boot_cpu_data) {
>                 ...
>                 load_keylocker();
>                 ...
>         } else {
>                 ...
>                 if (!kl_setup.initialized) {
>                         load_keylocker();
>                 } else if (valid_kl) {
>                         rc = copy_keylocker();
>                         ...
>                 }
>         }
>
> kl_setup.initialized is set by native_smp_cpus_done() ->
> destroy_keylocker_data() when CPUs are booted. Then load_keylocker() is
> not called because the root key (aka IWKey) is no longer available in
> memory.
>
> Now this 'valid_kl' flag should be always on unless the root key backup
> is corrupted. Then copy_keylocker() loads the root key from the backup
> in the platform state.
>
> So I think the onlining CPU won't call it.
>
> Maybe this bit can be much clarified in a separate (new) patch, instead
> of being part of another like [1].

Whatever we ended up landing in the ChromeOS tree (which I think was
v4 of this series) actively hit this bug in hibernation, which is how
I found it. I couldn't get a full backtrace because the backtracing
code tripped over itself as well for some reason. If the next patch in
this series is different from what we landed in ChromeOS, then maybe
your description is correct, but I haven't dug in to understand the
delta.

-Evan

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-24 22:52         ` [dm-devel] " Evan Green
@ 2022-08-25  1:06           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-25  1:06 UTC (permalink / raw)
  To: Evan Green
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On 8/24/2022 3:52 PM, Evan Green wrote:
> 
> Whatever we ended up landing in the ChromeOS tree (which I think was
> v4 of this series) actively hit this bug in hibernation, which is how
> I found it. I couldn't get a full backtrace because the backtracing
> code tripped over itself as well for some reason. If the next patch in
> this series is different from what we landed in ChromeOS, then maybe
> your description is correct, but I haven't dug in to understand the
> delta.

So the change from v4 is simply dropping CBC mode. Marvin, who reported
another issue, told me that he pushed the fix to some Chrome repository.
But I don't know whether that's the same repo that you mentioned. Are you
able to locate that tree?

Also, it would be nice to have more detail about that hibernation bug.

Thanks,
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-25  1:06           ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-25  1:06 UTC (permalink / raw)
  To: Evan Green
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On 8/24/2022 3:52 PM, Evan Green wrote:
> 
> Whatever we ended up landing in the ChromeOS tree (which I think was
> v4 of this series) actively hit this bug in hibernation, which is how
> I found it. I couldn't get a full backtrace because the backtracing
> code tripped over itself as well for some reason. If the next patch in
> this series is different from what we landed in ChromeOS, then maybe
> your description is correct, but I haven't dug in to understand the
> delta.

So the change from v4 is simply dropping CBC mode. Marvin, who reported
another issue, told me that he pushed the fix to some Chrome repository.
But I don't know whether that's the same repo that you mentioned. Are you
able to locate that tree?

Also, it would be nice to have more detail about that hibernation bug.

Thanks,
Chang

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-25  1:06           ` [dm-devel] " Chang S. Bae
@ 2022-08-25 15:31             ` Evan Green
  -1 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-25 15:31 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On Wed, Aug 24, 2022 at 6:06 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/24/2022 3:52 PM, Evan Green wrote:
> >
> > Whatever we ended up landing in the ChromeOS tree (which I think was
> > v4 of this series) actively hit this bug in hibernation, which is how
> > I found it. I couldn't get a full backtrace because the backtracing
> > code tripped over itself as well for some reason. If the next patch in
> > this series is different from what we landed in ChromeOS, then maybe
> > your description is correct, but I haven't dug in to understand the
> > delta.
>
> So the change from v4 is simply dropping CBC mode. Marvin who reported
> another issue told me that he pushed the fix to some Chrome repository.
> But I don't know that's the same repo that you mentioned. Are you able
> to locate that tree if possible?

I see. The only ChromeOS tree I'm aware of where keylocker has landed
is our 5.10 tree. This is the change where it landed:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3373776/12

>
> Also, it would be nice to have more detail about that hibernation bug.

Here's the log I've got that pointed me down this path:
https://pastebin.com/VvR1EHvE

Relevant bit pasted below:

<6>[43486.263035] Enabling non-boot CPUs ...
<6>[43486.263081] x86: Booting SMP configuration:
<6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
<2>[43486.264010] kernel tried to execute NX-protected page - exploit
attempt? (uid: 0)
<1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
<1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
<1>[43486.264023] #PF: error_code(0x0011) - permissions violation
<6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
10006c063 PTE 8000000392148163
<4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
<4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f
<4>[43486.264044] Code: 02 46 0a 0c 07 08 44 0b 24 00 00 00 10 26 00
00 44 d5 e9 ff dd 00 00 00 00 41 0e 10 86 02 43 0d 06 42 8d 03 49 8c
04 02 61 0a <0c> 07 08 48 0b 00 24 00 00 00 38 26 00 00 fc d5 e9 ff ba
00 00 00
<4>[43486.264046] RSP: 0000:ffffb1c7000afe50 EFLAGS: 00010046
<4>[43486.264048] RAX: ffffffff9483a898 RBX: ffff8d64ef855440 RCX:
0000000000310800
<4>[43486.264049] RDX: 0000000000310800 RSI: 0000000000000000 RDI:
00000000003f0ea0
<4>[43486.264051] RBP: ffffb1c7000afe88 R08: 0000000000000000 R09:
0000000000003000
<4>[43486.264052] R10: 0000000000000500 R11: ffffffff92c6c775 R12:
ffff8d64ef8554c0
<4>[43486.264053] R13: 0000000000000000 R14: 0000000000000082 R15:
ffff8d64ef855460
<4>[43486.264055] FS: 0000000000000000(0000) GS:ffff8d64ef840000(0000)
knlGS:0000000000000000
<4>[43486.264057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[43486.264058] CR2: ffffffff94b483a6 CR3: 0000000391c0c001 CR4:
00000000003f0ea0
<4>[43486.264063] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
<4>[43486.264065] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264066] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264069] RIP: 0010:__show_regs+0x2ed/0x338
<4>[43486.264071] Code: 81 fc 00 04 00 00 75 44 48 f7 05 ca 83 90 01
10 00 00 00 0f 84 fa fd ff ff 31 d2 48 f7 05 b7 83 90 01 10 00 00 00
74 07 31 c9 <0f> 01 ee 89 c2 48 c7 c7 90 38 29 94 4c 89 f6 48 83 c4 28
5b 41 5c
<4>[43486.264072] RSP: 0000:ffffb1c7000afc90 EFLAGS: 00010046
<4>[43486.264074] RAX: 00000000ffff0ff0 RBX: 0000000000000000 RCX:
0000000000000000
<4>[43486.264075] RDX: 0000000000000000 RSI: 0000000000000004 RDI:
ffffffff94cf27f4
<4>[43486.264076] RBP: ffffb1c7000afce0 R08: 0000000000000000 R09:
00000000ffffdfff
<4>[43486.264078] R10: ffffffff94658600 R11: 3fffffffffffffff R12:
0000000000000400
<4>[43486.264079] R13: ffff8d64ef840000 R14: ffffffff9435d0a9 R15:
00000000ffff0ff0
<4>[43486.264080] FS: 0000000000000000(0000) GS:ffff8d64ef840000(0000)
knlGS:0000000000000000
<4>[43486.264082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[43486.264083] CR2: ffffffff94b483a6 CR3: 0000000391c0c001 CR4:
00000000003f0ea0
<4>[43486.264085] invalid opcode: 0000 [#3] PREEMPT SMP NOPTI
<4>[43486.264086] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264088] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264089] RIP: 0010:__show_regs+0x2ed/0x338

I landed this change, though I'm still working on verifying the issue
goes away with this fix:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3851401

I don't have direct access to this machine, but I wonder if a simple
cpu hotplug might also exercise this path.
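
For what it's worth, here is a minimal userspace sketch (assuming root and
that cpu1 exists) that offlines and re-onlines CPU1 via sysfs, which should
exercise the same onlining path without a full hibernate cycle:

	#include <stdio.h>

	/* Toggle /sys/devices/system/cpu/cpu1/online; assumes root. */
	static int set_cpu1_online(int online)
	{
		FILE *f = fopen("/sys/devices/system/cpu/cpu1/online", "w");

		if (!f) {
			perror("open cpu1/online");
			return -1;
		}
		fprintf(f, "%d\n", online);
		return fclose(f);
	}

	int main(void)
	{
		return set_cpu1_online(0) || set_cpu1_online(1);
	}
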
-Evan

>
> Thanks,
> Chang
>

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-25 15:31             ` Evan Green
  0 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-08-25 15:31 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On Wed, Aug 24, 2022 at 6:06 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/24/2022 3:52 PM, Evan Green wrote:
> >
> > Whatever we ended up landing in the ChromeOS tree (which I think was
> > v4 of this series) actively hit this bug in hibernation, which is how
> > I found it. I couldn't get a full backtrace because the backtracing
> > code tripped over itself as well for some reason. If the next patch in
> > this series is different from what we landed in ChromeOS, then maybe
> > your description is correct, but I haven't dug in to understand the
> > delta.
>
> So the change from v4 is simply dropping CBC mode. Marvin who reported
> another issue told me that he pushed the fix to some Chrome repository.
> But I don't know that's the same repo that you mentioned. Are you able
> to locate that tree if possible?

I see. The only ChromeOS tree I'm aware of where keylocker has landed
is our 5.10 tree. This is the change where it landed:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3373776/12

>
> Also, it would be nice to have more detail about that hibernation bug.

Here's the log I've got that pointed me down this path:
https://pastebin.com/VvR1EHvE

Relevant bit pasted below:

<6>[43486.263035] Enabling non-boot CPUs ...
<6>[43486.263081] x86: Booting SMP configuration:
<6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
<2>[43486.264010] kernel tried to execute NX-protected page - exploit
attempt? (uid: 0)
<1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
<1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
<1>[43486.264023] #PF: error_code(0x0011) - permissions violation
<6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
10006c063 PTE 8000000392148163
<4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
<4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f
<4>[43486.264044] Code: 02 46 0a 0c 07 08 44 0b 24 00 00 00 10 26 00
00 44 d5 e9 ff dd 00 00 00 00 41 0e 10 86 02 43 0d 06 42 8d 03 49 8c
04 02 61 0a <0c> 07 08 48 0b 00 24 00 00 00 38 26 00 00 fc d5 e9 ff ba
00 00 00
<4>[43486.264046] RSP: 0000:ffffb1c7000afe50 EFLAGS: 00010046
<4>[43486.264048] RAX: ffffffff9483a898 RBX: ffff8d64ef855440 RCX:
0000000000310800
<4>[43486.264049] RDX: 0000000000310800 RSI: 0000000000000000 RDI:
00000000003f0ea0
<4>[43486.264051] RBP: ffffb1c7000afe88 R08: 0000000000000000 R09:
0000000000003000
<4>[43486.264052] R10: 0000000000000500 R11: ffffffff92c6c775 R12:
ffff8d64ef8554c0
<4>[43486.264053] R13: 0000000000000000 R14: 0000000000000082 R15:
ffff8d64ef855460
<4>[43486.264055] FS: 0000000000000000(0000) GS:ffff8d64ef840000(0000)
knlGS:0000000000000000
<4>[43486.264057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[43486.264058] CR2: ffffffff94b483a6 CR3: 0000000391c0c001 CR4:
00000000003f0ea0
<4>[43486.264063] invalid opcode: 0000 [#2] PREEMPT SMP NOPTI
<4>[43486.264065] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264066] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264069] RIP: 0010:__show_regs+0x2ed/0x338
<4>[43486.264071] Code: 81 fc 00 04 00 00 75 44 48 f7 05 ca 83 90 01
10 00 00 00 0f 84 fa fd ff ff 31 d2 48 f7 05 b7 83 90 01 10 00 00 00
74 07 31 c9 <0f> 01 ee 89 c2 48 c7 c7 90 38 29 94 4c 89 f6 48 83 c4 28
5b 41 5c
<4>[43486.264072] RSP: 0000:ffffb1c7000afc90 EFLAGS: 00010046
<4>[43486.264074] RAX: 00000000ffff0ff0 RBX: 0000000000000000 RCX:
0000000000000000
<4>[43486.264075] RDX: 0000000000000000 RSI: 0000000000000004 RDI:
ffffffff94cf27f4
<4>[43486.264076] RBP: ffffb1c7000afce0 R08: 0000000000000000 R09:
00000000ffffdfff
<4>[43486.264078] R10: ffffffff94658600 R11: 3fffffffffffffff R12:
0000000000000400
<4>[43486.264079] R13: ffff8d64ef840000 R14: ffffffff9435d0a9 R15:
00000000ffff0ff0
<4>[43486.264080] FS: 0000000000000000(0000) GS:ffff8d64ef840000(0000)
knlGS:0000000000000000
<4>[43486.264082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[43486.264083] CR2: ffffffff94b483a6 CR3: 0000000391c0c001 CR4:
00000000003f0ea0
<4>[43486.264085] invalid opcode: 0000 [#3] PREEMPT SMP NOPTI
<4>[43486.264086] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
5.10.136-19391-gadfe4d4b8c04 #1
b640352a7a0e5f1522aed724296ad63f90c007df
<4>[43486.264088] Hardware name: Google Primus/Primus, BIOS
Google_Primus.14505.145.0 06/23/2022
<4>[43486.264089] RIP: 0010:__show_regs+0x2ed/0x338

I landed this change, though I'm still working on verifying the issue
goes away with this fix:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3851401

I don't have direct access to this machine, but I wonder if a simple
cpu hotplug might also exercise this path.
-Evan

>
> Thanks,
> Chang
>

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-25 15:31             ` [dm-devel] " Evan Green
@ 2022-08-31 23:08               ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-31 23:08 UTC (permalink / raw)
  To: Evan Green
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On 8/25/2022 8:31 AM, Evan Green wrote:
> 
> Here's the log I've got that pointed me down this path:
> https://pastebin.com/VvR1EHvE

     <3>[43486.261583] x86/keylocker: The key backup access failed with read error.
     <3>[43486.261584] x86/keylocker: Failed to restore internal wrapping key.

Looks like the IWKey backup was corrupted on that system.

> Relevant bit pasted below:
> 
> <6>[43486.263035] Enabling non-boot CPUs ...
> <6>[43486.263081] x86: Booting SMP configuration:
> <6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
> <2>[43486.264010] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> <1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
> <1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
> <1>[43486.264023] #PF: error_code(0x0011) - permissions violation
> <6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
> 10006c063 PTE 8000000392148163
> <4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
> <4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
> 5.10.136-19391-gadfe4d4b8c04 #1
> b640352a7a0e5f1522aed724296ad63f90c007df
> <4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
> Google_Primus.14505.145.0 06/23/2022
> <4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f

But I don't get why it hit this. On the wake-up path, copy_keylocker()
is supposed to be called.

I added some printouts in there, and it looks fine to me:

     [  218.488711] Enabling non-boot CPUs ...
     [  218.488794] x86: Booting SMP configuration:
     [  218.488795] smpboot: Booting Node 0 Processor 1 APIC 0x1
     [  218.490634] x86/keylocker: restore processor (id=1)
     [  218.491186] CPU1 is up
     ...

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-08-31 23:08               ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-08-31 23:08 UTC (permalink / raw)
  To: Evan Green
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On 8/25/2022 8:31 AM, Evan Green wrote:
> 
> Here's the log I've got that pointed me down this path:
> https://pastebin.com/VvR1EHvE

     <3>[43486.261583] x86/keylocker: The key backup access failed with read error.
     <3>[43486.261584] x86/keylocker: Failed to restore internal wrapping key.

Looks like the IWKey backup was corrupted on that system.

> Relevant bit pasted below:
> 
> <6>[43486.263035] Enabling non-boot CPUs ...
> <6>[43486.263081] x86: Booting SMP configuration:
> <6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
> <2>[43486.264010] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> <1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
> <1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
> <1>[43486.264023] #PF: error_code(0x0011) - permissions violation
> <6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
> 10006c063 PTE 8000000392148163
> <4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
> <4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
> 5.10.136-19391-gadfe4d4b8c04 #1
> b640352a7a0e5f1522aed724296ad63f90c007df
> <4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
> Google_Primus.14505.145.0 06/23/2022
> <4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f

But I don't get why it hit this. On the wake-up path, copy_keylocker()
is supposed to be called.

I added some printouts in there, and it looks fine to me:

     [  218.488711] Enabling non-boot CPUs ...
     [  218.488794] x86: Booting SMP configuration:
     [  218.488795] smpboot: Booting Node 0 Processor 1 APIC 0x1
     [  218.490634] x86/keylocker: restore processor (id=1)
     [  218.491186] CPU1 is up
     ...

Thanks,
Chang

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-08-31 23:08               ` [dm-devel] " Chang S. Bae
@ 2022-09-06 16:22                 ` Evan Green
  -1 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-09-06 16:22 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On Wed, Aug 31, 2022 at 4:16 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/25/2022 8:31 AM, Evan Green wrote:
> >
> > Here's the log I've got that pointed me down this path:
> > https://pastebin.com/VvR1EHvE
>
>      <3>[43486.261583] x86/keylocker: The key backup access failed with
> read error.
>      <3>[43486.261584] x86/keylocker: Failed to restore internal
> wrapping key.
>
> Looks like the IWKey backup was corrupted on that system.
>
> > Relevant bit pasted below:
> >
> > <6>[43486.263035] Enabling non-boot CPUs ...
> > <6>[43486.263081] x86: Booting SMP configuration:
> > <6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
> > <2>[43486.264010] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 0)
> > <1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
> > <1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
> > <1>[43486.264023] #PF: error_code(0x0011) - permissions violation
> > <6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
> > 10006c063 PTE 8000000392148163
> > <4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
> > <4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
> > 5.10.136-19391-gadfe4d4b8c04 #1
> > b640352a7a0e5f1522aed724296ad63f90c007df
> > <4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
> > Google_Primus.14505.145.0 06/23/2022
> > <4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f
>
> But, I don't get the reason why it hit this. On the wake-up path,
> copy_keylocker() is supposed to be called.

Interesting, that's helpful. I thought I had a lead based on this,
which was that in this case we were doing a hibernate to shutdown,
rather than hibernate to S4. The IWKey backup is only valid down to
S4, so a read error on resume from this type of hibernate might make
sense. I know keylocker won't successfully maintain handles across a
hibernate to shutdown and subsequent resume, but it shouldn't crash
either.

But this still doesn't explain this crash, since in this case we're
still on our way down and haven't even done the shutdown yet. We can
see the "PM: hibernation: Image created (1536412 pages copied)" log
line just before the keylocker read failure. So then it seems
something's not working with the pre-hibernate CPU hotplug path?


>
> I added some printout in there, and it looks to be fine with me:
>
>      [  218.488711] Enabling non-boot CPUs ...
>      [  218.488794] x86: Booting SMP configuration:
>      [  218.488795] smpboot: Booting Node 0 Processor 1 APIC 0x1
>      [  218.490634] x86/keylocker: restore processor (id=1)
>      [  218.491186] CPU1 is up
>      ...

How were you exercising the CPU onlining in this case? Boot, cpu
hotplug, or hibernate?
-Evan

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-09-06 16:22                 ` Evan Green
  0 siblings, 0 replies; 247+ messages in thread
From: Evan Green @ 2022-09-06 16:22 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On Wed, Aug 31, 2022 at 4:16 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 8/25/2022 8:31 AM, Evan Green wrote:
> >
> > Here's the log I've got that pointed me down this path:
> > https://pastebin.com/VvR1EHvE
>
>      <3>[43486.261583] x86/keylocker: The key backup access failed with
> read error.
>      <3>[43486.261584] x86/keylocker: Failed to restore internal
> wrapping key.
>
> Looks like the IWKey backup was corrupted on that system.
>
> > Relevant bit pasted below:
> >
> > <6>[43486.263035] Enabling non-boot CPUs ...
> > <6>[43486.263081] x86: Booting SMP configuration:
> > <6>[43486.263082] smpboot: Booting Node 0 Processor 1 APIC 0x1
> > <2>[43486.264010] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 0)
> > <1>[43486.264019] BUG: unable to handle page fault for address: ffffffff94b483a6
> > <1>[43486.264021] #PF: supervisor instruction fetch in kernel mode
> > <1>[43486.264023] #PF: error_code(0x0011) - permissions violation
> > <6>[43486.264025] PGD 391c0e067 P4D 391c0e067 PUD 391c0f063 PMD
> > 10006c063 PTE 8000000392148163
> > <4>[43486.264031] Oops: 0011 [#1] PREEMPT SMP NOPTI
> > <4>[43486.264035] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G U
> > 5.10.136-19391-gadfe4d4b8c04 #1
> > b640352a7a0e5f1522aed724296ad63f90c007df
> > <4>[43486.264036] Hardware name: Google Primus/Primus, BIOS
> > Google_Primus.14505.145.0 06/23/2022
> > <4>[43486.264042] RIP: 0010:load_keylocker+0x0/0x7f
>
> But, I don't get the reason why it hit this. On the wake-up path,
> copy_keylocker() is supposed to be called.

Interesting, that's helpful. I thought I had a lead based on this,
which was that in this case we were doing a hibernate to shutdown,
rather than hibernate to S4. The IWKey backup is only valid down to
S4, so a read error on resume from this type of hibernate might make
sense. I know keylocker won't successfully maintain handles across a
hibernate to shutdown and subsequent resume, but it shouldn't crash
either.

But this still doesn't explain this crash, since in this case we're
still on our way down and haven't even done the shutdown yet. We can
see the "PM: hibernation: Image created (1536412 pages copied)" log
line just before the keylocker read failure. So then it seems
something's not working with the pre-hibernate CPU hotplug path?


>
> I added some printout in there, and it looks to be fine with me:
>
>      [  218.488711] Enabling non-boot CPUs ...
>      [  218.488794] x86: Booting SMP configuration:
>      [  218.488795] smpboot: Booting Node 0 Processor 1 APIC 0x1
>      [  218.490634] x86/keylocker: restore processor (id=1)
>      [  218.491186] CPU1 is up
>      ...

How were you exercising the CPU onlining in this case? Boot, cpu
hotplug, or hibernate?
-Evan

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2022-09-06 16:22                 ` [dm-devel] " Evan Green
@ 2022-09-06 16:46                   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-09-06 16:46 UTC (permalink / raw)
  To: Evan Green
  Cc: linux-crypto, dm-devel, herbert, Eric Biggers, Ard Biesheuvel,
	x86, luto, Thomas Gleixner, bp, dave.hansen, mingo, LKML,
	Dan Williams, charishma1.gairuboyina, kumar.n.dwarakanath,
	ravi.v.shankar

On 9/6/2022 9:22 AM, Evan Green wrote:
> On Wed, Aug 31, 2022 at 4:16 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> 
> But this still doesn't explain this crash, since in this case we're
> still on our way down and haven't even done the shutdown yet. We can
> see the "PM: hibernation: Image created (1536412 pages copied)" log
> line just before the keylocker read failure. So then it seems
> something's not working with the pre-hibernate CPU hotplug path?

No, that backup crash message came from the boot-CPU's wakeup path:
__restore_processor_state()->restore_keylocker().

If the issue is repeatable, then I suspect it has something to do with
the specific platform implementation. I don't have any details yet on the
customized systems. Let me check with some Chrome HW folks.

> How were you exercising the CPU onlining in this case? Boot, cpu
> hotplug, or hibernate?

Hibernate.

Thanks,
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2022-09-06 16:46                   ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2022-09-06 16:46 UTC (permalink / raw)
  To: Evan Green
  Cc: ravi.v.shankar, dave.hansen, herbert, x86, Dan Williams, LKML,
	mingo, Eric Biggers, dm-devel, linux-crypto, luto,
	Thomas Gleixner, bp, Ard Biesheuvel, charishma1.gairuboyina,
	kumar.n.dwarakanath

On 9/6/2022 9:22 AM, Evan Green wrote:
> On Wed, Aug 31, 2022 at 4:16 PM Chang S. Bae <chang.seok.bae@intel.com> wrote:
> 
> But this still doesn't explain this crash, since in this case we're
> still on our way down and haven't even done the shutdown yet. We can
> see the "PM: hibernation: Image created (1536412 pages copied)" log
> line just before the keylocker read failure. So then it seems
> something's not working with the pre-hibernate CPU hotplug path?

No, that backup crash message came from the boot-CPU's wakeup path:
__restore_processor_state()->restore_keylocker().

If the issue is repeatable, then I suspect it has something to do with
the specific platform implementation. I don't have any details yet on
the customized systems. Let me check with some Chrome HW folks.

> How were you exercising the CPU onlining in this case? Boot, cpu
> hotplug, or hibernate?

Hibernate.

Thanks,
Chang

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v6 00/12] x86: Support Key Locker
  2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59   ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae

Hi all,

Enabling this feature has been stalled by a decryption performance
issue, which took a while to resolve. There was also a question about
the userspace utility -- cryptsetup [6].

While posting this version, I first want to make sure the resolutions
below are acknowledged:

* People identified a gap between encryption and decryption speeds
  [1, 2]. Intel has identified the root cause and will make a fix
  available. The fix will ensure that the encryption and decryption
  speeds are comparable with each other.

  This fix requires a microcode update and will go through a formal
  release process toward the end of the year. As of now, listed below
  are the expected numbers [10] once the issue is fixed. Intel will
  provide the latest information on request.

  +---------------------------+---------------+-----------------+
  |       Cipher              |   Encryption  |   Decryption    |
  |      (AES-XTS)            |    (MiB/s)    |   (MiB/s)       |
  +===========================+===============+=================+
  | Microcode version (0xa4)  |     1726.5    |      776.3      |
  +---------------------------+---------------+-----------------+
  | With the potential fix    |     2308.7    |     2305.9      |
  +---------------------------+---------------+-----------------+

  The benchmark results came from:
    $ cryptsetup benchmark -c aes-xts -s 256
  which measures approximate in-memory throughput (no storage I/O).

  The fix is specific to the CPU model [8] that was measured when
  posting v3 [4]. Here are additional results from the latest CPU [9]:

  +--------------+---------------+-----------------+
  | Cipher       |   Encryption  |   Decryption    |
  | (AES-XTS)    |    (MiB/s)    |   (MiB/s)       |
  +==============+===============+=================+
  | AES-NI       |     6738.7    |     6851.7      |
  +--------------+---------------+-----------------+
  | AES-KL       |     3425.4    |     3431.7      |
  +--------------+---------------+-----------------+

* Andy questioned whether cryptsetup is ready to edit the cipher mode,
  e.g. from AES-KL to AES-NI, for decrypting a keylocker-encrypted
  volume on non-keylocker systems [3].

  The cryptsetup changes were prototyped as a proof-of-concept. The
  code and test details can be found in this repository:

    git://github.com/intel-staging/keylocker.git cryptsetup

  Once the enabling code is accepted in the mainline, I will follow up
  with the DM-crypt [7] folks to add this ability.

The feature is already available on recent Intel client systems. Its
usage along with the threat model and other details can be found in
the v3 cover letter [4].

The code has some updates from the last posting [5]:

- Fix the 'valid_kl' flag not to be set when the feature is disabled.
  This was reported by Marvin Hsu <marvin.hsu@intel.com>. Add the
  function comment about this. (Patch8)
- Improve the error handling in setup_keylocker(). All the error
  cases fall through to the end, which disables the feature, while
  the successful cases return immediately (see the sketch after this
  list). (Patch8)
- Call out when disabling the feature on virtual machines. (Patch7)
- Ensure kernel_fpu_end() for the possible error (Patch10)
- Rebased on the upstream -- use the RET macro (Patch11/12)
- Clean up dead code -- cbc_crypt_common() (Patch10)
- Fix a typo (Patch1)
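
As referenced above, a minimal sketch (not the actual patch code) of the
setup_keylocker() error-handling shape, with the feature-specific setup
steps elided:

	static void setup_keylocker(struct cpuinfo_x86 *c)
	{
		if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
			goto disable;

		/* CPUID checks and wrapping-key load go here; any failure jumps to 'disable'. */

		pr_info_once("x86/keylocker: Enabled.\n");
		return;

	disable:
		setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
		pr_info_once("x86/keylocker: Disabled.\n");
	}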

This series is based on the tip tree. The code is also available in
this repository:

  git://github.com/intel-staging/keylocker.git kl

Thanks to Charishma Gairuboyina and Lalithambika Krishnakumar for the
help to update the performance results.

Thanks,
Chang

[1]  https://lore.kernel.org/lkml/YbqRseO+TtuGQk5x@sol.localdomain/
[2]  https://lore.kernel.org/lkml/120368dc-e337-9176-936c-4db2a8bf710e@gmail.com/
[3]  https://lore.kernel.org/lkml/75ec3ad1-6234-ae1f-1b83-482793e4fd23@kernel.org/
[4]  v3: https://lore.kernel.org/lkml/20211124200700.15888-1-chang.seok.bae@intel.com/
[5]  v5: https://lore.kernel.org/lkml/20220112211258.21115-1-chang.seok.bae@intel.com/
[6]  cryptsetup: https://gitlab.com/cryptsetup/cryptsetup
[7]  DM-crypt: https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-crypt.html
[8]  Tiger Lake:
     https://www.intel.com/content/www/us/en/products/docs/processors/embedded/11th-gen-product-brief.html
[9]  Raptor Lake:
     https://www.intel.com/content/www/us/en/newsroom/resources/13th-gen-core.html
[10] Intel publishes information about product performance at
     https://www.Intel.com/PerformanceIndex

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker internal wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load an internal wrapping key at boot-time
  x86/PM/keylocker: Restore internal wrapping key on resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aes - Prepare for a new AES implementation
  crypto: x86/aes-kl - Support AES algorithm using Key Locker
    instructions
  crypto: x86/aes-kl - Support XTS mode

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/keylocker.rst               |  98 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Makefile                      |   5 +-
 arch/x86/crypto/aes-intel_asm.S               |  26 +
 arch/x86/crypto/aes-intel_glue.c              | 127 ++++
 arch/x86/crypto/aes-intel_glue.h              |  44 ++
 arch/x86/crypto/aeskl-intel_asm.S             | 633 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 216 ++++++
 arch/x86/crypto/aesni-intel_asm.S             |  58 +-
 arch/x86/crypto/aesni-intel_glue.c            | 235 ++-----
 arch/x86/crypto/aesni-intel_glue.h            |  17 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 ++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 crypto/Kconfig                                |  36 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 28 files changed, 1642 insertions(+), 214 deletions(-)
 create mode 100644 Documentation/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: b2e7ae549d84c654744bf70d30a881bfa140a6e3
-- 
2.17.1


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 00/12] x86: Support Key Locker
@ 2023-04-10 22:59   ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, luto, bernie.keany,
	tglx, bp, gmazyland, charishma1.gairuboyina

Hi all,

Enabling this feature has been stalled by a decryption performance
issue, which took a while to resolve. There was also a question about
the userspace utility -- cryptsetup [6].

While posting this version, I first want to make sure the resolutions
below are acknowledged:

* People identified a gap between encryption and decryption speeds
  [1, 2]. Intel has identified the root cause and will make a fix
  available. The fix will ensure that the encryption and decryption
  speeds are comparable with each other.

  This fix requires a microcode update and will go through a formal
  release process toward the end of the year. As of now, listed below
  are the expected numbers [10] once the issue is fixed. Intel will
  provide the latest information on request.

  +---------------------------+---------------+-----------------+
  |       Cipher              |   Encryption  |   Decryption    |
  |      (AES-XTS)            |    (MiB/s)    |   (MiB/s)       |
  +===========================+===============+=================+
  | Microcode version (0xa4)  |     1726.5    |      776.3      |
  +---------------------------+---------------+-----------------+
  | With the potential fix    |     2308.7    |     2305.9      |
  +---------------------------+---------------+-----------------+

  The benchmark results came from:
    $ cryptsetup benchmark -c aes-xts -s 256
  which measures approximate in-memory throughput (no storage I/O).

  The fix is specific to the CPU model [8] that was measured when
  posting v3 [4]. Here are additional results from the latest CPU [9]:

  +--------------+---------------+-----------------+
  | Cipher       |   Encryption  |   Decryption    |
  | (AES-XTS)    |    (MiB/s)    |   (MiB/s)       |
  +==============+===============+=================+
  | AES-NI       |     6738.7    |     6851.7      |
  +--------------+---------------+-----------------+
  | AES-KL       |     3425.4    |     3431.7      |
  +--------------+---------------+-----------------+

* Andy questioned whether cryptsetup is ready to edit the cipher mode,
  e.g. from AES-KL to AES-NI, for decrypting a keylocker-encrypted
  volume on non-keylocker systems [3].

  The cryptsetup changes were prototyped as a proof-of-concept. The
  code and test details can be found in this repository:

    git://github.com/intel-staging/keylocker.git cryptsetup

  Once the enabling code is accepted in the mainline, I will follow up
  with the DM-crypt [7] folks to add this ability.

The feature is already available on recent Intel client systems. Its
usage along with the threat model and other details can be found in
the v3 cover letter [4].

The code has some updates from the last posting [5]:

- Fix the 'valid_kl' flag not to be set when the feature is disabled.
  This was reported by Marvin Hsu <marvin.hsu@intel.com>. Add the
  function comment about this. (Patch8)
- Improve the error handling in setup_keylocker(). All the error
  cases fall through to the end, which disables the feature, while
  the successful cases return immediately. (Patch8)
- Call out when disabling the feature on virtual machines. (Patch7)
- Ensure kernel_fpu_end() for the possible error (Patch10)
- Rebased on the upstream -- use the RET macro (Patch11/12)
- Clean up dead code -- cbc_crypt_common() (Patch10)
- Fix a typo (Patch1)

This series is based on the tip tree. The code is also available in
this repository:

  git://github.com/intel-staging/keylocker.git kl

Thanks to Charishma Gairuboyina and Lalithambika Krishnakumar for the
help to update the performance results.

Thanks,
Chang

[1]  https://lore.kernel.org/lkml/YbqRseO+TtuGQk5x@sol.localdomain/
[2]  https://lore.kernel.org/lkml/120368dc-e337-9176-936c-4db2a8bf710e@gmail.com/
[3]  https://lore.kernel.org/lkml/75ec3ad1-6234-ae1f-1b83-482793e4fd23@kernel.org/
[4]  v3: https://lore.kernel.org/lkml/20211124200700.15888-1-chang.seok.bae@intel.com/
[5]  v5: https://lore.kernel.org/lkml/20220112211258.21115-1-chang.seok.bae@intel.com/
[6]  cryptsetup: https://gitlab.com/cryptsetup/cryptsetup
[7]  DM-crypt: https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-crypt.html
[8]  Tiger Lake:
     https://www.intel.com/content/www/us/en/products/docs/processors/embedded/11th-gen-product-brief.html
[9]  Raptor Lake:
     https://www.intel.com/content/www/us/en/newsroom/resources/13th-gen-core.html
[10] Intel publishes information about product performance at
     https://www.Intel.com/PerformanceIndex

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker internal wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load an internal wrapping key at boot-time
  x86/PM/keylocker: Restore internal wrapping key on resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aes - Prepare for a new AES implementation
  crypto: x86/aes-kl - Support AES algorithm using Key Locker
    instructions
  crypto: x86/aes-kl - Support XTS mode

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/keylocker.rst               |  98 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Makefile                      |   5 +-
 arch/x86/crypto/aes-intel_asm.S               |  26 +
 arch/x86/crypto/aes-intel_glue.c              | 127 ++++
 arch/x86/crypto/aes-intel_glue.h              |  44 ++
 arch/x86/crypto/aeskl-intel_asm.S             | 633 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 216 ++++++
 arch/x86/crypto/aesni-intel_asm.S             |  58 +-
 arch/x86/crypto/aesni-intel_glue.c            | 235 ++-----
 arch/x86/crypto/aesni-intel_glue.h            |  17 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 ++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 crypto/Kconfig                                |  36 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 28 files changed, 1642 insertions(+), 214 deletions(-)
 create mode 100644 Documentation/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: b2e7ae549d84c654744bf70d30a881bfa140a6e3
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v6 01/12] Documentation/x86: Document Key Locker
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Jonathan Corbet, linux-doc

Document an overview of the feature along with the relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/x86/keylocker.html
---
 Documentation/x86/index.rst     |  1 +
 Documentation/x86/keylocker.rst | 98 +++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 Documentation/x86/keylocker.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 8ac64d7de4dc..669c239c009f 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -43,3 +43,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/x86/keylocker.rst b/Documentation/x86/keylocker.rst
new file mode 100644
index 000000000000..3b405fade7d8
--- /dev/null
+++ b/Documentation/x86/keylocker.rst
@@ -0,0 +1,98 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+Internal Wrapping Key (IWKey)
+=============================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+IWKey Restore Failure
+---------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of an IWKey restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results for their pending requests. In the likely scenario
+that dm-crypt is hosting the root filesystem, the recovery is identical to
+a storage controller failing to resume from suspend: reboot. If the volume
+impacted by an IWKey restore failure is a data volume, then it is possible
+that I/O errors on that volume do not bring down the rest of the system.
+However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new IWKey at initial boot, never later on, e.g. not on resume
+from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private IWKey state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per VM IWKeys in memory, defeats the purpose of Key Locker. The
+backup / restore facility is also not performant enough to be suitable for
+guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature comes with a new AES instruction set. This instruction set is
+analogous to AES-NI. A sequence of AES-NI instructions maps to a single
+AES-KL instruction. For example, AESENC128KL performs ten rounds of
+transformation, which is equivalent to nine AESENC instructions plus one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails and sets RFLAGS.ZF. The
+  crypto library implementation checks this flag and returns an error
+  code. Note that the flag is also set when the internal wrapping key has
+  changed because of a missing backup.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear text in memory. This
+  is to meet the Linux crypto-cipher expectation that each implementation
+  must support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 01/12] Documentation/x86: Document Key Locker
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, linux-doc, Jonathan Corbet, ardb, chang.seok.bae,
	dave.hansen, dan.j.williams, mingo, ebiggers,
	lalithambika.krishnakumar, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, bernie.keany, tglx, bp, gmazyland,
	charishma1.gairuboyina

Document an overview of the feature along with the relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/x86/keylocker.html
---
 Documentation/x86/index.rst     |  1 +
 Documentation/x86/keylocker.rst | 98 +++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+)
 create mode 100644 Documentation/x86/keylocker.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 8ac64d7de4dc..669c239c009f 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -43,3 +43,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/x86/keylocker.rst b/Documentation/x86/keylocker.rst
new file mode 100644
index 000000000000..3b405fade7d8
--- /dev/null
+++ b/Documentation/x86/keylocker.rst
@@ -0,0 +1,98 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+Internal Wrapping Key (IWKey)
+=============================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+IWKey Restore Failure
+---------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of an IWKey restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results for their pending requests. In the likely scenario
+that dm-crypt is hosting the root filesystem, the recovery is identical to
+a storage controller failing to resume from suspend: reboot. If the volume
+impacted by an IWKey restore failure is a data volume, then it is possible
+that I/O errors on that volume do not bring down the rest of the system.
+However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new IWKey at initial boot, never later on, e.g. not on resume
+from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private IWKey state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per VM IWKeys in memory, defeats the purpose of Key Locker. The
+backup / restore facility is also not performant enough to be suitable for
+guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature comes with a new AES instruction set. This instruction set is
+analogous to AES-NI. A sequence of AES-NI instructions maps to a single
+AES-KL instruction. For example, AESENC128KL performs ten rounds of
+transformation, which is equivalent to nine AESENC instructions plus one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails and sets RFLAGS.ZF. The
+  crypto library implementation checks this flag and returns an error
+  code. Note that the flag is also set when the internal wrapping key has
+  changed because of a missing backup.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear text in memory. This
+  is to meet the Linux crypto-cipher expectation that each implementation
+  must support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 02/12] x86/cpufeature: Enumerate Key Locker feature
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used to
transform a user key into a key handle.

It supports the Advanced Encryption Standard (AES) cipher algorithm with a
new SIMD instruction set, like its predecessor (AES-NI). So a new AES
implementation will follow in the kernel's crypto library.

Here add it to enumerate the hardware capability, but it will not be
shown in /proc/cpuinfo as userspace usage is not supported.

Make the feature depend on XMM2 as it comes with AES SIMD instructions.

Add X86_FEATURE_KEYLOCKER to the disabled features mask. It will be
enabled under a new config option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d9c190cdefa9..878d25e38d7c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -390,6 +390,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..40e8ad2f837d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -138,7 +144,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..262348aeaad1 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index e462c1d3800a..24f54f58dbf8 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -82,6 +82,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 02/12] x86/cpufeature: Enumerate Key Locker feature
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin, bernie.keany,
	tglx, bp, gmazyland, charishma1.gairuboyina

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used to
transform a user key into a key handle.

It supports the Advanced Encryption Standard (AES) cipher algorithm with a
new SIMD instruction set, like its predecessor (AES-NI). So a new AES
implementation will follow in the kernel's crypto library.

Here add it to enumerate the hardware capability, but it will not be
shown in /proc/cpuinfo as userspace usage is not supported.

Make the feature depend on XMM2 as it comes with AES SIMD instructions.

Add X86_FEATURE_KEYLOCKER to the disabled features mask. It will be
enabled under a new config option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d9c190cdefa9..878d25e38d7c 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -390,6 +390,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 702d93fdd10e..40e8ad2f837d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -138,7 +144,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..262348aeaad1 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index e462c1d3800a..24f54f58dbf8 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -82,6 +82,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 03/12] x86/insn: Add Key Locker instructions to the opcode map
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin

Add the following Key Locker instructions to the opcode map:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

Details of these instructions can be found in Intel Software Developer's
Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 03/12] x86/insn: Add Key Locker instructions to the opcode map
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, bernie.keany, tglx, bp,
	gmazyland, charishma1.gairuboyina

Add the following Key Locker instructions to the opcode map:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key indicated
	by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key indicated
	by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key indicated
	by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key indicated
	by a key handle.

Details of these instructions can be found in Intel Software Developer's
Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Rafael J. Wysocki, Peter Zijlstra

Key Locker introduces a CPU-internal wrapping key to encode a user key into
a key handle. The key handle is then referenced instead of the plain-text
key.

The new instruction loads an internal wrapping key in the
software-inaccessible CPU state. It operates only in kernel mode.

Define struct iwkey to pass the key value.

The kernel will use this function to load a new key at boot time when the
feature is enabled.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Note, Dan wondered if:
  WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
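
As an aside, a minimal usage sketch (not part of the patch) of the expected
calling convention: since load_xmm_iwkey() clobbers XMM0-2, the caller
wraps it in a kernel-FPU section. The load_new_iwkey() name is just a
placeholder:

	static void load_new_iwkey(struct iwkey *key)
	{
		/* Guard the XMM register clobbers with the kernel FPU API. */
		kernel_fpu_begin();
		load_xmm_iwkey(key);
		kernel_fpu_end();
	}
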
---
 arch/x86/include/asm/keylocker.h     | 25 ++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..df84c83228a1
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary internal wrapping key storage.
+ * @integrity_key:	A 128-bit key to check that key handles have not
+ *			been tampered with.
+ * @encryption_key:	A 256-bit encryption key used in
+ *			wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after being loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d6cd9344f6c7..90b23f55970a 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <asm/processor-flags.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -296,6 +297,37 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key:	A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. The caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * LOADIWKEY %xmm1,%xmm2
+	 *
+	 * EAX and XMM0 are implicit operands. Load a key value
+	 * from XMM0-2 to a software-invisible CPU state. With zero
+	 * in EAX, CPU does not do hardware randomization and the key
+	 * backup is allowed.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin, bernie.keany,
	tglx, bp, Rafael J. Wysocki, gmazyland, charishma1.gairuboyina

Key Locker introduces a CPU-internal wrapping key to encode a user key into
a key handle. The key handle is then referenced instead of the plain-text
key.

The new instruction loads an internal wrapping key in the
software-inaccessible CPU state. It operates only in kernel mode.

Define struct iwkey to pass the key value.

The kernel will use this function to load a new key at boot time when the
feature is enabled.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Note, Dan wondered if:
  WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
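
A minimal caller-side sketch of the intended usage (not part of this patch;
the helper name is made up here), mirroring how the boot code added later in
the series wraps the call, with Dan's suggested check placed at the call site
where irq_fpu_usable() is meaningful:

#include <linux/bug.h>
#include <asm/fpu/api.h>
#include <asm/keylocker.h>
#include <asm/special_insns.h>

static void load_iwkey_example(struct iwkey *key)
{
	/* Sanity check: kernel FPU usage must be permitted in this context. */
	WARN_ON(!irq_fpu_usable());

	/* The caller owns the FPU section, so clobbering XMM0-2 is safe. */
	kernel_fpu_begin();
	load_xmm_iwkey(key);
	kernel_fpu_end();
}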
---
 arch/x86/include/asm/keylocker.h     | 25 ++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..df84c83228a1
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary internal wrapping key storage.
+ * @integrity_key:	A 128-bit key to check that key handles have not
+ *			been tampered with.
+ * @encryption_key:	A 256-bit encryption key used in
+ *			wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after being loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index d6cd9344f6c7..90b23f55970a 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <asm/processor-flags.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -296,6 +297,37 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key:	A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * LOADIWKEY %xmm1,%xmm2
+	 *
+	 * EAX and XMM0 are implicit operands. Load a key value
+	 * from XMM0-2 to a software-invisible CPU state. With zero
+	 * in EAX, CPU does not do hardware randomization and the key
+	 * backup is allowed.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 05/12] x86/msr-index: Add MSRs for Key Locker internal wrapping key
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra

The CPU state that contains the internal wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the cache
(like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and restore
the wrapping key, and to check the key status.

The wrapping key is saved in a platform-scoped state of non-volatile media.
The backup itself and its path from the CPU are encrypted and integrity
protected.

But this storage's non-volatility is not architecturally guaranteed across
off states, such as S5 and G3.

The MSRs will be used to back up the key for the S3/4 sleep states. The
kernel then writes to one of them to restore the key on each CPU.
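
For illustration, a rough sketch of how these MSRs are expected to be driven
(the helper name is made up; the sequence mirrors the restore logic added
later in this series):

#include <linux/bits.h>
#include <linux/errno.h>
#include <asm/msr.h>

static int backup_and_copy_iwkey_example(void)
{
	u64 status;

	/* Save the currently loaded wrapping key to platform storage. */
	wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

	/* Later, e.g. on wakeup: ask the CPU to copy the backup back. */
	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);

	/* Bit 0 of the copy-status MSR indicates a successful copy. */
	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
	return (status & BIT(0)) ? 0 : -EBUSY;
}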

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..4fa3b9e26162 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1115,4 +1115,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing an internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 05/12] x86/msr-index: Add MSRs for Key Locker internal wrapping key
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Peter Zijlstra,
	Ingo Molnar, Borislav Petkov, luto, H. Peter Anvin, bernie.keany,
	tglx, bp, gmazyland, charishma1.gairuboyina

The CPU state that contains the internal wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the cache
(like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and restore
the wrapping key, and to check the key status.

The wrapping key is saved in a platform-scoped state of non-volatile media.
The backup itself and its path from the CPU are encrypted and integrity
protected.

But this storage's non-volatility is not architecturally guaranteed across
off states, such as S5 and G3.

The MSRs will be used to back up the key for the S3/4 sleep states. The
kernel then writes to one of them to restore the key on each CPU.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ad35355ee43e..4fa3b9e26162 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1115,4 +1115,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing an internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 06/12] x86/keylocker: Define Key Locker CPUID leaf
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin

Define the feature-specific CPUID leaf and bits. Both Key Locker enabling
code in the x86 core and AES Key Locker code in the crypto library will
reference them.
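
For illustration, a sketch of how the defines are consumed (the function name
is made up; the check mirrors the boot-time setup added in a later patch):

#include <linux/types.h>
#include <asm/processor.h>
#include <asm/keylocker.h>

static bool keylocker_cpuid_ready_example(void)
{
	u32 eax, ebx, ecx, edx;

	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);

	/* AESKLE reflects CR4.KL; SUPERVISOR allows kernel-mode handles. */
	return (ebx & KEYLOCKER_CPUID_EBX_AESKLE) &&
	       (eax & KEYLOCKER_CPUID_EAX_SUPERVISOR);
}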

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index df84c83228a1..e85dfb6c1524 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 06/12] x86/keylocker: Define Key Locker CPUID leaf
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, bernie.keany, tglx, bp,
	gmazyland, charishma1.gairuboyina

Define the feature-specific CPUID leaf and bits. Both Key Locker enabling
code in the x86 core and AES Key Locker code in the crypto library will
reference them.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index df84c83228a1..e85dfb6c1524 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin

The Internal Wrapping Key (IWKey) is the Key Locker entity that encodes a
clear text key into a key handle. This key is pivotal in protecting user
keys, so its value has to be randomized before being loaded into the
software-invisible CPU state.

IWKey needs to be established before the first user. Given that the only
proposed Linux use case for Key Locker is dm-crypt, the feature could be
lazily enabled when the first dm-crypt user arrives, but there is no
precedent for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.

The kernel generates random bytes and loads them at boot time. These bytes
are flushed out immediately.

Setting the CR4.KL bit does not always enable the feature, so ensure the
dynamic CPU bit (CPUID.AESKLE) is set before loading the key.

Given that the Linux Key Locker support is only intended for bare metal
dm-crypt consumption, and that switching IWKey per VM is untenable,
explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Call out the disabling when the feature is available on a virtual
  machine. Then, it will turn off the feature flag.

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives; but as Dave notes, there is no precedent
for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.
---
 arch/x86/include/asm/keylocker.h |  9 ++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/common.c     |  5 +-
 arch/x86/kernel/keylocker.c      | 82 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c        |  2 +
 5 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index e85dfb6c1524..820ac29c06d9 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <linux/bits.h>
 #include <asm/fpu/types.h>
 
@@ -28,5 +29,13 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+#define setup_keylocker(c) do { } while (0)
+#define destroy_keylocker_data() do { } while (0)
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b366641703e3..20babbaf8c0c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3ea06b0b4570..a1edd5997e0a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
 #include <asm/traps.h>
@@ -1874,10 +1876,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..2519102f72f1
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support internal wrapping key
+ * management.
+ */
+
+#include <linux/random.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+	struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
+	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+	kernel_fpu_begin();
+	load_xmm_iwkey(&kl_setup.key);
+	kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c:		A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto out;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto disable;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (c == &boot_cpu_data) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+		/*
+		 * Check the feature readiness via CPUID. Note that the
+		 * CPUID AESKLE bit is conditionally set only when CR4.KL
+		 * is set.
+		 */
+		if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+		    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+			pr_debug("x86/keylocker: Not fully supported.\n");
+			goto disable;
+		}
+
+		generate_keylocker_data();
+	}
+
+	load_keylocker();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return;
+
+disable:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+out:
+	/* Make sure the feature is disabled for kexec-reboot. */
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 851477f7d728..880ed9cc8c53 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
 #include <asm/sev.h>
+#include <asm/keylocker.h>
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1500,6 +1501,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	nmi_selftest();
 	impress_friends();
 	cache_aps_init();
+	destroy_keylocker_data();
 }
 
 static int __initdata setup_possible_cpus = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, Ingo Molnar,
	Borislav Petkov, luto, H. Peter Anvin, bernie.keany, tglx, bp,
	gmazyland, charishma1.gairuboyina

The Internal Wrapping Key (IWKey) is the Key Locker entity that encodes a
clear text key into a key handle. This key is pivotal in protecting user
keys, so its value has to be randomized before being loaded into the
software-invisible CPU state.

IWKey needs to be established before the first user. Given that the only
proposed Linux use case for Key Locker is dm-crypt, the feature could be
lazily enabled when the first dm-crypt user arrives, but there is no
precedent for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.

The kernel generates random bytes and loads them at boot time. These bytes
are flushed out immediately.

Setting the CR4.KL bit does not always enable the feature, so ensure the
dynamic CPU bit (CPUID.AESKLE) is set before loading the key.

Given that the Linux Key Locker support is only intended for bare metal
dm-crypt consumption, and that switching IWKey per VM is untenable,
explicitly skip Key Locker setup in the X86_FEATURE_HYPERVISOR case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Call out the disabling when the feature is available on a virtual
  machine. Then, it will turn off the feature flag.

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives; but as Dave notes, there is no precedent
for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.
---
 arch/x86/include/asm/keylocker.h |  9 ++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/common.c     |  5 +-
 arch/x86/kernel/keylocker.c      | 82 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c        |  2 +
 5 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index e85dfb6c1524..820ac29c06d9 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <linux/bits.h>
 #include <asm/fpu/types.h>
 
@@ -28,5 +29,13 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+#define setup_keylocker(c) do { } while (0)
+#define destroy_keylocker_data() do { } while (0)
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index b366641703e3..20babbaf8c0c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -133,6 +133,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3ea06b0b4570..a1edd5997e0a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
 #include <asm/traps.h>
@@ -1874,10 +1876,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..2519102f72f1
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support internal wrapping key
+ * management.
+ */
+
+#include <linux/random.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+	struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
+	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+	kernel_fpu_begin();
+	load_xmm_iwkey(&kl_setup.key);
+	kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c:		A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto out;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto disable;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (c == &boot_cpu_data) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+		/*
+		 * Check the feature readiness via CPUID. Note that the
+		 * CPUID AESKLE bit is conditionally set only when CR4.KL
+		 * is set.
+		 */
+		if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+		    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+			pr_debug("x86/keylocker: Not fully supported.\n");
+			goto disable;
+		}
+
+		generate_keylocker_data();
+	}
+
+	load_keylocker();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return;
+
+disable:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+out:
+	/* Make sure the feature is disabled for kexec-reboot. */
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 851477f7d728..880ed9cc8c53 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
 #include <asm/sev.h>
+#include <asm/keylocker.h>
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1500,6 +1501,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	nmi_selftest();
 	impress_friends();
 	cache_aps_init();
+	destroy_keylocker_data();
 }
 
 static int __initdata setup_possible_cpus = -1;
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Rafael J. Wysocki, linux-pm

When the system enters the ACPI S3 or S4 sleep states, the internal
wrapping key is discarded.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, as dm-crypt does not prompt for the key
on resume from suspend. Even for the prompt it does perform to unlock
the volume where the hibernation image is stored, it still expects to reuse
the key handles within the hibernation image once it is loaded. This
motivates meeting dm-crypt's expectation that the key handles in the
suspend-image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across S5 and G3 "off"
states it is not architecturally guaranteed, nor is it expected by dm-crypt
which expects to prompt for the key each time the volume is started.

The entirety of Key Locker needs to be disabled if the backup mechanism is
not available, unless CONFIG_SUSPEND=n; otherwise dm-crypt requires the
backup to be available.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES instructions stop being used
after a key restore failure, without turning off the feature entirely.
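
A sketch of the consumer side (assumed shape only; the actual AES-KL glue
arrives later in the series): the crypto code checks the flag before issuing
AES-KL instructions and fails the request otherwise.

#include <linux/errno.h>
#include <asm/keylocker.h>

static int aeskl_transform_example(void)
{
	/* Bail out of the AES-KL path once the wrapping key restore failed. */
	if (!valid_keylocker())
		return -ENODEV;

	/* ... perform the AES-KL data transformation here ... */
	return 0;
}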

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
  (Reported by Marvin Hsu marvin.hsu@intel.com) Add the function
  comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through the end that disables the feature. Otherwise, all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 138 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..c1d27fb5a1c3 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 #define setup_keylocker(c) do { } while (0)
 #define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 2519102f72f1..72d075499067 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,45 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set with IWKey load. When the key restore fails, it is
+ * reset. This restore state is exported to the crypto library, then AES-KL
+ * will not be used there. So, the feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
 	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
 }
 
+/*
+ * This is invoked when the bootup is finished, which means IWKey is
+ * loaded. Then, the 'valid_kl' flag is set here when the feature is
+ * enabled.
+ */
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return;
+
 	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +59,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -52,6 +98,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -65,13 +112,54 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The internal wrapping key in CPU state is volatile in
+		 * S3/4 states. So ensure the backup capability along with
+		 * S-states.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
-	}
+		load_keylocker();
 
-	load_keylocker();
+		/* Backup an internal wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
 
-	pr_info_once("x86/keylocker: Enabled.\n");
-	return;
+		pr_info("x86/keylocker: Enabled.\n");
+		return;
+	} else {
+		int rc;
+
+		/*
+		 * Load the internal wrapping key directly when available
+		 * in memory, which is only possible at boot-time.
+		 *
+		 * NB: When system wakes up, this path also recovers the
+		 * internal wrapping key.
+		 */
+		if (!kl_setup.initialized) {
+			load_keylocker();
+			return;
+		} else if (valid_kl) {
+			rc = copy_keylocker();
+			if (!rc)
+				return;
+
+			/*
+			 * The boot CPU was successful but the key copy
+			 * fails here. Then, the subsequent feature use
+			 * will have inconsistent keys and failures. So,
+			 * invalidate the feature via the flag.
+			 */
+			valid_kl = false;
+			pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+		}
+	}
 
 disable:
 	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +168,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature is disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+	 * an invalid backup if bit 0 is set and a read (or write) error if
+	 * bit 2 is set.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature via
+	 * the flag to avoid any subsequent use. But keep the feature with
+	 * zero IWKeys instead of disabling it. The current users will see
+	 * key handle integrity failure but that's because of the internal
+	 * key change.
+	 */
+	pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 236447ee9beb..34d2cf946b13 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, Rafael J. Wysocki, ardb, chang.seok.bae,
	dave.hansen, dan.j.williams, mingo, ebiggers,
	lalithambika.krishnakumar, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, linux-pm, bernie.keany, tglx, bp, gmazyland,
	charishma1.gairuboyina

When the system enters the ACPI S3 or S4 sleep states, the internal
wrapping key is discarded.

The primary use case for the feature is bare metal dm-crypt. The key needs
to be restored properly on wakeup, as dm-crypt does not prompt for the key
on resume from suspend. Even for the prompt it does perform to unlock
the volume where the hibernation image is stored, it still expects to reuse
the key handles within the hibernation image once it is loaded. This
motivates meeting dm-crypt's expectation that the key handles in the
suspend-image remain valid after resume from an S-state.

Key Locker provides a mechanism to back up the internal wrapping key in
non-volatile storage. The kernel requests a backup right after the key is
loaded at boot time. It is copied back to each CPU upon wakeup.

While the backup may be maintained in NVM across S5 and G3 "off"
states it is not architecturally guaranteed, nor is it expected by dm-crypt
which expects to prompt for the key each time the volume is started.

The entirety of Key Locker needs to be disabled if the backup mechanism is
not available, unless CONFIG_SUSPEND=n; otherwise dm-crypt requires the
backup to be available.

In the event of a key restore failure the kernel proceeds with an
initialized IWKey state. This has the effect of invalidating any key
handles that might be present in a suspend-image. When this happens
dm-crypt will see I/O errors resulting from error returns from
crypto_skcipher_{en,de}crypt(). While this will disrupt operations in the
current boot, data is not at risk and access is restored at the next reboot
to create new handles relative to the current IWKey.

Manage a feature-specific flag to communicate with the crypto
implementation. This ensures that the AES instructions stop being used
after a key restore failure, without turning off the feature entirely.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
  (Reported by Marvin Hsu marvin.hsu@intel.com) Add the function
  comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through the end that disables the feature. Otherwise, all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 138 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 820ac29c06d9..c1d27fb5a1c3 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 #define setup_keylocker(c) do { } while (0)
 #define destroy_keylocker_data() do { } while (0)
+#define restore_keylocker() do { } while (0)
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 2519102f72f1..72d075499067 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,45 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set with IWKey load. When the key restore fails, it is
+ * reset. This restore state is exported to the crypto library, then AES-KL
+ * will not be used there. So, the feature is soft-disabled with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+EXPORT_SYMBOL_GPL(valid_keylocker);
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
 	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
 }
 
+/*
+ * This is invoked when the bootup is finished, which means IWKey is
+ * loaded. Then, the 'valid_kl' flag is set here when the feature is
+ * enabled.
+ */
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return;
+
 	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +59,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the internal wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -52,6 +98,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -65,13 +112,54 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The internal wrapping key in CPU state is volatile in
+		 * S3/4 states. So ensure the backup capability along with
+		 * S-states.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
-	}
+		load_keylocker();
 
-	load_keylocker();
+		/* Backup an internal wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
 
-	pr_info_once("x86/keylocker: Enabled.\n");
-	return;
+		pr_info("x86/keylocker: Enabled.\n");
+		return;
+	} else {
+		int rc;
+
+		/*
+		 * Load the internal wrapping key directly when available
+		 * in memory, which is only possible at boot-time.
+		 *
+		 * NB: When system wakes up, this path also recovers the
+		 * internal wrapping key.
+		 */
+		if (!kl_setup.initialized) {
+			load_keylocker();
+			return;
+		} else if (valid_kl) {
+			rc = copy_keylocker();
+			if (!rc)
+				return;
+
+			/*
+			 * The boot CPU was successful but the key copy
+			 * fails here. Then, the subsequent feature use
+			 * will have inconsistent keys and failures. So,
+			 * invalidate the feature via the flag.
+			 */
+			valid_kl = false;
+			pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+		}
+	}
 
 disable:
 	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +168,45 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature is disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the internal wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the setup
+ * function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap that indicates
+	 * an invalid backup if bit 0 is set and a read (or write) error if
+	 * bit 2 is set.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature via
+	 * the flag to avoid any subsequent use. But keep the feature with
+	 * zero IWKeys instead of disabling it. The current users will see
+	 * key handle integrity failure but that's because of the internal
+	 * key change.
+	 */
+	pr_err("x86/keylocker: Failed to restore internal wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 236447ee9beb..34d2cf946b13 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, Jonathan Corbet, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, linux-doc

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at boot.
The option is selected by the Key Locker cipher module CRYPTO_AES_KL (to be
added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/Kconfig                                |  3 +++
 arch/x86/kernel/cpu/common.c                    | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5f2ec4b0f927..6534e6217e56 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3655,6 +3655,8 @@
 
 	nohugevmalloc	[KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings.
 
+	nokeylocker	[X86] Disable Key Locker hardware feature.
+
 	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c94297369448..91f2063ab283 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1894,6 +1894,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a1edd5997e0a..c5550b8f030d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,6 +419,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 	X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+	/* Expect an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info("x86/keylocker: Disabled by kernel command line.\n");
+	return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 09/12] x86/cpu: Add a configuration and command line option for Key Locker
@ 2023-04-10 22:59     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, linux-doc, Jonathan Corbet, ardb, chang.seok.bae,
	dave.hansen, dan.j.williams, mingo, ebiggers,
	lalithambika.krishnakumar, Ingo Molnar, Borislav Petkov, luto,
	H. Peter Anvin, bernie.keany, tglx, bp, gmazyland,
	charishma1.gairuboyina

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at boot.
The option is selected by the Key Locker cipher module CRYPTO_AES_KL (to be
added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/Kconfig                                |  3 +++
 arch/x86/kernel/cpu/common.c                    | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5f2ec4b0f927..6534e6217e56 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3655,6 +3655,8 @@
 
 	nohugevmalloc	[KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings.
 
+	nokeylocker	[X86] Disable Key Locker hardware feature.
+
 	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c94297369448..91f2063ab283 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1894,6 +1894,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a1edd5997e0a..c5550b8f030d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -419,6 +419,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 	X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+	/* Expect an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info("x86/keylocker: Disabled by kernel command line.\n");
+	return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, David S. Miller, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI's. The internal ABI in the assembly code will have the
same prototype as AES-NI's, so the glue code can be shared with AES-NI.

Refactor the common C code to avoid code duplication. The AES-NI code uses
it with a function pointer argument to call back the AES-NI assembly code.
So will the AES-KL code.

Introduce wrappers for the data transformation functions so that they
return an error code, since AES-KL may report an error during data
transformation.

Export those refactored functions and new wrappers that the AES-KL code
will reference.
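
For illustration only (not part of this patch; the aeskl_* names below
are placeholders for wrappers added in a later patch), the AES-KL glue
is expected to reuse these helpers the same way AES-NI now does:

  static int xts_encrypt(struct skcipher_request *req)
  {
          /*
           * aeskl_xts_encrypt() and aeskl_enc() stand in for the AES-KL
           * callbacks. Unlike the AES-NI ones, they may return a nonzero
           * error (e.g. for an invalid key handle), which
           * xts_crypt_common() propagates to the caller.
           */
          return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
  }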

Also, move some constant values into a common assembly file.

No functional change intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   2 +-
 arch/x86/crypto/aes-intel_asm.S    |  26 ++++
 arch/x86/crypto/aes-intel_glue.c   | 127 ++++++++++++++++
 arch/x86/crypto/aes-intel_glue.h   |  44 ++++++
 arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
 arch/x86/crypto/aesni-intel_glue.c | 235 +++++++++--------------------
 arch/x86/crypto/aesni-intel_glue.h |  17 +++
 7 files changed, 305 insertions(+), 204 deletions(-)
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.c
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..8469b0a09cb5 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -47,7 +47,7 @@ chacha-x86_64-y := chacha-avx2-x86_64.o chacha-ssse3-x86_64.o chacha_glue.o
 chacha-x86_64-$(CONFIG_AS_AVX512) += chacha-avx512vl-x86_64.o
 
 obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
-aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
+aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.c b/arch/x86/crypto/aes-intel_glue.c
new file mode 100644
index 000000000000..d485577e121b
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+#include "aes-intel_glue.h"
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx, key + keylen, keylen);
+}
+EXPORT_SYMBOL_GPL(xts_setkey_common);
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(aes_ctx(ctx->raw_crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+EXPORT_SYMBOL_GPL(xts_crypt_common);
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..a5bd528ac072
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+
+#define AES_ALIGN			16
+#define AES_ALIGN_ATTR			__attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK			(~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA			((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE		(sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE		(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+	unsigned long addr = (unsigned long)raw_ctx;
+	unsigned long align = AES_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+
+	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+int xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		      int (*fn)(struct crypto_tfm *tfm, void *raw_ctx,
+				const u8 *in_key, unsigned int key_len));
+
+int xts_crypt_common(struct skcipher_request *req,
+		     int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				     unsigned int len, u8 *iv),
+		     int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in));
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 837c1e0aa021..9ea4f13464d5 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1820,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(_aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1932,12 +1933,12 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(_aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,21 +2690,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
 #ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
@@ -2819,12 +2805,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2824,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2976,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3138,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..e7a9e8e01680 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,24 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,10 +71,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +88,51 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return _aesni_set_key(ctx, in_key, key_len);
+}
+EXPORT_SYMBOL_GPL(aesni_set_key);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_enc(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_enc);
+
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_dec(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_dec);
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
 
 #ifdef CONFIG_X86_64
 
@@ -201,7 +229,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -211,7 +239,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -219,16 +247,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -243,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = _aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
@@ -264,7 +282,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		_aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -277,7 +295,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		_aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -491,7 +509,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 
 #ifdef CONFIG_X86_64
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
-			      const u8 *in, unsigned int len, u8 *iv)
+				 const u8 *in, unsigned int len, u8 *iv)
 {
 	/*
 	 * based on key length, override with the by8 version
@@ -528,7 +546,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			_aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -673,8 +691,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -829,8 +847,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -857,8 +875,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -883,128 +901,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1160,8 +1067,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1177,8 +1084,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced for other AES implementations
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, David S. Miller, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Nathan Chancellor, Nick Desaulniers, Tom Rix

Key Locker is a CPU feature to reduce key exfiltration opportunities while
maintaining a programming interface similar to AES-NI. It converts the AES
key into an encoded form, called the 'key handle'.

The key handle is a wrapped version of the clear-text key where the
wrapping key has limited exposure. Once converted via setkey(), all
subsequent data encryption using new AES instructions ('AES-KL') uses this
key handle, reducing the exposure of private key material in memory.

The AES-KL implementation is analogous to AES-NI's. Most of the assembly
code is adapted from the AES-NI code, and it is operational in both
32-bit and 64-bit modes, like AES-NI. However, users need to be aware of
the following differences:

== Key Handle ==

AES-KL may fail when given an invalid key handle, for example one that
has been corrupted or one whose usage restrictions are violated. A key
handle may be encoded with restrictions; this implementation encodes
every handle produced by setkey() so that it is usable in kernel mode
only.

== AES Compliance ==

Key Locker is not AES compliant in that it lacks support for 192-bit keys.
However, per the expectations of Linux crypto-cipher implementations, the
software cipher implementation must support all of the AES-compliant key
sizes. The AES-KL cipher implementation meets this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit key-size
limitation on what can be converted into a Key Locker key handle is only
documented, not enforced. This, along with the performance and failure-mode
implications below, is an end-user consideration when selecting AES-KL vs.
AES-NI.

== API Limitation ==

The setkey() function transforms an AES key into a key handle, whereas in
other AES cipher implementations the usual outcome of setkey() is an
extended key. For this reason, a setkey() failure here cannot fall back to
one of those implementations. So, AES-KL will be exposed via synchronous
interfaces only.

== Wrapping-key Restore Failure ==

A setkey() failure is also possible when the wrapping key cannot be
restored. In the event of a hardware failure, the wrapping key is lost
across deep sleep states; setkey() then fails with -ENODEV. A suspended
data-transforming task may resume, but it will fail because the wrapping
key has changed.
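
As a caller-side illustration only (not part of this patch), such a
failure surfaces to a kernel user of the skcipher API as an error code
returned from crypto_skcipher_setkey(). A minimal sketch, using a
hypothetical helper that simply propagates the error:

  #include <linux/err.h>
  #include <linux/types.h>
  #include <crypto/skcipher.h>

  static struct crypto_skcipher *xts_aes_alloc(const u8 *key, unsigned int keylen)
  {
  	struct crypto_skcipher *tfm;
  	int err;

  	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
  	if (IS_ERR(tfm))
  		return tfm;

  	err = crypto_skcipher_setkey(tfm, key, keylen);
  	if (err) {
  		/* e.g. -ENODEV when the wrapping key was lost */
  		crypto_free_skcipher(tfm);
  		return ERR_PTR(err);
  	}

  	return tfm;
  }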

== Performance ==

This feature comes with some performance penalties vs AES-NI. The
cryptsetup benchmark indicates Key Locker raw throughput can be ~5x slower
than AES-NI. For disk encryption, storage bandwidth may be the bottleneck
before encryption bandwidth, but the potential performance difference is
why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
the kernel transparently replacing AES-NI usage with AES-KL.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlsta)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 184 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 115 ++++++++++++++++++
 crypto/Kconfig                     |  36 ++++++
 4 files changed, 338 insertions(+)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 8469b0a09cb5..ab9bd2b102dd 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o aes-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..17c7b306a766
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using Intel AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-intel_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define BSWAP_MASK %xmm11
+#define CTR	%xmm12
+#define INC	%xmm13
+
+#ifdef __x86_64__
+#define IN1	%xmm8
+#define IN2	%xmm9
+#define IN3	%xmm10
+#define IN4	%xmm11
+#define IN5	%xmm12
+#define IN6	%xmm13
+#define IN7	%xmm14
+#define IN8	%xmm15
+#define IN	IN1
+#define TCTR_LOW %r11
+#else
+#define IN	%xmm1
+#endif
+
+#ifdef __x86_64__
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+#else
+#define AREG	%eax
+#define HANDLEP	%edi
+#define OUTP	AREG
+#define KLEN	%ebx
+#define INP	%edx
+#define T1	%ecx
+#define LEN	%esi
+#define IVP	%ebp
+#endif
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	push HANDLEP
+	movl (FRAME_OFFSET+8)(%esp),  HANDLEP	# ctx
+	movl (FRAME_OFFSET+12)(%esp), UKEYP	# in_key
+	movl (FRAME_OFFSET+16)(%esp), %edx	# key_len
+#endif
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
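+	# EAX=1: request a CPL0-only (kernel-mode) handle restriction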
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+#ifndef __x86_64__
+	popl HANDLEP
+#endif
+	FRAME_END
+	RET
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int _aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_enc)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $1, AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	RET
+SYM_FUNC_END(_aeskl_enc)
+
+/*
+ * int _aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(_aeskl_dec)
+	FRAME_BEGIN
+#ifndef __x86_64__
+	pushl HANDLEP
+	pushl KLEN
+	movl (FRAME_OFFSET+12)(%esp), HANDLEP	# ctx
+	movl (FRAME_OFFSET+16)(%esp), OUTP	# dst
+	movl (FRAME_OFFSET+20)(%esp), INP	# src
+#endif
+	movdqu (INP), STATE
+	mov 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Ldec_128
+	aesdec256kl (HANDLEP), STATE
+	jz .Ldec_err
+	jmp .Ldec_noerr
+.Ldec_128:
+	aesdec128kl (HANDLEP), STATE
+	jz .Ldec_err
+
+.Ldec_noerr:
+	xor AREG, AREG
+	jmp .Ldec_end
+.Ldec_err:
+	mov $1, AREG
+.Ldec_end:
+	movdqu STATE, (OUTP)
+#ifndef __x86_64__
+	popl KLEN
+	popl HANDLEP
+#endif
+	FRAME_END
+	RET
+SYM_FUNC_END(_aeskl_dec)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..0062baaaf7b2
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+static int __maybe_unused aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx,
+					      const u8 *in_key, unsigned int key_len)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
+	    key_len != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(key_len == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, key_len);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = aeskl_setkey(ctx, in_key, key_len);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+static inline u32 keylength(const void *raw_ctx)
+{
+	struct crypto_aes_ctx *ctx = aes_ctx((void *)raw_ctx);
+
+	return ctx->key_length;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_enc(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_dec(ctx, out, in))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int __init aeskl_init(void)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+	return;
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 9c86f7045157..e432d1ded391 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -573,6 +573,42 @@ config CRYPTO_TWOFISH
 
 	  See https://www.schneier.com/twofish.html for further information.
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
+config CRYPTO_AES_KL
+	tristate "AES cipher algorithms (AES-KL)"
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Key Locker provides AES SIMD instructions (AES-KL) for secure
+	  data encryption and decryption. While this new instruction
+	  set is analogous to AES-NI, AES-KL encodes an AES key into a
+	  'key handle' and uses that handle to transform data instead
+	  of accessing the AES key directly.
+
+	  setkey() transforms an AES key into a key handle. The AES key
+	  is then no longer needed for data transformation, so a user
+	  can remove the clear-text key from memory to limit exposure.
+
+	  This key wrapping is done with a CPU-internal wrapping key.
+	  Support for that wrapping key is provided with X86_KEYLOCKER.
+
+	  AES-KL supports 128-/256-bit keys only. Supplying a 192-bit
+	  key does not return an error, because the AES-NI driver is
+	  chosen to process it, but the claimed security property is
+	  not available in that case.
+
+	  Bare-metal disk encryption is the preferred use case. Key
+	  Locker usage requires explicit opt-in at setup time, so it is
+	  safe to select this option if unsure.
+
+	  See Documentation/x86/keylocker.rst for more details.
+
 config CRYPTO_TWOFISH_COMMON
 	tristate
 	help
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v6 12/12] crypto: x86/aes-kl - Support XTS mode
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-04-10 22:59     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-04-10 22:59 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, David S. Miller, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin

Implement XTS mode using AES-KL. Export the methods with a lower priority
than AES-NI to avoid being selected by default.

The assembly code clobbers more than eight 128-bit registers to make use of
the performant wide instructions that process eight 128-bit blocks at once.
But that many 128-bit registers are not available in 32-bit mode, so only
64-bit mode is supported.
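
For illustration only (not part of this patch): "xts(aes)" resolves to the
highest-priority provider, so AES-NI keeps winning by default and a kernel
caller has to opt in to AES-KL explicitly by driver name. The name below is
an assumption, derived from this patch's cra_driver_name with the SIMD
wrapper's "__" prefix stripped:

  #include <crypto/skcipher.h>

  static struct crypto_skcipher *alloc_aeskl_xts(void)
  {
  	/* Assumed driver name; the generic "xts(aes)" name would pick
  	 * the highest-priority provider (typically AES-NI) instead.
  	 */
  	return crypto_alloc_skcipher("xts-aes-aeskl", 0, 0);
  }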

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/crypto/aeskl-intel_asm.S  | 449 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 107 ++++++-
 2 files changed, 553 insertions(+), 3 deletions(-)

diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
index 17c7b306a766..05134327bcb2 100644
--- a/arch/x86/crypto/aeskl-intel_asm.S
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -182,3 +182,452 @@ SYM_FUNC_START(_aeskl_dec)
 	RET
 SYM_FUNC_END(_aeskl_dec)
 
+#ifdef __x86_64__
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+/*
+ * int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			  const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $1, AREG
+.Lxts_enc_ret:
+	FRAME_END
+	RET
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(_aeskl_xts_encrypt)
+
+/*
+ * int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			  const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(_aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $1, AREG
+.Lxts_dec_ret:
+	FRAME_END
+	RET
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(_aeskl_xts_decrypt)
+
+#endif
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
index 0062baaaf7b2..6616a2fc6f04 100644
--- a/arch/x86/crypto/aeskl-intel_glue.c
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -27,8 +27,15 @@ asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsign
 asmlinkage int _aeskl_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage int _aeskl_dec(const void *ctx, u8 *out, const u8 *in);
 
-static int __maybe_unused aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx,
-					      const u8 *in_key, unsigned int key_len)
+#ifdef CONFIG_X86_64
+asmlinkage int _aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				  unsigned int len, u8 *iv);
+asmlinkage int _aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				  unsigned int len, u8 *iv);
+#endif
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			       unsigned int key_len)
 {
 	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
 	int err;
@@ -86,11 +93,99 @@ static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
 		return 0;
 }
 
+#ifdef CONFIG_X86_64
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_xts_encrypt(ctx, out, in, len, iv))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (unlikely(ctx->key_length == AES_KEYSIZE_192))
+		return -EINVAL;
+	else if (!valid_keylocker())
+		return -ENODEV;
+	else if (_aeskl_xts_decrypt(ctx, out, in, len, iv))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+	if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+
+	if (likely(keylength(crypto_skcipher_ctx(tfm)) != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+#endif /* CONFIG_X86_64 */
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+#ifdef CONFIG_X86_64
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= aeskl_xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+#endif
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
 static int __init aeskl_init(void)
 {
+	u32 eax, ebx, ecx, edx;
+	int err;
+
 	if (!valid_keylocker())
 		return -ENODEV;
 
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
 	/*
 	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
 	 * support 192-bit keys. To make itself AES-compliant, it falls
@@ -99,12 +194,18 @@ static int __init aeskl_init(void)
 	if (!boot_cpu_has(X86_FEATURE_AES))
 		return -ENODEV;
 
+	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					     aeskl_simd_skciphers);
+	if (err)
+		return err;
+
 	return 0;
 }
 
 static void __exit aeskl_exit(void)
 {
-	return;
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
 }
 
 late_initcall(aeskl_init);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-05 23:05       ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-05 23:05 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Mon, Apr 10, 2023 at 03:59:31PM -0700, Chang S. Bae wrote:
> diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
> index e85dfb6c1524..820ac29c06d9 100644
> --- a/arch/x86/include/asm/keylocker.h
> +++ b/arch/x86/include/asm/keylocker.h
> @@ -5,6 +5,7 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include <asm/processor.h>
>  #include <linux/bits.h>
>  #include <asm/fpu/types.h>
>  
> @@ -28,5 +29,13 @@ struct iwkey {
>  #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
>  #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
>  
> +#ifdef CONFIG_X86_KEYLOCKER
> +void setup_keylocker(struct cpuinfo_x86 *c);
> +void destroy_keylocker_data(void);
> +#else
> +#define setup_keylocker(c) do { } while (0)
> +#define destroy_keylocker_data() do { } while (0)
> +#endif

Shouldn't the !CONFIG_X86_KEYLOCKER stubs be static inline functions instead of
macros, so that type checking works?
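
For example, a minimal sketch of what such stubs might look like:

    static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
    static inline void destroy_keylocker_data(void) { }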

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-05 23:09       ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-05 23:09 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Rafael J. Wysocki,
	linux-pm

On Mon, Apr 10, 2023 at 03:59:32PM -0700, Chang S. Bae wrote:
> +/*
> + * This flag is set with IWKey load. When the key restore fails, it is
> + * reset. This restore state is exported to the crypto library, then AES-KL
> + * will not be used there. So, the feature is soft-disabled with this flag.
> + */
> +static bool valid_kl;
> +
> +bool valid_keylocker(void)
> +{
> +	return valid_kl;
> +}
> +EXPORT_SYMBOL_GPL(valid_keylocker);

It would be simpler to export this bool directly.
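
That is, something along these lines (a sketch; kernel variables can be
exported with EXPORT_SYMBOL_GPL just like functions):

    /* in the header */
    extern bool valid_kl;

    /* next to the definition in keylocker.c */
    EXPORT_SYMBOL_GPL(valid_kl);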

> +	if (status & BIT(0))
> +		return 0;
> +	else
> +		return -EBUSY;
[...]
> +		pr_info("x86/keylocker: Enabled.\n");
> +		return;
> +	} else {
> +		int rc;

The kernel coding style usually doesn't use 'else' after a return.
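
I.e., the first hunk above could simply be:

    if (status & BIT(0))
            return 0;
    return -EBUSY;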

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-05 23:27       ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-05 23:27 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On Mon, Apr 10, 2023 at 03:59:34PM -0700, Chang S. Bae wrote:
> Refactor the common C code to avoid code duplication. The AES-NI code uses
> it with a function pointer argument to call back the AES-NI assembly code.
> So will the AES-KL code.

Actually, the AES-NI XTS glue code currently makes direct calls to the assembly
code.  This patch changes it to make indirect calls.  Indirect calls are very
expensive these days, partly due to all the speculative execution mitigations.
So this patch likely causes a performance regression.  How about making
xts_crypt_common() and xts_setkey_common() be inline functions?

Another issue with having the above be exported symbols is that their names are
too generic, so they could easily collide with another symbols in the kernel.
To be exported symbols, they would need something x86-specific in their names.

>  arch/x86/crypto/Makefile           |   2 +-
>  arch/x86/crypto/aes-intel_asm.S    |  26 ++++
>  arch/x86/crypto/aes-intel_glue.c   | 127 ++++++++++++++++
>  arch/x86/crypto/aes-intel_glue.h   |  44 ++++++
>  arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
>  arch/x86/crypto/aesni-intel_glue.c | 235 +++++++++--------------------
>  arch/x86/crypto/aesni-intel_glue.h |  17 +++

It's confusing having aes-intel, aesni-intel, *and* aeskl-intel.  Maybe call the
first one "aes-helpers" or "aes-common" instead?

> +struct aes_xts_ctx {
> +	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
> +	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
> +};

This struct does not make sense.  It should look like:

    struct aes_xts_ctx {
            struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
            struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
    };

The runtime alignment to a 16-byte boundary should happen when translating the
raw crypto_skcipher_ctx() into the pointer to the aes_xts_ctx.  It should not
happen when accessing each individual field in the aes_xts_ctx.
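
For example, a rough sketch of such a translation helper (the helper name is
made up here, and AES_ALIGN is assumed to be the 16-byte alignment already
used by the AES-NI glue):

    static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
    {
            return PTR_ALIGN(crypto_skcipher_ctx(tfm), AES_ALIGN);
    }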

>  /*
> - * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> - *                   unsigned int key_len)
> + * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> + *                    unsigned int key_len)
>   */

It's conventional to use two leading underscores, not one.

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-06  0:01       ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-06  0:01 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix

On Mon, Apr 10, 2023 at 03:59:35PM -0700, Chang S. Bae wrote:
> [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions

Thanks for dropping the unnecessary modes of operation (CBC, ECB, CTR).  It
simplified the patchset quite a bit!

Now that only AES-XTS is included, can you please also merge this patch with the
following patch?  As-is, this patch is misleading since it doesn't actually add
"support" for anything at all.  It actually just adds an unfinished AES-XTS
implementation, which patch 12 then finishes.  I assume that the current
patchset organization is left over from when you were trying to support multiple
modes of operation.  IMO, it would be much easier to review if patches 11-12
were merged into one patch that adds the new AES-XTS implementation.

> For disk encryption, storage bandwidth may be the bottleneck
> before encryption bandwidth, but the potential performance difference is
> why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
> the kernel transparently replacing AES-NI usage with AES-KL.

This does not correctly describe what is going on.  Actually, this patchset
registers the AES-KL XTS algorithm with the usual name "xts(aes)".  So, it can
potentially be used by any AES-XTS user.  It seems that you're actually relying
on the algorithm priorities to prioritize AES-NI, as you've assigned priority
200 to AES-KL, whereas AES-NI has priority 401.  Is that what you intend, and if
so can you please update your explanation to properly explain this?

The alternative would be to use a unique algorithm name, such as
"keylocker-xts(aes)".  I'm not sure that would be better, given that the
algorithms are compatible.  However, that actually would seem to match the
explanation you gave more closely, so perhaps that's what you actually intended?

> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
[...]
> +#ifdef __x86_64__
> +#define AREG	%rax
> +#define HANDLEP	%rdi
> +#define OUTP	%rsi
> +#define KLEN	%r9d
> +#define INP	%rdx
> +#define T1	%r10
> +#define LEN	%rcx
> +#define IVP	%r8
> +#else
> +#define AREG	%eax
> +#define HANDLEP	%edi
> +#define OUTP	AREG
> +#define KLEN	%ebx
> +#define INP	%edx
> +#define T1	%ecx
> +#define LEN	%esi
> +#define IVP	%ebp
> +#endif

I strongly recommend skipping the 32-bit support, as it's unlikely to be worth
the effort.

And actually, aeskl-intel_glue.c only registers the algorithm for 64-bit
anyway...  So the 32-bit code paths are untested dead code.

> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
> +{
> +	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
> +		return -EINVAL;
> +	else if (!valid_keylocker())
> +		return -ENODEV;
> +	else if (_aeskl_enc(ctx, out, in))
> +		return -EINVAL;
> +	else
> +		return 0;
> +}
> +
> +static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
> +{
> +	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
> +		return -EINVAL;
> +	else if (!valid_keylocker())
> +		return -ENODEV;
> +	else if (_aeskl_dec(ctx, out, in))
> +		return -EINVAL;
> +	else
> +		return 0;
> +}

I don't think the above two functions should exist.  keylength() and
valid_keylocker() should be checked before calling xts_crypt_common(), and the
assembly functions should just return the correct error code (-EINVAL,
apparently) instead of an unspecified nonzero value.  That would leave nothing
for a wrapper function to do.

Note: if you take this suggestion, the assembly functions would need to use
SYM_TYPED_FUNC_START instead of SYM_FUNC_START.
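
To illustrate, xts_encrypt() might then look something like this (a sketch
only; it assumes the assembly routines are passed in directly and return
-EINVAL themselves on failure):

    static int xts_encrypt(struct skcipher_request *req)
    {
            struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);

            /* 192-bit keys fall back to the AES-NI code path, as before */
            if (unlikely(keylength(crypto_skcipher_ctx(tfm)) == AES_KEYSIZE_192))
                    return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);

            if (!valid_keylocker())
                    return -ENODEV;

            return xts_crypt_common(req, _aeskl_xts_encrypt, _aeskl_enc);
    }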

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-05-06  0:01       ` [dm-devel] " Eric Biggers
@ 2023-05-08 18:18         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-08 18:18 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix

On 5/5/2023 5:01 PM, Eric Biggers wrote:
> On Mon, Apr 10, 2023 at 03:59:35PM -0700, Chang S. Bae wrote:
>> [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
> 
> Thanks for dropping the unnecessary modes of operation (CBC, ECB, CTR).  It
> simplified the patchset quite a bit!

Yeah. But, as you pointed out, there are more things here that should go away.

The idea was to establish the generic pieces first (patch 10) and then
introduce the mode-specific code (patch 11). This incremental change was
expected to help reviewers.

In hindsight, I realize this introduces dead code, and the approach does not
seem to help that much.

> Now that only AES-XTS is included, can you please also merge this patch with the
> following patch?  As-is, this patch is misleading since it doesn't actually add
> "support" for anything at all.  It actually just adds an unfinished AES-XTS
> implementation, which patch 12 then finishes.  I assume that the current
> patchset organization is left over from when you were trying to support multiple
> modes of operation.  IMO, it would be much easier to review if patches 11-12
> were merged into one patch that adds the new AES-XTS implementation.

Yes, I agree to merge them.

>> For disk encryption, storage bandwidth may be the bottleneck
>> before encryption bandwidth, but the potential performance difference is
>> why AES-KL is advertised as a distinct cipher in /proc/crypto rather than
>> the kernel transparently replacing AES-NI usage with AES-KL.
> 
> This does not correctly describe what is going on.  Actually, this patchset
> registers the AES-KL XTS algorithm with the usual name "xts(aes)".  So, it can
> potentially be used by any AES-XTS user.  It seems that you're actually relying
> on the algorithm priorities to prioritize AES-NI, as you've assigned priority
> 200 to AES-KL, whereas AES-NI has priority 401.  Is that what you intend, and if
> so can you please update your explanation to properly explain this?

I think AES-KL could be a drop-in replacement for AES-NI IF it performs 
well -- something on par with AES-NI or better, AND it also supports all 
the key sizes. But, it can't be the default because that's not the case 
(at least for now).

> The alternative would be to use a unique algorithm name, such as
> "keylocker-xts(aes)".  I'm not sure that would be better, given that the
> algorithms are compatible.  However, that actually would seem to match the
> explanation you gave more closely, so perhaps that's what you actually intended?

I think those AES implementations are functionally the same to end
users. The key envelopment is not user-visible, to my understanding. So,
I thought using the same name makes sense.

Now looking at the changelog, this text in the 'performance' section 
appears to be relevant:

  > the potential performance difference is why AES-KL is advertised as
  > a distinct cipher in /proc/crypto rather than the kernel
  > transparently replacing AES-NI usage with AES-KL.

But this does not seem to be clear enough. Maybe this explanation can go
under a section of its own, as the changelog is already quite long...

> I strongly recommend skipping the 32-bit support, as it's unlikely to be worth
> the effort.
> 
> And actually, aeskl-intel_glue.c only registers the algorithm for 64-bit
> anyway...  So the 32-bit code paths are untested dead code.

Yeah, will do. Also, I'd make the module available only on X86-64.
Then, a brief comment explaining the reason should come along.

>> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
>> +{
>> +	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
>> +		return -EINVAL;
>> +	else if (!valid_keylocker())
>> +		return -ENODEV;
>> +	else if (_aeskl_enc(ctx, out, in))
>> +		return -EINVAL;
>> +	else
>> +		return 0;
>> +}
>> +
>> +static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
>> +{
>> +	if (unlikely(keylength(ctx) == AES_KEYSIZE_192))
>> +		return -EINVAL;
>> +	else if (!valid_keylocker())
>> +		return -ENODEV;
>> +	else if (_aeskl_dec(ctx, out, in))
>> +		return -EINVAL;
>> +	else
>> +		return 0;
>> +}
> 
> I don't think the above two functions should exist.  keylength() and
> valid_keylocker() should be checked before calling xts_crypt_common(), and the
> assembly functions should just return the correct error code (-EINVAL,
> apparently) instead of an unspecified nonzero value.  That would leave nothing
> for a wrapper function to do.
> 
> Note: if you take this suggestion, the assembly functions would need to use
> SYM_TYPED_FUNC_START instead of SYM_FUNC_START.

I thought these wrappers were benign enough to keep here. But, yes, I agree
that it is better to simplify the code.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4
  2023-05-05 23:09       ` [dm-devel] " Eric Biggers
@ 2023-05-08 18:18         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-08 18:18 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Rafael J. Wysocki,
	linux-pm

On 5/5/2023 4:09 PM, Eric Biggers wrote:
> On Mon, Apr 10, 2023 at 03:59:32PM -0700, Chang S. Bae wrote:
>> +/*
>> + * This flag is set with IWKey load. When the key restore fails, it is
>> + * reset. This restore state is exported to the crypto library, then AES-KL
>> + * will not be used there. So, the feature is soft-disabled with this flag.
>> + */
>> +static bool valid_kl;
>> +
>> +bool valid_keylocker(void)
>> +{
>> +	return valid_kl;
>> +}
>> +EXPORT_SYMBOL_GPL(valid_keylocker);
> 
> It would be simpler to export this bool directly.

Yeah, but this wrapper is for encapsulation: code outside of the core
Key Locker code is not allowed to overwrite the value.

Perhaps it is better to export it only when AES-KL is built as a module:

#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
EXPORT_SYMBOL_GPL(valid_keylocker);
#endif


>> +	if (status & BIT(0))
>> +		return 0;
>> +	else
>> +		return -EBUSY;
> [...]
>> +		pr_info("x86/keylocker: Enabled.\n");
>> +		return;
>> +	} else {
>> +		int rc;
> 
> The kernel coding style usually doesn't use 'else' after a return.

Yeah, right. Will fix up.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-05-05 23:05       ` [dm-devel] " Eric Biggers
@ 2023-05-08 18:18         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-08 18:18 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/5/2023 4:05 PM, Eric Biggers wrote:
> On Mon, Apr 10, 2023 at 03:59:31PM -0700, Chang S. Bae wrote:
>>   
>> +#ifdef CONFIG_X86_KEYLOCKER
>> +void setup_keylocker(struct cpuinfo_x86 *c);
>> +void destroy_keylocker_data(void);
>> +#else
>> +#define setup_keylocker(c) do { } while (0)
>> +#define destroy_keylocker_data() do { } while (0)
>> +#endif
> 
> Shouldn't the !CONFIG_X86_KEYLOCKER stubs be static inline functions instead of
> macros, so that type checking works?

I think either way works here. This macro is just for nothing.

Thanks,
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* RE: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-08 19:18       ` Elliott, Robert (Servers)
  -1 siblings, 0 replies; 247+ messages in thread
From: Elliott, Robert (Servers) @ 2023-05-08 19:18 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin


> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
...
> +void __init destroy_keylocker_data(void)
> +{
> +	memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
> +}

That's a special value for garbage collected keyring keys assigned
a keytype of ".dead". memzero() or memzero_explicit() might be better 
for this use case.



^ permalink raw reply	[flat|nested] 247+ messages in thread

* RE: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
@ 2023-05-08 19:21       ` Elliott, Robert (Servers)
  -1 siblings, 0 replies; 247+ messages in thread
From: Elliott, Robert (Servers) @ 2023-05-08 19:21 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix


> diff --git a/crypto/Kconfig b/crypto/Kconfig
...
> +config AS_HAS_KEYLOCKER
> +	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
> +	help
> +	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
> +
> +config CRYPTO_AES_KL
> +	tristate "AES cipher algorithms (AES-KL)"
> +	depends on AS_HAS_KEYLOCKER
> +	depends on CRYPTO_AES_NI_INTEL
> +	select X86_KEYLOCKER

This material belongs in arch/x86/Kconfig now (which didn't exist when
this patch series began).



^ permalink raw reply	[flat|nested] 247+ messages in thread

* RE: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-05-08 19:21       ` [dm-devel] " Elliott, Robert (Servers)
@ 2023-05-08 19:24         ` Elliott, Robert (Servers)
  -1 siblings, 0 replies; 247+ messages in thread
From: Elliott, Robert (Servers) @ 2023-05-08 19:24 UTC (permalink / raw)
  To: Elliott, Robert (Servers),
	Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix



> -----Original Message-----
> From: Elliott, Robert (Servers) <elliott@hpe.com>
...
> This material belongs in arch/x86/Kconfig now (which didn't exist when
> this patch series began).


Sorry, omitted one level:
	arch/x86/crypto/Kconfig



^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-05-08 19:24         ` [dm-devel] " Elliott, Robert (Servers)
@ 2023-05-08 20:00           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-08 20:00 UTC (permalink / raw)
  To: Elliott, Robert (Servers), linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, Lutomirski, Andy, dave.hansen, tglx, bp,
	mingo, x86, herbert, ardb, Williams, Dan J, Keany, Bernie,
	Gairuboyina, Charishma1, Krishnakumar, Lalithambika,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Rix, Tom

On 5/8/2023 12:24 PM, Elliott, Robert (Servers) wrote:
> ...
>> This material belongs in arch/x86/Kconfig now (which didn't exist when
>> this patch series began).
> 
> 
> Sorry, omitted one level:
>          arch/x86/crypto/Kconfig

Thanks for pointing this out! Otherwise, this might have ended up based on
the old mess from before 28a936ef44e1 ("crypto: Kconfig - move x86 entries
to a submenu").

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-05-08 19:18       ` [dm-devel] " Elliott, Robert (Servers)
@ 2023-05-08 20:15         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-08 20:15 UTC (permalink / raw)
  To: Elliott, Robert (Servers), linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, Lutomirski, Andy, dave.hansen, tglx, bp,
	mingo, x86, herbert, ardb, Williams, Dan J, Keany, Bernie,
	Gairuboyina, Charishma1, Krishnakumar, Lalithambika, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin

On 5/8/2023 12:18 PM, Elliott, Robert (Servers) wrote:
> 
>> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> ...
>> +void __init destroy_keylocker_data(void)
>> +{
>> +     memset(&kl_setup.key, KEY_DESTROY, sizeof(kl_setup.key));
>> +}
> 
> That's a special value for garbage collected keyring keys assigned
> a keytype of ".dead". memzero() or memzero_explicit() might be better
> for this use case.
memzero() looks to be the same as memset() in x86:

$ git grep memzero arch/x86/ | grep define
arch/x86/boot/compressed/misc.c:#define memzero(s, n)   memset((s), 0, (n))

Instead, memzero_explicit() looks to be about the right call here:

/**
  * memzero_explicit - Fill a region of memory (e.g. sensitive
  *		      keying data) with 0s.
  ...
  * Note: usually using memset() is just fine (!), but in cases
  * where clearing out _local_ data at the end of a scope is
  * necessary, memzero_explicit() should be used instead in
  * order to prevent the compiler from optimising away zeroing.
  ...

Then,

void __init destroy_keylocker_data(void)
{
	memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
}

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-05-08 18:18         ` [dm-devel] " Chang S. Bae
@ 2023-05-08 21:56           ` Dave Hansen
  -1 siblings, 0 replies; 247+ messages in thread
From: Dave Hansen @ 2023-05-08 21:56 UTC (permalink / raw)
  To: Chang S. Bae, Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/8/23 11:18, Chang S. Bae wrote:
> On 5/5/2023 4:05 PM, Eric Biggers wrote:
>> On Mon, Apr 10, 2023 at 03:59:31PM -0700, Chang S. Bae wrote:
>>>   +#ifdef CONFIG_X86_KEYLOCKER
>>> +void setup_keylocker(struct cpuinfo_x86 *c);
>>> +void destroy_keylocker_data(void);
>>> +#else
>>> +#define setup_keylocker(c) do { } while (0)
>>> +#define destroy_keylocker_data() do { } while (0)
>>> +#endif
>>
>> Shouldn't the !CONFIG_X86_KEYLOCKER stubs be static inline functions
>> instead of
>> macros, so that type checking works?
> 
> I think either way works here. This macro is just for nothing.

Chang, I do prefer the 'static inline' as a general rule.  Think of this:

static inline void setup_keylocker(struct cpuinfo_x86 *c) {}

versus:

#define setup_keylocker(c) do { } while (0)

Imagine some dope does:

	char c;
	...
	setup_keylocker(c);

With the macro, they'll get no type warning.  The inline actually makes
it easier to find bugs because folks will get _some_ type checking no
matter how they compile the code.

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-05-08 21:56           ` [dm-devel] " Dave Hansen
@ 2023-05-09  0:31             ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-09  0:31 UTC (permalink / raw)
  To: Dave Hansen, Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/8/2023 2:56 PM, Dave Hansen wrote:
> On 5/8/23 11:18, Chang S. Bae wrote:
>> On 5/5/2023 4:05 PM, Eric Biggers wrote:
>>> On Mon, Apr 10, 2023 at 03:59:31PM -0700, Chang S. Bae wrote:
>>>>    +#ifdef CONFIG_X86_KEYLOCKER
>>>> +void setup_keylocker(struct cpuinfo_x86 *c);
>>>> +void destroy_keylocker_data(void);
>>>> +#else
>>>> +#define setup_keylocker(c) do { } while (0)
>>>> +#define destroy_keylocker_data() do { } while (0)
>>>> +#endif
>>>
>>> Shouldn't the !CONFIG_X86_KEYLOCKER stubs be static inline functions
>>> instead of
>>> macros, so that type checking works?
>>
>> I think either way works here. This macro is just for nothing.
> 
> Chang, I do prefer the 'static inline' as a general rule.  Think of this:
> 
> static inline void setup_keylocker(struct cpuinfo_x86 *c) {}
> 
> versus:
> 
> #define setup_keylocker(c) do { } while (0)
> 
> Imagine some dope does:
> 
> 	char c;
> 	...
> 	setup_keylocker(c);
> 
> With the macro, they'll get no type warning.  The inline actually makes
> it easier to find bugs because folks will get _some_ type checking no
> matter how they compile the code.

Ah, when the prototype has one or more arguments, 'static inline'
allows the type check. Then it is not an 'either-way' thing.

Looking at the x86 code, there are some seemingly related cases:

$ git grep "do { } while (0)" arch/x86 | grep -v "()"
arch/x86/include/asm/kprobes.h:#define flush_insn_slot(p)       do { } while (0)
arch/x86/include/asm/mc146818rtc.h:#define lock_cmos(reg) do { } while (0)
arch/x86/include/asm/pgtable.h:#define flush_tlb_fix_spurious_fault(vma, address, ptep) do { } while (0)
arch/x86/include/asm/preempt.h:#define init_task_preempt_count(p) do { } while (0)
arch/x86/kvm/ioapic.h:#define ASSERT(x) do { } while (0)
arch/x86/kvm/mmu/mmu_internal.h:#define pgprintk(x...) do { } while (0)
arch/x86/kvm/mmu/mmu_internal.h:#define rmap_printk(x...) do { } while (0)
arch/x86/kvm/mmu/mmu_internal.h:#define MMU_WARN_ON(x) do { } while (0)

Now I feel like I owe some potential cleanup work here.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time
  2023-05-09  0:31             ` [dm-devel] " Chang S. Bae
@ 2023-05-09  0:51               ` Dave Hansen
  -1 siblings, 0 replies; 247+ messages in thread
From: Dave Hansen @ 2023-05-09  0:51 UTC (permalink / raw)
  To: Chang S. Bae, Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/8/23 17:31, Chang S. Bae wrote:
>> With the macro, they'll get no type warning.  The inline actually makes
>> it easier to find bugs because folks will get _some_ type checking no
>> matter how they compile the code.
> 
> Ah, when the prototype with one or more arguments, 'static inline'
> allows the check. Then it is not an 'either-way' thing.
> 
> Looking at the x86 code, there are some seemingly related:
> 
> $ git grep "do { } while (0)" arch/x86 | grep -v "()"
...

Right.  It's not a hard and fast rule.  We certainly take code either
way and there can be real reasons to do it one way versus the other.

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-05-05 23:27       ` [dm-devel] " Eric Biggers
@ 2023-05-09  0:55         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-09  0:55 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin

On 5/5/2023 4:27 PM, Eric Biggers wrote:
> On Mon, Apr 10, 2023 at 03:59:34PM -0700, Chang S. Bae wrote:
>> Refactor the common C code to avoid code duplication. The AES-NI code uses
>> it with a function pointer argument to call back the AES-NI assembly code.
>> So will the AES-KL code.
> 
> Actually, the AES-NI XTS glue code currently makes direct calls to the assembly
> code.  This patch changes it to make indirect calls.  Indirect calls are very
> expensive these days, partly due to all the speculative execution mitigations.
> So this patch likely causes a performance regression.  How about making
> xts_crypt_common() and xts_setkey_common() be inline functions?

I guess this series is relevant:
   https://lore.kernel.org/lkml/20201222160629.22268-1-ardb@kernel.org/#t

Yeah, inlining those looks to be just cut-and-paste work. Still, I was
curious about the performance impact.

So I picked one of my old machines and quickly ran through these cases:

$ cryptsetup benchmark -c aes-xts -s 256

# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption

Upstream (6.4-rc1)
     aes-xts        256b      3949.3 MiB/s      4014.2 MiB/s
     aes-xts        256b      4016.1 MiB/s      4011.6 MiB/s
     aes-xts        256b      4026.2 MiB/s      4018.4 MiB/s
     aes-xts        256b      4009.2 MiB/s      4006.9 MiB/s
     aes-xts        256b      4025.0 MiB/s      4016.4 MiB/s

Upstream + V6
     aes-xts        256b      3876.1 MiB/s      3963.6 MiB/s
     aes-xts        256b      3926.3 MiB/s      3984.2 MiB/s
     aes-xts        256b      3940.8 MiB/s      3961.2 MiB/s
     aes-xts        256b      3929.7 MiB/s      3984.7 MiB/s
     aes-xts        256b      3892.5 MiB/s      3942.5 MiB/s

Upstream + V6 + {inlined helpers}
     aes-xts        256b      3996.9 MiB/s      4085.4 MiB/s
     aes-xts        256b      4087.6 MiB/s      4104.9 MiB/s
     aes-xts        256b      4117.9 MiB/s      4130.2 MiB/s
     aes-xts        256b      4098.4 MiB/s      4120.6 MiB/s
     aes-xts        256b      4095.5 MiB/s      4111.5 MiB/s

Okay, I'm a bit more convinced about this inlining. I will try to note
this rationale in a code comment along with the change.
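
To be concrete, the call site would then end up looking something like
this (sketch of the XTS encrypt path only):

/* With xts_crypt_common() defined as 'static inline' in the shared header,
 * the constant function-pointer arguments below can collapse into direct
 * calls to the AES-NI routines instead of retpolined indirect calls. */
static int xts_encrypt(struct skcipher_request *req)
{
	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
}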

> Another issue with having the above be exported symbols is that their names are
> too generic, so they could easily collide with another symbols in the kernel.
> To be exported symbols, they would need something x86-specific in their names.

I think that's another good point, though: they no longer need to be
exported once they are moved into the header and inlined.

>>   arch/x86/crypto/Makefile           |   2 +-
>>   arch/x86/crypto/aes-intel_asm.S    |  26 ++++
>>   arch/x86/crypto/aes-intel_glue.c   | 127 ++++++++++++++++
>>   arch/x86/crypto/aes-intel_glue.h   |  44 ++++++
>>   arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
>>   arch/x86/crypto/aesni-intel_glue.c | 235 +++++++++--------------------
>>   arch/x86/crypto/aesni-intel_glue.h |  17 +++
> 
> It's confusing having aes-intel, aesni-intel, *and* aeskl-intel.  Maybe call the
> first one "aes-helpers" or "aes-common" instead?

Yeah, I can see a few files already named with 'helpers'. So, maybe
s/aes-intel/aes-helpers/.

>> +struct aes_xts_ctx {
>> +	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
>> +	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AES_ALIGN_ATTR;
>> +};
> 
> This struct does not make sense.  It should look like:
> 
>      struct aes_xts_ctx {
>              struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
>              struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
>      };
> 
> The runtime alignment to a 16-byte boundary should happen when translating the
> raw crypto_skcipher_ctx() into the pointer to the aes_xts_ctx.  It should not
> happen when accessing each individual field in the aes_xts_ctx.

Oh, ugly. This came from mindless copy & paste here. I guess the fix
could be a standalone patch, or it can be fixed along with this rework.
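
For the alignment part, something like this is what I have in mind -- a
sketch only, with a hypothetical aes_xts_ctx() helper name:

/* Hypothetical helper: align the raw skcipher context once, so that the
 * struct fields can then be accessed directly. */
static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
{
	return PTR_ALIGN(crypto_skcipher_ctx(tfm), AES_ALIGN);
}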

>>   /*
>> - * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
>> - *                   unsigned int key_len)
>> + * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
>> + *                    unsigned int key_len)
>>    */
> 
> It's conventional to use two leading underscores, not one.

Yes, will fix.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-05-09  0:55         ` [dm-devel] " Chang S. Bae
@ 2023-05-11 19:05           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-11 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, gmazyland, luto, dave.hansen, tglx, bp, mingo, x86,
	herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Note: this only incorporates the fix for the performance issue. It is
intended for those who want to run the series quickly; otherwise, please
ignore it. The next posting will address all the feedback.

---

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI. The internal ABI in the assembly code will have the
same prototypes as AES-NI's, so the glue code will be the same as well.

Refactor the common C code to avoid code duplication. The AES-NI code uses
it with a function pointer argument to call back the AES-NI assembly code.
So will the AES-KL code.

Introduce wrappers for the data transformation functions that return an
error code, since AES-KL may report an error during data transformation.

Export those refactored functions and new wrappers that the AES-KL code
will reference.

Also, move some constant values into a common assembly file.

No functional change intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Inline the XTS helpers; otherwise, the indirect calls incur measurable
  overhead under the retpoline mitigation. (Eric Biggers)
  Consequently, remove the glue helper .c file.

Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() is called on the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-intel_asm.S    |  26 ++++
 arch/x86/crypto/aes-intel_glue.h   | 162 ++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
 arch/x86/crypto/aesni-intel_glue.c | 235 +++++++++--------------------
 arch/x86/crypto/aesni-intel_glue.h |  17 +++
 5 files changed, 295 insertions(+), 203 deletions(-)
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..5877d0988e36
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI
+ * implementation.
+ *
+ * The helper code is inlined for performance: with speculative-execution
+ * mitigations such as retpoline, indirect calls incur measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN			16
+#define AES_ALIGN_ATTR			__attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK			(~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA			((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE		(sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE		(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+	unsigned long addr = (unsigned long)raw_ctx;
+	unsigned long align = AES_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+
+	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(aes_ctx(&ctx->tweak_ctx), walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 837c1e0aa021..9ea4f13464d5 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1820,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(_aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1932,12 +1933,12 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(_aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,21 +2690,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
 #ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
@@ -2819,12 +2805,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2824,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2976,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3138,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..e7a9e8e01680 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,24 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,10 +71,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +88,51 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return _aesni_set_key(ctx, in_key, key_len);
+}
+EXPORT_SYMBOL_GPL(aesni_set_key);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_enc(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_enc);
+
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_dec(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_dec);
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
 
 #ifdef CONFIG_X86_64
 
@@ -201,7 +229,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -211,7 +239,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -219,16 +247,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -243,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = _aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
@@ -264,7 +282,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		_aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -277,7 +295,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		_aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -491,7 +509,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 
 #ifdef CONFIG_X86_64
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
-			      const u8 *in, unsigned int len, u8 *iv)
+				 const u8 *in, unsigned int len, u8 *iv)
 {
 	/*
 	 * based on key length, override with the by8 version
@@ -528,7 +546,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			_aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -673,8 +691,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -829,8 +847,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -857,8 +875,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -883,128 +901,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1160,8 +1067,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1177,8 +1084,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
@ 2023-05-11 19:05           ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-11 19:05 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, David S. Miller, ardb, chang.seok.bae, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, luto, H. Peter Anvin, bernie.keany, tglx,
	gmazyland, charishma1.gairuboyina

Note: this only incorporates the fix for the performance issue. It is
intended for those who want to run the series quickly; otherwise, please
ignore it. The next posting will address all the feedback.

---

Key Locker's AES instruction set ('AES-KL') has a programming interface
similar to AES-NI. The internal ABI in the assembly code will have the
same prototypes as AES-NI's, so the glue code will be the same as well.

Refactor the common C code to avoid code duplication. The AES-NI code uses
it with a function pointer argument to call back the AES-NI assembly code.
So will the AES-KL code.

Introduce wrappers for the data transformation functions that return an
error code, since AES-KL may report an error during data transformation.

Export those refactored functions and new wrappers that the AES-KL code
will reference.

Also, move some constant values into a common assembly file.

No functional change intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Inline the XTS helpers; otherwise, the indirect calls incur measurable
  overhead under the retpoline mitigation. (Eric Biggers)
  Consequently, remove the glue helper .c file.

Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() is called on the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-intel_asm.S    |  26 ++++
 arch/x86/crypto/aes-intel_glue.h   | 162 ++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  58 +++----
 arch/x86/crypto/aesni-intel_glue.c | 235 +++++++++--------------------
 arch/x86/crypto/aesni-intel_glue.h |  17 +++
 5 files changed, 295 insertions(+), 203 deletions(-)
 create mode 100644 arch/x86/crypto/aes-intel_asm.S
 create mode 100644 arch/x86/crypto/aes-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/aes-intel_asm.S b/arch/x86/crypto/aes-intel_asm.S
new file mode 100644
index 000000000000..98abf875af79
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_asm.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+#ifdef __x86_64__
+.Lbswap_mask:
+	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
+#endif
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
new file mode 100644
index 000000000000..5877d0988e36
--- /dev/null
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI
+ * implementation.
+ *
+ * The helper code is inlined for performance: with speculative-execution
+ * mitigations such as retpoline, indirect calls incur measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN			16
+#define AES_ALIGN_ATTR			__attribute__((__aligned__(AES_ALIGN)))
+#define AES_BLOCK_MASK			(~(AES_BLOCK_SIZE - 1))
+#define AES_ALIGN_EXTRA			((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define CRYPTO_AES_CTX_SIZE		(sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
+#define XTS_AES_CTX_SIZE		(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+{
+	unsigned long addr = (unsigned long)raw_ctx;
+	unsigned long align = AES_ALIGN;
+
+	if (align <= crypto_tfm_ctx_alignment())
+		align = 1;
+
+	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(aes_ctx(&ctx->tweak_ctx), walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 837c1e0aa021..9ea4f13464d5 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-intel_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1820,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(_aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1932,12 +1933,12 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(_aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(_aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(_aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void _aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(_aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(_aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,21 +2690,6 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
-.pushsection .rodata
-.align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
-.Lbswap_mask:
-	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
-.popsection
-
 #ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
@@ -2819,12 +2805,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2824,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(_aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2976,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(_aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(_aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3138,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(_aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..e7a9e8e01680 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,24 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-intel_glue.h"
+#include "aesni-intel_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,10 +71,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int _aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void _aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void _aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +88,51 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return _aesni_set_key(ctx, in_key, key_len);
+}
+EXPORT_SYMBOL_GPL(aesni_set_key);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_enc(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_enc);
+
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	_aesni_dec(ctx, out, in);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_dec);
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void _aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				   const u8 *in, unsigned int len, u8 *iv);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
+{
+	_aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
 
 #ifdef CONFIG_X86_64
 
@@ -201,7 +229,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -211,7 +239,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -219,16 +247,6 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
-{
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -243,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = _aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
@@ -264,7 +282,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		_aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -277,7 +295,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		_aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -491,7 +509,7 @@ static int cts_cbc_decrypt(struct skcipher_request *req)
 
 #ifdef CONFIG_X86_64
 static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
-			      const u8 *in, unsigned int len, u8 *iv)
+				 const u8 *in, unsigned int len, u8 *iv)
 {
 	/*
 	 * based on key length, override with the by8 version
@@ -528,7 +546,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			_aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -673,8 +691,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -829,8 +847,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -857,8 +875,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -883,128 +901,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1160,8 +1067,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1177,8 +1084,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 247+ messages in thread
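
One way to read the wrapper changes in the patch above: the exported helpers
now return int so that a shared XTS path can treat AES-NI and an
error-reporting backend (such as the AES-KL code added later in this series)
uniformly through function pointers. A minimal sketch of such a caller, with
hypothetical names, assuming the prototypes from aesni-intel_glue.h above:

/*
 * Sketch only: a mode-level helper that works with any backend whose
 * single-block encrypt returns 0 on success or a negative error code.
 */
static int encrypt_tweak(int (*enc1)(const void *ctx, u8 *out, const u8 *in),
			 const void *tweak_ctx, u8 *iv)
{
	int err;

	kernel_fpu_begin();
	err = enc1(tweak_ctx, iv, iv);	/* e.g. aesni_enc(), which always returns 0 */
	kernel_fpu_end();

	return err;
}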

* Re: [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-05-11 19:05           ` [dm-devel] " Chang S. Bae
@ 2023-05-11 21:39             ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-11 21:39 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, H. Peter Anvin

On Thu, May 11, 2023 at 12:05:17PM -0700, Chang S. Bae wrote:
> +
> +struct aes_xts_ctx {
> +	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
> +	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
> +};
> +
> +static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
> +{
> +	unsigned long addr = (unsigned long)raw_ctx;
> +	unsigned long align = AES_ALIGN;
> +
> +	if (align <= crypto_tfm_ctx_alignment())
> +		align = 1;
> +
> +	return (struct crypto_aes_ctx *)ALIGN(addr, align);
> +}

It seems you took my suggestion to fix the definition of struct aes_xts_ctx, but
you didn't make the corresponding change to the runtime alignment code.  There
should be a helper function aes_xts_ctx() that is used like:

    struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);

It would do the runtime alignment.  Then, aes_ctx() should be removed.

- Eric
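
For illustration only, here is a minimal sketch of the kind of helper being
asked for (hypothetical code, not the patch itself; the follow-up reply in
this thread shows the actual proposed change):

/*
 * Sketch: do the runtime alignment once, at the tfm-context level,
 * instead of aligning each field access. AES_ALIGN and struct
 * aes_xts_ctx are the definitions from this patch set.
 */
static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
{
	unsigned long addr = (unsigned long)crypto_skcipher_ctx(tfm);
	unsigned long align = AES_ALIGN;

	if (align <= crypto_tfm_ctx_alignment())
		align = 1;

	return (struct aes_xts_ctx *)ALIGN(addr, align);
}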

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
@ 2023-05-11 21:39             ` Eric Biggers
  0 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-11 21:39 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, mingo, lalithambika.krishnakumar, dm-devel,
	Ingo Molnar, bp, linux-crypto, luto, H. Peter Anvin,
	bernie.keany, tglx, gmazyland, charishma1.gairuboyina

On Thu, May 11, 2023 at 12:05:17PM -0700, Chang S. Bae wrote:
> +
> +struct aes_xts_ctx {
> +	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
> +	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
> +};
> +
> +static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
> +{
> +	unsigned long addr = (unsigned long)raw_ctx;
> +	unsigned long align = AES_ALIGN;
> +
> +	if (align <= crypto_tfm_ctx_alignment())
> +		align = 1;
> +
> +	return (struct crypto_aes_ctx *)ALIGN(addr, align);
> +}

It seems you took my suggestion to fix the definition of struct aes_xts_ctx, but
you didn't make the corresponding change to the runtime alignment code.  There
should be a helper function aes_xts_ctx() that is used like:

    struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);

It would do the runtime alignment.  Then, aes_ctx() should be removed.

- Eric



^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-05-11 21:39             ` [dm-devel] " Eric Biggers
@ 2023-05-11 23:19               ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-11 23:19 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, H. Peter Anvin

On 5/11/2023 2:39 PM, Eric Biggers wrote:
> On Thu, May 11, 2023 at 12:05:17PM -0700, Chang S. Bae wrote:
>> +
>> +struct aes_xts_ctx {
>> +	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
>> +	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
>> +};
>> +
>> +static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
>> +{
>> +	unsigned long addr = (unsigned long)raw_ctx;
>> +	unsigned long align = AES_ALIGN;
>> +
>> +	if (align <= crypto_tfm_ctx_alignment())
>> +		align = 1;
>> +
>> +	return (struct crypto_aes_ctx *)ALIGN(addr, align);
>> +}
> 
> It seems you took my suggestion to fix the definition of struct aes_xts_ctx, but
> you didn't make the corresponding change to the runtime alignment code.  

Sigh. This particular change was unintentionally leaked here from my WIP 
code. Yes, I'm aware of your comment:

 > The runtime alignment to a 16-byte boundary should happen when translating the
 > raw crypto_skcipher_ctx() into the pointer to the aes_xts_ctx.  It should not
 > happen when accessing each individual field in the aes_xts_ctx.

> There should be a helper function aes_xts_ctx() that is used like:
> 
>      struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
> 
> It would do the runtime alignment.  Then, aes_ctx() should be removed.

Yes, I could think of some changes like the one below. I guess the aeskl 
code can live with it. The aesni glue code still wants aes_ctx() as it 
deals with other modes, so that can be left there as it is.

diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
index 5877d0988e36..b22de77594fe 100644
--- a/arch/x86/crypto/aes-intel_glue.h
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -31,15 +31,15 @@ struct aes_xts_ctx {
         struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
  };

-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
  {
-       unsigned long addr = (unsigned long)raw_ctx;
+       unsigned long addr = (unsigned long)crypto_skcipher_ctx(tfm);
         unsigned long align = AES_ALIGN;

         if (align <= crypto_tfm_ctx_alignment())
                 align = 1;

-       return (struct crypto_aes_ctx *)ALIGN(addr, align);
+       return (struct aes_xts_ctx *)ALIGN(addr, align);
  }

  static inline int
@@ -47,7 +47,7 @@ xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keyle
                  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
                             unsigned int key_len))
  {
-       struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+       struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
         int err;

         err = xts_verify_key(tfm, key, keylen);
@@ -72,7 +72,7 @@ xts_crypt_common(struct skcipher_request *req,
                  int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
  {
         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-       struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+       struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
         int tail = req->cryptlen % AES_BLOCK_SIZE;
         struct skcipher_request subreq;
         struct skcipher_walk walk;
@@ -108,7 +108,7 @@ xts_crypt_common(struct skcipher_request *req,
         kernel_fpu_begin();

         /* calculate first value of T */
-       err = crypt1_fn(aes_ctx(&ctx->tweak_ctx), walk.iv, walk.iv);
+       err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
         if (err) {
                 kernel_fpu_end();
                 return err;
@@ -120,7 +120,7 @@ xts_crypt_common(struct skcipher_request *req,
                 if (nbytes < walk.total)
                         nbytes &= ~(AES_BLOCK_SIZE - 1);

-               err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+               err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
                                nbytes, walk.iv);
                 kernel_fpu_end();
                 if (err)
@@ -148,7 +148,7 @@ xts_crypt_common(struct skcipher_request *req,
                         return err;

                 kernel_fpu_begin();
-               err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+               err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
                                walk.nbytes, walk.iv);
                 kernel_fpu_end();
                 if (err)

Thanks,
Chang

^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation
@ 2023-05-11 23:19               ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-11 23:19 UTC (permalink / raw)
  To: Eric Biggers
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, mingo, lalithambika.krishnakumar, dm-devel,
	Ingo Molnar, bp, linux-crypto, luto, H. Peter Anvin,
	bernie.keany, tglx, gmazyland, charishma1.gairuboyina

On 5/11/2023 2:39 PM, Eric Biggers wrote:
> On Thu, May 11, 2023 at 12:05:17PM -0700, Chang S. Bae wrote:
>> +
>> +struct aes_xts_ctx {
>> +	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
>> +	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
>> +};
>> +
>> +static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
>> +{
>> +	unsigned long addr = (unsigned long)raw_ctx;
>> +	unsigned long align = AES_ALIGN;
>> +
>> +	if (align <= crypto_tfm_ctx_alignment())
>> +		align = 1;
>> +
>> +	return (struct crypto_aes_ctx *)ALIGN(addr, align);
>> +}
> 
> It seems you took my suggestion to fix the definition of struct aes_xts_ctx, but
> you didn't make the corresponding change to the runtime alignment code.  

Sigh. This particular change was unintentionally leaked here from my WIP 
code. Yes, I'm aware of your comment:

 > The runtime alignment to a 16-byte boundary should happen when translating the
 > raw crypto_skcipher_ctx() into the pointer to the aes_xts_ctx.  It should not
 > happen when accessing each individual field in the aes_xts_ctx.

> There should be a helper function aes_xts_ctx() that is used like:
> 
>      struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
> 
> It would do the runtime alignment.  Then, aes_ctx() should be removed.

Yes, I could think of some changes like the one below. I guess the aeskl 
code can live with it. The aesni glue code still wants aes_ctx() as it 
deals with other modes, so that can be left there as it is.

diff --git a/arch/x86/crypto/aes-intel_glue.h b/arch/x86/crypto/aes-intel_glue.h
index 5877d0988e36..b22de77594fe 100644
--- a/arch/x86/crypto/aes-intel_glue.h
+++ b/arch/x86/crypto/aes-intel_glue.h
@@ -31,15 +31,15 @@ struct aes_xts_ctx {
         struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
  };

-static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
  {
-       unsigned long addr = (unsigned long)raw_ctx;
+       unsigned long addr = (unsigned long)crypto_skcipher_ctx(tfm);
         unsigned long align = AES_ALIGN;

         if (align <= crypto_tfm_ctx_alignment())
                 align = 1;

-       return (struct crypto_aes_ctx *)ALIGN(addr, align);
+       return (struct aes_xts_ctx *)ALIGN(addr, align);
  }

  static inline int
@@ -47,7 +47,7 @@ xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keyle
                  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
                             unsigned int key_len))
  {
-       struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+       struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
         int err;

         err = xts_verify_key(tfm, key, keylen);
@@ -72,7 +72,7 @@ xts_crypt_common(struct skcipher_request *req,
                  int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
  {
         struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-       struct aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+       struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
         int tail = req->cryptlen % AES_BLOCK_SIZE;
         struct skcipher_request subreq;
         struct skcipher_walk walk;
@@ -108,7 +108,7 @@ xts_crypt_common(struct skcipher_request *req,
         kernel_fpu_begin();

         /* calculate first value of T */
-       err = crypt1_fn(aes_ctx(&ctx->tweak_ctx), walk.iv, walk.iv);
+       err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
         if (err) {
                 kernel_fpu_end();
                 return err;
@@ -120,7 +120,7 @@ xts_crypt_common(struct skcipher_request *req,
                 if (nbytes < walk.total)
                         nbytes &= ~(AES_BLOCK_SIZE - 1);

-               err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+               err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
                                nbytes, walk.iv);
                 kernel_fpu_end();
                 if (err)
@@ -148,7 +148,7 @@ xts_crypt_common(struct skcipher_request *req,
                         return err;

                 kernel_fpu_begin();
-               err = crypt_fn(aes_ctx(&ctx->crypt_ctx), walk.dst.virt.addr, walk.src.virt.addr,
+               err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
                                walk.nbytes, walk.iv);
                 kernel_fpu_end();
                 if (err)

Thanks,
Chang



^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-05-06  0:01       ` [dm-devel] " Eric Biggers
@ 2023-05-12 17:52         ` Milan Broz
  -1 siblings, 0 replies; 247+ messages in thread
From: Milan Broz @ 2023-05-12 17:52 UTC (permalink / raw)
  To: Eric Biggers, Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix

On 5/6/23 02:01, Eric Biggers wrote:
...
> This does not correctly describe what is going on.  Actually, this patchset
> registers the AES-KL XTS algorithm with the usual name "xts(aes)".  So, it can
> potentially be used by any AES-XTS user.  It seems that you're actually relying
> on the algorithm priorities to prioritize AES-NI, as you've assigned priority
> 200 to AES-KL, whereas AES-NI has priority 401.  Is that what you intend, and if
> so can you please update your explanation to properly explain this?
> 
> The alternative would be to use a unique algorithm name, such as
> "keylocker-xts(aes)".  I'm not sure that would be better, given that the
> algorithms are compatible.  However, that actually would seem to match the
> explanation you gave more closely, so perhaps that's what you actually intended?

Sorry to be late to the game, but as this is intended for LUKS/dm-crypt use,
I have a comment here:

LUKS2 will no longer support algorithms with a dash in the name for dm-crypt
(IOW, "aes-generic" or something like that will no longer work, and I am afraid
you would need aes-kl/keylocker-xts here to force the use of AES-KL for dm-crypt).

One reason is described in https://gitlab.com/cryptsetup/cryptsetup/-/issues/809,
but the major problem is that cryptsetup uses the CIPHER-MODE-IV syntax (which mixes
badly with dashes in algorithm names). And we still rely on internal conversions
of common modes to that syntax (so far this has worked only by luck).

When I added the "capi" format for specifying algorithms for dm-crypt,
I made a mistake in that it allows everything, including platform-specific
crypto driver names.
The intention was to let the kernel decide which crypto driver will be used.
So, this is perhaps fine for dm-crypt now, but LUKS is a portable format, and a
generic algorithm (like AES) should not depend on a specific driver or CPU feature.

IOW, implement xts(aes) and let the user prioritize the driver (no changes to the
LUKS header are needed then, and AES-KL is loaded automatically), and/or create a
wrapper (similar to paes, which we already support) that forces the use of AES-KL
(...but without the dash in the name, please :)

If there is a problem with it, please create an issue for cryptsetup upstream
to discuss it there (before the kernel part is merged!), so we can find some
solution - I would like to avoid incompatibilities later.

Thanks,
Milan
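
For reference, a minimal sketch (not code from this series) of how the naming
and priority discussed above interact at registration time, using the kernel's
skcipher registration fields. The 200 vs. 401 priorities come from the
discussion quoted above; the driver name and the setkey/encrypt/decrypt
callbacks are made up for the sketch:

#include <crypto/aes.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h>

/* Sketch only: illustrative values, hypothetical callbacks. */
static struct skcipher_alg aeskl_xts_alg = {
	.base.cra_name		= "xts(aes)",		/* generic name dm-crypt requests */
	.base.cra_driver_name	= "xts-aes-aeskl",	/* implementation-specific name */
	.base.cra_priority	= 200,			/* below AES-NI's 401, so AES-NI
							 * is preferred for "xts(aes)" */
	.base.cra_blocksize	= AES_BLOCK_SIZE,
	.base.cra_ctxsize	= sizeof(struct aes_xts_ctx),
	.base.cra_module	= THIS_MODULE,
	.min_keysize		= 2 * AES_MIN_KEY_SIZE,
	.max_keysize		= 2 * AES_MAX_KEY_SIZE,
	.ivsize			= AES_BLOCK_SIZE,
	.setkey			= xts_aeskl_setkey,	/* hypothetical */
	.encrypt		= xts_encrypt,
	.decrypt		= xts_decrypt,
};

Requesting "xts(aes)" picks the highest-priority registered implementation,
while requesting a driver-specific name pins one implementation, which is the
kind of name the dm-crypt "capi:" syntax (e.g. "capi:xts(aes)-plain64", or a
driver name in place of "xts(aes)") can carry.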

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
@ 2023-05-12 17:52         ` Milan Broz
  0 siblings, 0 replies; 247+ messages in thread
From: Milan Broz @ 2023-05-12 17:52 UTC (permalink / raw)
  To: Eric Biggers, Chang S. Bae
  Cc: Tom Rix, dave.hansen, lalithambika.krishnakumar, dm-devel,
	H. Peter Anvin, ardb, herbert, x86, mingo, Ingo Molnar, bp,
	Nathan Chancellor, Borislav Petkov, luto, dan.j.williams,
	charishma1.gairuboyina, Nick Desaulniers, linux-kernel, tglx,
	linux-crypto, bernie.keany, David S. Miller

On 5/6/23 02:01, Eric Biggers wrote:
...
> This does not correctly describe what is going on.  Actually, this patchset
> registers the AES-KL XTS algorithm with the usual name "xts(aes)".  So, it can
> potentially be used by any AES-XTS user.  It seems that you're actually relying
> on the algorithm priorities to prioritize AES-NI, as you've assigned priority
> 200 to AES-KL, whereas AES-NI has priority 401.  Is that what you intend, and if
> so can you please update your explanation to properly explain this?
> 
> The alternative would be to use a unique algorithm name, such as
> "keylocker-xts(aes)".  I'm not sure that would be better, given that the
> algorithms are compatible.  However, that actually would seem to match the
> explanation you gave more closely, so perhaps that's what you actually intended?

Sorry to be late to the game, but as this is intended for LUKS/dm-crypt use,
I have a comment here:

LUKS2 will no longer support algorithms with a dash in the name for dm-crypt
(IOW, "aes-generic" or something like that will no longer work, and I am afraid
you would need aes-kl/keylocker-xts here to force the use of AES-KL for dm-crypt).

One reason is described in https://gitlab.com/cryptsetup/cryptsetup/-/issues/809,
but the major problem is that cryptsetup uses the CIPHER-MODE-IV syntax (which mixes
badly with dashes in algorithm names). And we still rely on internal conversions
of common modes to that syntax (so far this has worked only by luck).

When I added the "capi" format for specifying algorithms for dm-crypt,
I made a mistake in that it allows everything, including platform-specific
crypto driver names.
The intention was to let the kernel decide which crypto driver will be used.
So, this is perhaps fine for dm-crypt now, but LUKS is a portable format, and a
generic algorithm (like AES) should not depend on a specific driver or CPU feature.

IOW, implement xts(aes) and let the user prioritize the driver (no changes to the
LUKS header are needed then, and AES-KL is loaded automatically), and/or create a
wrapper (similar to paes, which we already support) that forces the use of AES-KL
(...but without the dash in the name, please :)

If there is a problem with it, please create an issue for cryptsetup upstream
to discuss it there (before the kernel part is merged!), so we can find some
solution - I would like to avoid incompatibilities later.

Thanks,
Milan



^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v7 00/12] x86: Support Key Locker
  2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57     ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae

Hi all,

The previous posting [v6] was primarily intended to deliver the hardware
performance update by revamping the old series. It then received multiple
pieces of feedback, and this revision incorporates changes to address
them. So, the primary goal here is to meet those reviewers'
expectations first.

The series has two parts of changes -- the x86 core and its crypto
library. In this revision, far more changes were made to the latter.
The changelogs were also revisited throughout to be more reviewable.

Here is an overview of that feedback and the resulting changes
(skipping some trivial items):

Part 1. Crypto Library (PATCH10-12):

  PATCH12: AES-KL driver
  - Merge all the AES-KL patches. (Eric Biggers)
  - Make the driver for the 64-bit mode only. (Eric Biggers)
  - Rework the key-size check code (Eric Biggers)
  - Adjust the Kconfig change (Robert Elliott)
  - Update the changelog. (Eric Biggers)
  - Adjust the ASM code to return a proper error code. (Eric Biggers)

  PATCH11: AES-NI rework
  - Inline the helper code to avoid the indirect call. (Eric Biggers)
  - Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
  - Don't export symbols yet here. Instead, do it when needed later.
  - Improve the coding style (Eric Biggers)

  PATCH10: the AESNI-XTS field type fix
  - Add as a new patch. (Eric Biggers)

Part 2. The X86 Core (PATCH1-9):

  PATCH09: a chicken bit and a Kconfig option
  - Rebase on the upstream: commit a894a8a56b57 ("Documentation:
    kernel-parameters: sort all "no..." parameters")

  PATCH08: the wrapping key recovery
  - Limit the symbol export only when needed.

  PATCH07: the wrapping key load at boot-time
  - Use memzero_explicit() instead of memset(). (Robert Elliott)
  - Improve the function prototype. (Eric Biggers and Dave Hansen)

  PATCH01: documentation
  - Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
    documentation into Documentation/arch/"). (Nathan Huckleberry)

This version can also be found here:
    git://github.com/intel-staging/keylocker.git kl-v7

[v6] -- the last posting:
    https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/

Thanks,
Chang

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot-time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  crypto: x86/aes - Prepare for a new AES implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/arch/x86/index.rst              |   1 +
 Documentation/arch/x86/keylocker.rst          |  97 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Kconfig                       |  22 +
 arch/x86/crypto/Makefile                      |   3 +
 arch/x86/crypto/aes-helper_asm.S              |  22 +
 arch/x86/crypto/aes-helper_glue.h             | 161 +++++
 arch/x86/crypto/aeskl-intel_asm.S             | 580 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 216 +++++++
 arch/x86/crypto/aesni-intel_asm.S             |  55 +-
 arch/x86/crypto/aesni-intel_glue.c            | 242 +++-----
 arch/x86/crypto/aesni-intel_glue.h            |  17 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 +++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 27 files changed, 1573 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a128547bd4425cdef27c606efc88e1eb03a2dba
-- 
2.17.1


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v7 00/12] x86: Support Key Locker
@ 2023-05-24 16:57     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, ardb, chang.seok.bae, dave.hansen, dan.j.williams,
	mingo, ebiggers, lalithambika.krishnakumar, bp,
	charishma1.gairuboyina, luto, bernie.keany, tglx, nhuck,
	gmazyland, elliott

Hi all,

The previous posting [v6] was primarily intended to deliver the hardware
performance update by revamping the old series. It then received multiple
pieces of feedback, and this revision incorporates changes to address
them. So, the primary goal here is to meet those reviewers'
expectations first.

The series has two parts of changes -- the x86 core and its crypto
library. In this revision, far more changes were made to the latter.
The changelogs were also revisited throughout to be more reviewable.

Here is an overview of that feedback and the resulting changes
(skipping some trivial items):

Part 1. Crypto Library (PATCH10-12):

  PATCH12: AES-KL driver
  - Merge all the AES-KL patches. (Eric Biggers)
  - Make the driver for the 64-bit mode only. (Eric Biggers)
  - Rework the key-size check code (Eric Biggers)
  - Adjust the Kconfig change (Robert Elliott)
  - Update the changelog. (Eric Biggers)
  - Adjust the ASM code to return a proper error code. (Eric Biggers)

  PATCH11: AES-NI rework
  - Inline the helper code to avoid the indirect call. (Eric Biggers)
  - Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
  - Don't export symbols yet here. Instead, do it when needed later.
  - Improve the coding style (Eric Biggers)

  PATCH10: the AESNI-XTS field type fix
  - Add as a new patch. (Eric Biggers)

Part 2. The X86 Core (PATCH1-9):

  PATCH09: a chicken bit and a Kconfig option
  - Rebase on the upstream: commit a894a8a56b57 ("Documentation:
    kernel-parameters: sort all "no..." parameters")

  PATCH08: the wrapping key recovery
  - Limit the symbol export only when needed.

  PATCH07: the wrapping key load at boot-time
  - Use memzero_explicit() instead of memset(). (Robert Elliott)
  - Improve the function prototype. (Eric Biggers and Dave Hansen)

  PATCH01: documentation
  - Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
    documentation into Documentation/arch/"). (Nathan Huckleberry)

This version can also be found here:
    git://github.com/intel-staging/keylocker.git kl-v7

[v6] -- the last posting:
    https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/

Thanks,
Chang

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot-time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  crypto: x86/aes - Prepare for a new AES implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/arch/x86/index.rst              |   1 +
 Documentation/arch/x86/keylocker.rst          |  97 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Kconfig                       |  22 +
 arch/x86/crypto/Makefile                      |   3 +
 arch/x86/crypto/aes-helper_asm.S              |  22 +
 arch/x86/crypto/aes-helper_glue.h             | 161 +++++
 arch/x86/crypto/aeskl-intel_asm.S             | 580 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 216 +++++++
 arch/x86/crypto/aesni-intel_asm.S             |  55 +-
 arch/x86/crypto/aesni-intel_glue.c            | 242 +++-----
 arch/x86/crypto/aesni-intel_glue.h            |  17 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 +++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 27 files changed, 1573 insertions(+), 203 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a128547bd4425cdef27c606efc88e1eb03a2dba
-- 
2.17.1



^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v7 01/12] Documentation/x86: Document Key Locker
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Jonathan Corbet,
	linux-doc

Document an overview of the feature along with the relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
  documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
  to process a 192-bit key.'
* Update the text for clarity and readability:
  - Clarify the error code and exemplify the backup failure
  - Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
 Documentation/arch/x86/index.rst     |  1 +
 Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index c73d133fd37c..256359c24669 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -42,3 +42,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..5557b8d0659a
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+And the key may be lost in the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of a wrapping key restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results for pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is the same as if a
+storage controller had failed to resume from suspend: reboot. If the volume
+impacted by a wrapping key restore failure is a data-volume then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails with setting RFLAGS.ZF. The
+  crypto library implementation includes the flag check to return -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear-text in memory. This
+  is to meet the Linux crypto-cipher expectation that each implementation must
+  support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v7 01/12] Documentation/x86: Document Key Locker
@ 2023-05-24 16:57       ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, linux-doc, Jonathan Corbet, ardb, chang.seok.bae,
	dave.hansen, dan.j.williams, mingo, ebiggers,
	lalithambika.krishnakumar, Ingo Molnar, bp,
	charishma1.gairuboyina, luto, H. Peter Anvin, bernie.keany, tglx,
	nhuck, gmazyland, elliott

Document an overview of the feature along with the relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
  documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
  to process a 192-bit key.'
* Update the text for clarity and readability:
  - Clarify the error code and exemplify the backup failure
  - Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
 Documentation/arch/x86/index.rst     |  1 +
 Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index c73d133fd37c..256359c24669 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -42,3 +42,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..5557b8d0659a
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+And the key may be lost in the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of a wrapping key restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive error results for pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is the same as if a
+storage controller had failed to resume from suspend: reboot. If the volume
+impacted by a wrapping key restore failure is a data-volume then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails with setting RFLAGS.ZF. The
+  crypto library implementation includes the flag check to return -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear-text in memory. This
+  is to meet the Linux crypto-cipher expectation that each implementation must
+  support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 247+ messages in thread
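
To make the error-code behavior documented above concrete, here is a rough
sketch of the glue-level pattern it implies. The helper names _aeskl_enc()
and valid_keylocker() are placeholders for illustration, not necessarily the
functions in this series:

/*
 * Sketch only. _aeskl_enc() stands in for an assembly routine that runs
 * the AES-KL instruction and returns nonzero when the CPU set RFLAGS.ZF
 * (invalid or no-longer-valid key handle). valid_keylocker() stands in
 * for a check that the wrapping key is still present.
 */
static int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
{
	if (!valid_keylocker())
		return -ENODEV;	/* wrapping key lost, e.g. restore failure */

	if (_aeskl_enc(ctx, out, in))
		return -EINVAL;	/* instruction flagged a bad key handle */

	return 0;
}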

* [PATCH v7 02/12] x86/cpufeature: Enumerate Key Locker feature
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called a 'key handle', is referenced for data
encryption or decryption instead of accessing the clear-text key.

A wrapping key loaded in the CPU's software-inaccessible state is used
to transform a user key into a key handle. On a rare, unexpected
hardware failure, the key could be lost.

Enumerate this hardware capability here. It will not show up in
/proc/cpuinfo, as userspace usage is not supported. This is because
there is no ABI to coordinate the wrapping-key failure.

The feature supports the Advanced Encryption Standard (AES) cipher
algorithm with a new SIMD instruction set, like its predecessor (AES-NI).
Mark the feature as depending on XMM2. The new AES implementation will
be in the crypto library.

Add X86_FEATURE_KEYLOCKER to the disabled features mask at the moment.
It will be enabled under a new config option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- re-organize the change descriptions

Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..4a12673df7e9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -389,6 +389,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..eb841d694ed9 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -126,7 +132,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..262348aeaad1 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..200c5e69f78c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.17.1
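
A brief sketch of how the new feature bit is typically consumed (not part
of this patch): because X86_FEATURE_KEYLOCKER sits in DISABLED_MASK16 when
CONFIG_X86_KEYLOCKER=n, the check below is expected to compile away in
that configuration.

	#include <asm/cpufeature.h>

	static bool keylocker_supported(void)
	{
		/*
		 * Folds to 'false' at compile time when the feature is in
		 * the disabled-features mask; otherwise tests the runtime
		 * CPU capability.
		 */
		return cpu_feature_enabled(X86_FEATURE_KEYLOCKER);
	}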


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 03/12] x86/insn: Add Key Locker instructions to the opcode map
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin

The x86 instruction decoder needs to know about these new instructions,
which are going to be used in the crypto library as well as in the x86
core code. Add the following:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

Details can be found in the Intel Software Developer's Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.17.1
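
A small sketch of how the updated opcode map is exercised through the
in-kernel instruction decoder (illustrative only, not part of this patch;
the byte sequence is assumed to be a register-form ENCODEKEY128 encoding):

	#include <linux/types.h>
	#include <linux/printk.h>
	#include <asm/insn.h>

	static void decode_encodekey128_example(void)
	{
		/* Assumed encoding: f3 0f 38 fa /r with a register-form ModRM. */
		const u8 buf[] = { 0xf3, 0x0f, 0x38, 0xfa, 0xc1 };
		struct insn insn;

		/* insn_decode() returns 0 on success. */
		if (!insn_decode(&insn, buf, sizeof(buf), INSN_MODE_64))
			pr_info("decoded length: %d\n", insn.length);
	}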


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Rafael J. Wysocki,
	Peter Zijlstra

Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the
plain-text key.

LOADIWKEY loads a wrapping key into the software-inaccessible CPU
state. It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish a
wrapper to prepare for this use. Also, define struct iwkey to pass the
key value to it.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Note, Dan wondered if:
  WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
---
 arch/x86/include/asm/keylocker.h     | 25 ++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..9b3bec452b31
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key:	A 128-bit key to check that key handles have not
+ *			been tampered with.
+ * @encryption_key:	A 256-bit encryption key used in
+ *			wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index de48d1389936..dd2d8b40fce3 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <asm/processor-flags.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -283,6 +284,37 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key:	A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * LOADIWKEY %xmm1,%xmm2
+	 *
+	 * EAX and XMM0 are implicit operands. Load a key value
+	 * from XMM0-2 to a software-invisible CPU state. With zero
+	 * in EAX, CPU does not do hardware randomization and the key
+	 * backup is allowed.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.17.1
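
A usage sketch for the new wrapper (this mirrors how a later patch in the
series calls it; the function name below is illustrative). The caller owns
the kernel_fpu_begin()/kernel_fpu_end() section because XMM registers are
clobbered:

	#include <asm/fpu/api.h>
	#include <asm/special_insns.h>

	static void load_wrapping_key(struct iwkey *key)
	{
		kernel_fpu_begin();
		load_xmm_iwkey(key);	/* loads, then clears XMM0-2 */
		kernel_fpu_end();
	}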


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

The CPU state that contains the wrapping key is in the same power
domain as the cache. So any sleep state that would invalidate the
cache (like S3) also invalidates the state of the wrapping key.

But, since the state is inaccessible to software, it needs a special
mechanism to save and restore the key during deep sleep.

A set of new MSRs is provided as an abstract interface to save and
restore the wrapping key, and to check the key status. The wrapping
key is saved in a platform-scoped state on non-volatile media. The
backup itself and its path from the CPU are encrypted and integrity
protected.

Define those MSRs to be used to save and restore the key for the S3/4
sleep states.

But the backup storage's non-volatility is not architecturally
guaranteed across off-states, such as S5 and G3. In that case, the
kernel may generate a new key on the next boot.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Tweak the changelog -- put the last for those about other sleep
  states

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..cd8555c0f3c2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1117,4 +1117,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.17.1
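
A rough sketch of how these MSRs might be exercised (this is not the code
in this series; bit 0 of the status MSR is assumed to report a valid
backup):

	#include <linux/types.h>
	#include <linux/bits.h>
	#include <asm/msr.h>

	static bool backup_wrapping_key(void)
	{
		u64 status;

		/* Request a backup of the currently loaded wrapping key. */
		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

		/* Assumption: bit 0 is set once a valid backup exists. */
		rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, status);

		return status & BIT(0);
	}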


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 06/12] x86/keylocker: Define Key Locker CPUID leaf
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin

Both the Key Locker enabling code in the x86 core and the AES Key Locker
code in the crypto library will need to reference some CPUID bits. Define
the feature-specific CPUID leaf and bits.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
  change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9b3bec452b31..1717e0866254 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.17.1
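
How these defines get used can be seen in the later boot-time setup patch;
a condensed, illustrative sketch:

	#include <asm/processor.h>
	#include <asm/keylocker.h>

	static bool keylocker_cpuid_ok(void)
	{
		u32 eax, ebx, ecx, edx;

		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);

		/* AESKLE is only reported once CR4.KL has been set. */
		return (eax & KEYLOCKER_CPUID_EAX_SUPERVISOR) &&
		       (ebx & KEYLOCKER_CPUID_EBX_AESKLE);
	}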


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

The wrapping key is used to encode a clear-text key into a key handle.
This key is the pivot in protecting user keys, so its value has to be
randomized before being loaded into the software-invisible CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled when the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit outside of
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time. Ensure
the dynamic CPU bit (CPUID.AESKLE) is set before the load, and flush
out these bytes afterwards.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt consumption, and that switching wrapping key per VM is
untenable, explicitly skip this setup in the X86_FEATURE_HYPERVISOR
case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
  macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
  the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out the disabling when the feature is available on a virtual
  machine. Then, it will turn off the feature flag

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note, Dan wondered whether, given that the only proposed Linux use case
for Key Locker is dm-crypt, the feature could be lazily enabled when the
first dm-crypt user arrives; but as Dave notes, there is no precedent
for late enabling of CPU features, and it adds maintenance burden
without demonstrable benefit outside of minimizing the visibility of
Key Locker to userspace.
---
 arch/x86/include/asm/keylocker.h |  9 ++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/common.c     |  5 +-
 arch/x86/kernel/keylocker.c      | 82 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c        |  2 +
 5 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1717e0866254..9040e10ab648 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <linux/bits.h>
 #include <asm/fpu/types.h>
 
@@ -28,5 +29,13 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
+static inline void destroy_keylocker_data(void) { }
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..9e2647320dc6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f284e185aea..5882ff6e3c6b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
 #include <asm/traps.h>
@@ -1834,10 +1836,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..038e5268e6b8
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+	struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
+	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+	kernel_fpu_begin();
+	load_xmm_iwkey(&kl_setup.key);
+	kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c:		A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto out;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto disable;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (c == &boot_cpu_data) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+		/*
+		 * Check the feature readiness via CPUID. Note that the
+		 * CPUID AESKLE bit is conditionally set only when CR4.KL
+		 * is set.
+		 */
+		if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+		    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+			pr_debug("x86/keylocker: Not fully supported.\n");
+			goto disable;
+		}
+
+		generate_keylocker_data();
+	}
+
+	load_keylocker();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return;
+
+disable:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+out:
+	/* Make sure the feature disabled for kexec-reboot. */
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index e82be82cda55..41f24ab74fb9 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -86,6 +86,7 @@
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
 #include <asm/sev.h>
+#include <asm/keylocker.h>
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1386,6 +1387,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	nmi_selftest();
 	impress_friends();
 	cache_aps_init();
+	destroy_keylocker_data();
 }
 
 static int __initdata setup_possible_cpus = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Rafael J. Wysocki,
	linux-pm

The primary use case for the feature is bare metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is used to unlock
the volume where the hibernation image is stored, dm-crypt still
expects to reuse the key handles within the hibernation image once it
is loaded.

== Wrapping-key Restore ==

So the goal is to meet dm-crypt's expectation that the key handles
in the suspend image remain valid after resume from an S-state. But
when the system enters the ACPI S3 or S4 sleep states, the wrapping
key is discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. So, request a backup right after the key is
loaded at boot time, and copy it back to each CPU upon wakeup.
Consequently, Key Locker has to be disabled entirely if the backup
mechanism is not available, unless CONFIG_SUSPEND=n.
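
For reference, the intended flow looks roughly like the sketch below.
It is a simplified illustration using the MSR and helper names
introduced by this series ('status' is a local u64; error handling
and the CPUID checks shown in the diff are omitted):

  /* Boot CPU: load the wrapping key, then back it up in the platform. */
  load_keylocker();
  wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);

  /* Each CPU on wakeup: copy the backup back into the CPU state. */
  wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
  rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
  if (!(status & BIT(0)))
          valid_kl = false;       /* soft-disable on a failed copy */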

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with an
initialized wrapping key state. This has the effect of invalidating
any key handles that might be present in a suspend-image. When this
happens, dm-crypt will see I/O errors resulting from error returns
from crypto_skcipher_encrypt()/decrypt().

While this will disrupt operations in the current boot, data is not
at risk, and access is restored at the next boot with new handles
created by the new wrapping key. Also, manage a feature-specific flag
to communicate with the crypto implementation. This ensures that the
AES instructions are no longer used after a key restore failure,
while not turning off the feature entirely.
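
As a rough sketch, the crypto side is expected to consult this flag
before issuing AES-KL instructions, along the lines of the following
(the function name aeskl_enc and the error code are illustrative; the
actual AES-KL glue arrives in a later patch):

  static int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
  {
          if (!valid_keylocker())
                  return -ENODEV;
          /* ... run the AES-KL encryption here ... */
          return 0;
  }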

== Off-states ==

While the backup may be maintained in non-volatile media across the
S5 and G3 "off" states, this is neither architecturally guaranteed
nor expected by dm-crypt, which prompts for the key whenever the
volume is started. A reboot with a new wrapping key therefore covers
this case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
  'if () { ... return; }', and tweak the comment along with that.
  (Eric Biggers)
* Improve the function prototype, instead of using a macro. (Eric
  Biggers and Dave Hansen)
* Update the documentation:
  - Massage the changelog to clarify the problem-and-solution by
    sections
  - Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
  (Reported by Marvin Hsu <marvin.hsu@intel.com>) Add a function
  comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through to the end, which disables the feature, while all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 136 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9040e10ab648..8621234c5897 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
 static inline void destroy_keylocker_data(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 038e5268e6b8..8f3acc34b6da 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,48 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set when the wrapping key is loaded, and cleared when
+ * a key restore fails. The state is exported to the crypto library,
+ * which then stops using Key Locker. So, the feature is soft-disabled
+ * with this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(valid_keylocker);
+#endif
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
 	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
 }
 
+/*
+ * This is invoked when boot-up is finished, which means the wrapping
+ * key has been loaded. The 'valid_kl' flag is then set here as the
+ * feature is enabled.
+ */
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return;
+
 	memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +62,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -52,6 +101,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -65,13 +115,53 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The wrapping key in the CPU state does not survive
+		 * the S3/4 states. So ensure the backup capability is
+		 * present when those S-states are possible.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
+		load_keylocker();
+
+		/* Backup a wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+
+		pr_info("x86/keylocker: Enabled.\n");
+		return;
 	}
 
-	load_keylocker();
+	/*
+	 * At boot time, the key is loaded directly from memory. On
+	 * each CPU wake-up, this path instead recovers the key: the
+	 * backup in the platform-scoped state is copied into the CPU
+	 * state.
+	 */
+	if (!kl_setup.initialized) {
+		load_keylocker();
+		return;
+	} else if (valid_kl) {
+		int rc;
 
-	pr_info_once("x86/keylocker: Enabled.\n");
-	return;
+		rc = copy_keylocker();
+		if (!rc)
+			return;
+
+		/*
+		 * The key copy failed here. The subsequent feature
+		 * use will have inconsistent keys and failures. So,
+		 * invalidate the feature via the flag like with the
+		 * backup failure.
+		 */
+		valid_kl = false;
+		pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+	}
 
 disable:
 	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +170,43 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the
+ * setup function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap: bit 0 is
+	 * set when the backup is valid, and bit 2 indicates a read (or
+	 * write) error.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature
+	 * via the flag to avoid any subsequent use. But keep the
+	 * feature with zeroed wrapping key instead of disabling it.
+	 */
+	pr_err("x86/keylocker: Failed to restore wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1


* [PATCH v7 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Jonathan Corbet, Ingo Molnar, H. Peter Anvin,
	linux-doc

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Add a new command line option "nokeylocker" to optionally override the
default CONFIG_X86_KEYLOCKER=y behavior.
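
For example, a machine that should not use the feature can boot with
the parameter appended to its kernel command line (the image path and
root device below are illustrative):

  linux /boot/vmlinuz root=/dev/sda2 ro nokeylocker

which results in the boot message added by this patch:

  x86/keylocker: Disabled by kernel command line.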

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
  kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/Kconfig                                |  3 +++
 arch/x86/kernel/cpu/common.c                    | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c1247ec4589a..b42fc53cbcf9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3749,6 +3749,8 @@
 			kernel and module base offset ASLR (Address Space
 			Layout Randomization).
 
+	nokeylocker     [X86] Disable Key Locker hardware feature.
+
 	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
 			fault handling.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a98c5f82be48..f9788b477db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5882ff6e3c6b..718ff1b1d6dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -402,6 +402,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 	X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+	/* Expect an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info("x86/keylocker: Disabled by kernel command line.\n");
+	return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.17.1


* [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Every field in struct aesni_xts_ctx is a byte array sized to hold a
struct crypto_aes_ctx, so each field can be redefined with that
struct type instead of the opaque byte array.

The address of struct aesni_xts_ctx can then be aligned up front,
which also simplifies the runtime alignment code.

Redefine struct aesni_xts_ctx, and align its address up front. This
entails refactoring the common alignment code; then, clean up the
alignment of the remaining raw-pointer contexts.
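
A condensed before/after view of one call site (taken from the diff
below) shows the effect:

  /* before: raw byte array, aligned at each use */
  aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);

  /* after: typed field, aligned once when the ctx pointer is derived */
  aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);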

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Add as a new patch. (Eric Biggers)

This fix was considered to be better addressed before the preparatory
AES-NI code rework.
---
 arch/x86/crypto/aesni-intel_glue.c | 38 +++++++++++++++++-------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..97a1629b84c4 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -61,8 +61,8 @@ struct generic_gcmaes_ctx {
 };
 
 struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -219,14 +219,20 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
+static inline unsigned long aes_align_addr(unsigned long addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
+	       ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
+}
+
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
+	return (struct crypto_aes_ctx *)aes_align_addr((unsigned long)raw_ctx);
+}
 
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aesni_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
 }
 
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
@@ -883,7 +889,7 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
 
 	err = xts_verify_key(tfm, key, keylen);
@@ -893,20 +899,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 	keylen /= 2;
 
 	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
+	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
 				 key, keylen);
 	if (err)
 		return err;
 
 	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
+	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
 				  key + keylen, keylen);
 }
 
 static int xts_crypt(struct skcipher_request *req, bool encrypt)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int tail = req->cryptlen % AES_BLOCK_SIZE;
 	struct skcipher_request subreq;
 	struct skcipher_walk walk;
@@ -942,7 +948,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 	kernel_fpu_begin();
 
 	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
 
 	while (walk.nbytes > 0) {
 		int nbytes = walk.nbytes;
@@ -951,11 +957,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
 		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_encrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  nbytes, walk.iv);
 		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_decrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  nbytes, walk.iv);
 		kernel_fpu_end();
@@ -983,11 +989,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 
 		kernel_fpu_begin();
 		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_encrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  walk.nbytes, walk.iv);
 		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_decrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  walk.nbytes, walk.iv);
 		kernel_fpu_end();
-- 
2.17.1


* [PATCH v7 11/12] crypto: x86/aes - Prepare for a new AES implementation
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Key Locker's AES instruction set ('AES-KL') has a programming
interface similar to AES-NI's. The internal ABI in the assembly code
will have the same prototype as AES-NI's, so the glue code can be
shared with AES-NI.

The new AES code will support the XTS mode only, as disk encryption
is the only intended use case.

Refactor the XTS-related code to avoid code duplication, and also
move some constant values so they can be shared. Introduce wrappers
for the data transformation functions that return an error code, as
AES-KL may report one.

The refactored code needs to invoke the implementation-specific
functions. While reusable, it is inlined into the caller to avoid the
possible overhead of an indirect call.

Neither functional change nor performance regression is intended.
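
As a sketch, the AES-NI glue is then expected to plug its
implementation-specific functions into the inlined helpers (helper
and wrapper names as in the diff below; note that the first tweak
value is always computed with the encryption routine, even for
decryption):

  static int xts_encrypt(struct skcipher_request *req)
  {
          return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
  }

  static int xts_decrypt(struct skcipher_request *req)
  {
          return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
  }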

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 161 +++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 218 ++++++++---------------------
 4 files changed, 257 insertions(+), 191 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..af422e2c9263
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for a performance reason. With the mitigation
+ * for speculative executions like retpoline, indirect calls become very
+ * expensive at a cost of measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline unsigned long aes_align_addr(unsigned long addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? ALIGN(addr, 1) : ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aes_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 97a1629b84c4..857cff484bb3 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -82,8 +74,8 @@ struct gcm_context_data {
 
 asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +89,40 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
+static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_dec(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
 
 #ifdef CONFIG_X86_64
 
@@ -201,7 +219,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -211,7 +229,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -219,22 +237,11 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline unsigned long aes_align_addr(unsigned long addr)
-{
-	return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
-	       ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
-}
-
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
 	return (struct crypto_aes_ctx *)aes_align_addr((unsigned long)raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return (struct aesni_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -270,7 +277,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -283,7 +290,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -534,7 +541,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -679,8 +686,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -835,8 +842,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -863,8 +870,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -889,128 +896,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1166,8 +1062,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1183,8 +1079,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v7 11/12] crypto: x86/aes - Prepare for a new AES implementation
@ 2023-05-24 16:57       ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, David S. Miller, ardb, chang.seok.bae, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, charishma1.gairuboyina, luto, H. Peter Anvin,
	bernie.keany, tglx, nhuck, gmazyland, elliott

Key Locker's AES instruction set ('AES-KL') has a programming
interface similar to AES-NI's. The internal ABI in the assembly code
will have the same prototypes as AES-NI's, so the glue code can be
shared between the two.

The new AES code will support the XTS mode alone, as disk encryption
is the only intended use case.

Refactor the XTS-related code to avoid code duplication, and also
move some constant values so that they can be shared. Introduce
wrappers for the data transformation functions that return an error
code, as AES-KL may populate one.

The refactored code needs to invoke the implementation-specific
functions. To keep it reusable without the possible overhead of an
indirect call, inline it into each caller.
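
To illustrate the inlining pattern, here is a minimal, self-contained
sketch in user-space C (the names crypt_one_block() and dummy_enc()
are hypothetical and not part of this patch): a 'static inline'
helper takes the implementation-specific routine as a function-pointer
parameter, and since each caller passes a compile-time constant
function, the compiler can resolve the call directly instead of
emitting an indirect call.

#include <stdio.h>

/* Implementation-specific single-block transform, returning an error code. */
typedef int (*crypt1_fn_t)(const void *ctx, unsigned char *out,
			   const unsigned char *in);

/*
 * Shared helper, defined 'static inline' in a header so that every
 * glue module gets its own copy. With a constant 'crypt1' argument,
 * the compiler can inline it and no indirect call remains.
 */
static inline int crypt_one_block(const void *ctx, unsigned char *out,
				  const unsigned char *in,
				  crypt1_fn_t crypt1)
{
	return crypt1(ctx, out, in);
}

/* A concrete implementation standing in for the real transform. */
static int dummy_enc(const void *ctx, unsigned char *out,
		     const unsigned char *in)
{
	(void)ctx;
	out[0] = in[0] ^ 0xff;	/* placeholder "transform" */
	return 0;
}

int main(void)
{
	unsigned char in = 0x12, out = 0;

	/* Compiles down to a direct call to dummy_enc(). */
	if (crypt_one_block(NULL, &out, &in, dummy_enc))
		return 1;
	printf("0x%02x -> 0x%02x\n", in, out);
	return 0;
}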

Neither functional change nor performance regression is intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some defines back to the AES-NI code as they are not
  used by AES-KL

Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() is called on the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 161 +++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 218 ++++++++---------------------
 4 files changed, 257 insertions(+), 191 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..af422e2c9263
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for performance reasons. With mitigations
+ * for speculative execution, such as retpoline, indirect calls become
+ * expensive, adding measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline unsigned long aes_align_addr(unsigned long addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? ALIGN(addr, 1) : ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aes_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 97a1629b84c4..857cff484bb3 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -82,8 +74,8 @@ struct gcm_context_data {
 
 asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -97,14 +89,40 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
+static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_dec(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
 
 #ifdef CONFIG_X86_64
 
@@ -201,7 +219,7 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -211,7 +229,7 @@ aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
+	unsigned long align = AES_ALIGN;
 
 	if (align <= crypto_tfm_ctx_alignment())
 		align = 1;
@@ -219,22 +237,11 @@ generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 }
 #endif
 
-static inline unsigned long aes_align_addr(unsigned long addr)
-{
-	return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
-	       ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
-}
-
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
 	return (struct crypto_aes_ctx *)aes_align_addr((unsigned long)raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return (struct aesni_xts_ctx *)aes_align_addr((unsigned long)crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -270,7 +277,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -283,7 +290,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -534,7 +541,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -679,8 +686,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -835,8 +842,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -863,8 +870,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -889,128 +896,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1166,8 +1062,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1183,8 +1079,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.17.1

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-05-24 16:57       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, David S. Miller, ardb, chang.seok.bae, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, charishma1.gairuboyina, luto, H. Peter Anvin,
	bernie.keany, tglx, nhuck, gmazyland, elliott

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations,
is provided by new AES instructions ('AES-KL'). AES-KL is analogous
to AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The
implementation has some details worth mentioning, which differentiate
it from others and which users may need to be aware of:

== Key Handle Restriction ==

A key handle may be encoded with some restrictions. Restrict every
handle created via setkey() to be usable in kernel mode only.

Subsequently, the key handle could be corrupted, or an operation could
violate the handle's restrictions. In either case, encrypt()/decrypt()
returns -EINVAL.

== API Limitation ==

The setkey() function transforms an AES key into a handle, whereas an
expanded key is the usual outcome of setkey() in other AES cipher
implementations. For this reason, a failing setkey() cannot fall back
to another implementation. So, expose the AES-KL methods via
synchronous interfaces only.

== AES Compliance ==

Key Locker is not AES compliant, as it lacks 192-bit key support.
However, the Linux crypto API expects a software AES cipher
implementation to support all the AES-compliant key sizes.

The AES-KL cipher implementation satisfies this requirement by
logging a warning and falling back to AES-NI. In other words, the
192-bit key-size limitation for what can be converted into a key
handle is only documented, not enforced.
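
Condensed into a stand-alone sketch (user-space C with stand-in
functions; the real dispatch is aeskl_setkey_common() further down in
this patch), the behavior looks like this:

#include <stdio.h>

#define AES_KEYSIZE_128	16
#define AES_KEYSIZE_192	24
#define AES_KEYSIZE_256	32

/* Stand-in for aeskl_setkey(): convert the key into a key handle. */
static int setkey_with_keylocker(unsigned int keylen)
{
	printf("AES-KL: encode a %u-bit key handle\n", keylen * 8);
	return 0;
}

/* Stand-in for aesni_set_key(): expand the key as usual. */
static int setkey_with_aesni(unsigned int keylen)
{
	printf("AES-NI fallback: expand a %u-bit key\n", keylen * 8);
	return 0;
}

static int setkey(unsigned int keylen)
{
	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
	    keylen != AES_KEYSIZE_256)
		return -1;

	/* A 192-bit key cannot be converted into a key handle. */
	if (keylen == AES_KEYSIZE_192)
		return setkey_with_aesni(keylen);

	return setkey_with_keylocker(keylen);
}

int main(void)
{
	setkey(AES_KEYSIZE_128);
	setkey(AES_KEYSIZE_192);
	setkey(AES_KEYSIZE_256);
	return 0;
}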

== Wrapping Key Restore Failure ==

setkey() as well as encrypt()/decrypt() can also fail when the
wrapping key is unavailable. In the event of a hardware failure, the
wrapping key is lost across deep sleep states. Those functions then
return -ENODEV, as the feature is disabled.
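
A rough user-space sketch of the resulting guard (the in-kernel
version is the valid_keylocker() check in the wrappers added by this
patch; the names below are stand-ins):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for valid_keylocker(): false once the wrapping key is lost. */
static bool wrapping_key_present = true;

/* Stand-in for the actual AES-KL data transformation. */
static int do_keylocker_encrypt(void)
{
	return 0;
}

static int keylocker_encrypt(void)
{
	if (!wrapping_key_present)
		return -ENODEV;	/* the feature is effectively disabled */

	return do_keylocker_encrypt();
}

int main(void)
{
	printf("with wrapping key: %d\n", keylocker_encrypt());
	wrapping_key_present = false;	/* e.g. lost across deep sleep */
	printf("without wrapping key: %d\n", keylocker_encrypt());
	return 0;
}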

== Userspace Exposition ==

Some hardware implementations may incur a performance penalty. E.g.,
the cryptsetup benchmark indicates that the raw throughput is
measurably slower than AES-NI. But, for disk encryption, storage
bandwidth may become the bottleneck before encryption bandwidth does.

This, along with the points above, is an end-user consideration for
selecting AES-KL over AES-NI. Thus, advertise it with the unique
driver name 'xts-aes-aeskl' in /proc/crypto, and register it under
the generic name 'xts(aes)' at a lower priority so that it does not
replace AES-NI.

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once,
which can boost AES performance. Leveraging them, the code needs to
clobber more than eight 128-bit registers.

But 32-bit mode does not have enough such wide registers, so the
performance is unlikely to be better than in 64-bit mode, which
already has a gap vs. AES-NI. So, simply build it for 64-bit mode
only at the moment.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys have the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'Userspace
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Kconfig            |  22 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 580 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 216 +++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |   8 +-
 arch/x86/crypto/aesni-intel_glue.c |  40 +-
 arch/x86/crypto/aesni-intel_glue.h |  17 +
 7 files changed, 873 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (x86)"
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config CRYPTO_CURVE25519_X86
 	tristate "Public key crypto: Curve25519 (ADX)"
 	depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..402dd7796375
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,580 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN1	%xmm8
+#define IN	IN1
+
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+	FRAME_END
+	RET
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $(-EINVAL), AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * int __aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_dec)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	mov 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Ldec_128
+	aesdec256kl (HANDLEP), STATE
+	jz .Ldec_err
+	jmp .Ldec_noerr
+.Ldec_128:
+	aesdec128kl (HANDLEP), STATE
+	jz .Ldec_err
+
+.Ldec_noerr:
+	xor AREG, AREG
+	jmp .Ldec_end
+.Ldec_err:
+	mov $(-EINVAL), AREG
+.Ldec_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_dec)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+	FRAME_END
+	RET
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+	FRAME_END
+	RET
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..c6824c50fc72
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int __aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			       unsigned int keylen)
+{
+	/* raw_ctx is an aligned address via xts_setkey_common() */
+	struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+	    keylen != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, keylen);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = aeskl_setkey(ctx, in_key, keylen);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+/*
+ * The below wrappers for the encryption/decryption functions
+ * incorporate the feature availability check:
+ *
+ * In the rare event of a hardware failure, the wrapping key can be
+ * lost after wake-up from a deep sleep state. This check then prevents
+ * any subsequent misuse by returning a proper error code.
+ */
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_dec(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
+		return -EINVAL;
+
+	*keylen = ctx->crypt_ctx.key_length;
+	return 0;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen;
+	int err;
+
+	err = xts_keylen(req, &keylen);
+	if (err)
+		return err;
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen;
+	int rc;
+
+	rc = xts_keylen(req, &keylen);
+	if (rc)
+		return rc;
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= aeskl_xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+	int err;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					     aeskl_simd_skciphers);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                     unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)
 
 /*
  * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 857cff484bb3..3aaf5504e349 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			       unsigned int key_len);
 asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,17 +90,32 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_enc(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif
 
-static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_dec(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_dec);
+#endif
 
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
@@ -110,19 +126,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 				    const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_encrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif
 
-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_decrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif
 
 #ifdef CONFIG_X86_64
 
@@ -256,7 +278,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = __aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
@ 2023-05-24 16:57       ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 16:57 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called a 'key handle',
to reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations,
is provided by new AES instructions ('AES-KL'). AES-KL is analogous
to AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The
implementation has some details worth mentioning, which differentiate
it from other implementations and which users may need to be aware of:

== Key Handle Restriction ==

A key handle may be encoded with usage restrictions. Restrict every
handle to kernel-mode use only via setkey().

The key handle could subsequently be corrupted or violate its
restrictions. In that case, encrypt()/decrypt() returns -EINVAL.

== API Limitation ==

The setkey() function transforms an AES key into a handle. But, an
extended key is a usual outcome of setkey() in other AES cipher
implementations. For this reason, a setkey() failure does not fall
back to the other. So, expose AES-KL methods via synchronous
interfaces only.

=== AES Compliance ===

Key Locker is not fully AES-compliant, as it lacks 192-bit key
support. However, per the expectations of Linux crypto-cipher
implementations, the software cipher implementation must support all
the AES-compliant key sizes.

The AES-KL cipher implementation meets this constraint by logging a
warning and falling back to AES-NI for 192-bit keys. In other words,
the 192-bit key-size limitation on what can be converted into a key
handle is only documented, not enforced.
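
For illustration, the resulting key-size dispatch in the setkey path
boils down to the following (excerpted and simplified from the glue
code below; the real code also runs inside kernel_fpu_begin()/end()
and checks the wrapping-key state first):

	if (unlikely(keylen == AES_KEYSIZE_192)) {
		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
		err = aesni_set_key(ctx, in_key, keylen);
	} else {
		err = aeskl_setkey(ctx, in_key, keylen);
	}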

== Wrapping Key Restore Failure ==

setkey() as well as encrypt()/decrypt() can also fail when the
wrapping key itself fails. In the event of hardware failure, the
wrapping key is lost on wake-up from a deep sleep state. Those
functions then return -ENODEV, as the feature is disabled.

== Userspace Exposition ==

Some hardware implementations may incur a performance penalty. E.g.,
the cryptsetup benchmark indicates the raw throughput is measurably
slower than AES-NI. But, for disk encryption, storage bandwidth may be
the bottleneck before encryption bandwidth.

This, along with the above points, is an end-user consideration for
selecting AES-KL over AES-NI. Thus, advertise it with the unique name
'xts-aes-aeskl' in /proc/crypto, and register it under the generic
name 'xts(aes)' with a lower priority so that it does not replace
AES-NI.

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once,
which can boost AES performance. Leveraging those, the code needs to
clobber more than eight 128-bit registers.

But 32-bit mode does not have enough wide registers. Its performance
is then unlikely to be better than in 64-bit mode, which already has a
gap vs. AES-NI. So, simply build for the 64-bit mode only at the
moment.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys have the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Kconfig            |  22 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 580 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 216 +++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |   8 +-
 arch/x86/crypto/aesni-intel_glue.c |  40 +-
 arch/x86/crypto/aesni-intel_glue.h |  17 +
 7 files changed, 873 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (x86)"
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config CRYPTO_CURVE25519_X86
 	tristate "Public key crypto: Curve25519 (ADX)"
 	depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..402dd7796375
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,580 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN1	%xmm8
+#define IN	IN1
+
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+	FRAME_END
+	RET
+SYM_FUNC_END(aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $(-EINVAL), AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * int __aeskl_dec(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_dec)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	mov 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Ldec_128
+	aesdec256kl (HANDLEP), STATE
+	jz .Ldec_err
+	jmp .Ldec_noerr
+.Ldec_128:
+	aesdec128kl (HANDLEP), STATE
+	jz .Ldec_err
+
+.Ldec_noerr:
+	xor AREG, AREG
+	jmp .Ldec_end
+.Ldec_err:
+	mov $(-EINVAL), AREG
+.Ldec_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_dec)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+	FRAME_END
+	RET
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+	FRAME_END
+	RET
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..c6824c50fc72
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage int __aeskl_dec(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+static int aeskl_setkey_common(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			       unsigned int keylen)
+{
+	/* raw_ctx is an aligned address via xts_setkey_common() */
+	struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+	    keylen != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, keylen);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = aeskl_setkey(ctx, in_key, keylen);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+/*
+ * The below wrappers for the encryption/decryption functions
+ * incorporate the feature availability check:
+ *
+ * In the rare event of hardware failure, the wrapping key can be lost
+ * after wake-up from a deep sleep state. Then, this check helps to
+ * avoid any subsequent misuse with populating a proper error code.
+ */
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_dec(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_dec(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int aeskl_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey_common);
+}
+
+static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
+		return -EINVAL;
+
+	*keylen = ctx->crypt_ctx.key_length;
+	return 0;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen;
+	int err;
+
+	err = xts_keylen(req, &keylen);
+	if (err)
+		return err;
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen;
+	int rc;
+
+	rc = xts_keylen(req, &keylen);
+	if (rc)
+		return rc;
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= aeskl_xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+	int err;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					     aeskl_simd_skciphers);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                     unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)
 
 /*
  * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 857cff484bb3..3aaf5504e349 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			       unsigned int key_len);
 asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,17 +90,32 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_enc(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif
 
-static int aesni_dec(const void *ctx, u8 *out, const u8 *in)
+int aesni_dec(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_dec(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_dec);
+#endif
 
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
@@ -110,19 +126,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 				    const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_encrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif
 
-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_decrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif
 
 #ifdef CONFIG_X86_64
 
@@ -256,7 +278,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = __aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..81ecacb4e54c
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+int aesni_dec(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions
  2023-05-08 18:18         ` [dm-devel] " Chang S. Bae
@ 2023-05-24 17:18           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-24 17:18 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	David S. Miller, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Nathan Chancellor, Nick Desaulniers, Tom Rix

On 5/8/2023 11:18 AM, Chang S. Bae wrote:
> 
> I thought this is something benign to stay here. But, yes, I agree that 
> it is better to simplify the code.

After staring at this a bit, I realized that at least the feature flag
check needs to stay there. It lets a proper error code be returned when
the feature abruptly gets disabled (most likely due to a backup failure).

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
@ 2023-05-26  6:54         ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-26  6:54 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On Wed, May 24, 2023 at 09:57:15AM -0700, Chang S. Bae wrote:
> +static inline unsigned long aes_align_addr(unsigned long addr)
> +{
> +	return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
> +	       ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
> +}

Wouldn't it be simpler to make this take and return 'void *'?  Then the callers
wouldn't need to cast to/from 'unsigned long'.

Also, aligning to a 1-byte boundary is a no-op.

So, please consider the following:

static inline void *aes_align_addr(void *addr)
{
	if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
		return addr;
	return PTR_ALIGN(addr, AES_ALIGN);
}

Also, aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get() should call this
too, rather than duplicating similar code.
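
For example, each of them could then hypothetically collapse to
something like:

static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
	return aes_align_addr(crypto_aead_ctx(tfm));
}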

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-05-24 16:57       ` Chang S. Bae
@ 2023-05-26  7:23         ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-05-26  7:23 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On Wed, May 24, 2023 at 09:57:17AM -0700, Chang S. Bae wrote:
> == API Limitation ==
> 
> The setkey() function transforms an AES key into a handle. But, an
> extended key is a usual outcome of setkey() in other AES cipher
> implementations. For this reason, a setkey() failure does not fall
> back to the other. So, expose AES-KL methods via synchronous
> interfaces only.

I don't understand what this paragraph is trying to say.

> +/*
> + * The below wrappers for the encryption/decryption functions
> + * incorporate the feature availability check:
> + *
> + * In the rare event of hardware failure, the wrapping key can be lost
> + * after wake-up from a deep sleep state. Then, this check helps to
> + * avoid any subsequent misuse with populating a proper error code.
> + */
> +
> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
> +{
> +	if (!valid_keylocker())
> +		return -ENODEV;
> +
> +	return __aeskl_enc(ctx, out, in);
> +}

Is it not sufficient for the valid_keylocker() check to occur at the top level
(in xts_encrypt() and xts_decrypt()), which would seem to be a better place to
do it?  Is this because valid_keylocker() needs to be checked in every
kernel_fpu_begin() section separately, to avoid a race condition?  If that's
indeed the reason, can you explain that in the comment?

> +static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
> +{
> +	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
> +
> +	if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
> +		return -EINVAL;
> +
> +	*keylen = ctx->crypt_ctx.key_length;
> +	return 0;
> +}

This is odd.  Why would the key lengths be different here?

> +	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
> +					     aeskl_simd_skciphers);
> +	if (err)
> +		return err;
> +
> +	return 0;

This can be simplified to:

	return simd_register_skciphers_compat(aeskl_skciphers,
					      ARRAY_SIZE(aeskl_skciphers),
					      aeskl_simd_skciphers);

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-05-26  7:23         ` [dm-devel] " Eric Biggers
@ 2023-05-30 20:49           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-30 20:49 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On 5/26/2023 12:23 AM, Eric Biggers wrote:
> On Wed, May 24, 2023 at 09:57:17AM -0700, Chang S. Bae wrote:
>> == API Limitation ==
>>
>> The setkey() function transforms an AES key into a handle. But, an
>> extended key is a usual outcome of setkey() in other AES cipher
>> implementations. For this reason, a setkey() failure does not fall
>> back to the other. So, expose AES-KL methods via synchronous
>> interfaces only.
> 
> I don't understand what this paragraph is trying to say.

This text comes with this particular comment as I look back:

 > This basically implies that we cannot expose the cipher interface at
 > all, and so AES-KL can only be used by callers that use the
 > asynchronous interface, which rules out 802.11, s/w kTLS, macsec and
 > kerberos.

https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/

Then, I realize that at that moment the dm-crypt use model was not
clearly laid out yet.

This text seems to have been carried over across versions. But now the
driver has XTS only, so this paragraph is less relevant, which I guess
causes the confusion.

I think it can go away now that the usage is claimed clearly.

> 
>> +/*
>> + * The below wrappers for the encryption/decryption functions
>> + * incorporate the feature availability check:
>> + *
>> + * In the rare event of hardware failure, the wrapping key can be lost
>> + * after wake-up from a deep sleep state. Then, this check helps to
>> + * avoid any subsequent misuse with populating a proper error code.
>> + */
>> +
>> +static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
>> +{
>> +	if (!valid_keylocker())
>> +		return -ENODEV;
>> +
>> +	return __aeskl_enc(ctx, out, in);
>> +}
> 
> Is it not sufficient for the valid_keylocker() check to occur at the top level
> (in xts_encrypt() and xts_decrypt()), which would seem to be a better place to
> do it?  Is this because valid_keylocker() needs to be checked in every
> kernel_fpu_begin() section separately, to avoid a race condition?  If that's
> indeed the reason, can you explain that in the comment?

Maybe something like this:

/*
  * In the event of hardware failure, the wrapping key can be lost
  * from a sleep state. Then, the feature is not usable anymore. This
  * feature state can be found via valid_keylocker().
  *
  * Such disabling could be anywhere preemptible, outside
  * kernel_fpu_begin()/end(). So, to avoid the race condition, check
  * the feature availability on every use in the below wrappers.
  */
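
For instance, a simplified sketch of the ordering the comment assumes,
mirroring what the setkey path already does:

	kernel_fpu_begin();
	if (!valid_keylocker())	/* may have been disabled while preemptible */
		err = -ENODEV;
	else
		err = __aeskl_enc(ctx, out, in);
	kernel_fpu_end();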

> 
>> +static inline int xts_keylen(struct skcipher_request *req, u32 *keylen)
>> +{
>> +	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
>> +
>> +	if (ctx->crypt_ctx.key_length != ctx->tweak_ctx.key_length)
>> +		return -EINVAL;
>> +
>> +	*keylen = ctx->crypt_ctx.key_length;
>> +	return 0;
>> +}
> 
> This is odd.  Why would the key lengths be different here?

I thought it was logical to do such a sanity check. But, in practice,
they are the same.

It looks like this entire crypto code is treated as more or less
performance-critical.

> 
>> +	err = simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
>> +					     aeskl_simd_skciphers);
>> +	if (err)
>> +		return err;
>> +
>> +	return 0;
> 
> This can be simplified to:
> 
> 	return simd_register_skciphers_compat(aeskl_skciphers,
> 					      ARRAY_SIZE(aeskl_skciphers),
> 					      aeskl_simd_skciphers);

Oh, obviously!

Thanks,
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-05-26  6:54         ` [dm-devel] " Eric Biggers
@ 2023-05-30 20:50           ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-05-30 20:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On 5/25/2023 11:54 PM, Eric Biggers wrote:
> On Wed, May 24, 2023 at 09:57:15AM -0700, Chang S. Bae wrote:
>> +static inline unsigned long aes_align_addr(unsigned long addr)
>> +{
>> +	return (crypto_tfm_ctx_alignment() >= AESNI_ALIGN) ?
>> +	       ALIGN(addr, 1) : ALIGN(addr, AESNI_ALIGN);
>> +}
> 
> Wouldn't it be simpler to make this take and return 'void *'?  Then the callers
> wouldn't need to cast to/from 'unsigned long'.
> 
> Also, aligning to a 1-byte boundary is a no-op.
> 
> So, please consider the following:
> 
> static inline void *aes_align_addr(void *addr)
> {
> 	if (crypto_tfm_ctx_alignment() >= AES_ALIGN)
> 		return addr;
> 	return PTR_ALIGN(addr, AES_ALIGN);
> }
> 
> Also, aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get() should call this
> too, rather than duplicating similar code.

Indeed, your suggestion can improve the overall code there.

Thanks!
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v8 00/12] x86: Support Key Locker
  2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22       ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae

Hi all,

Posting V8 here, along with a brief status of the enabling work:

The last two revisions have mainly updated the crypto code. The
existing AES-XTS code was improved further before being shared with
the new code. Then, the new implementation was tuned for the AES-XTS
mode, which aligns with the claimed use case -- dm-crypt.

But I'd say some additional changes may still be needed.

The overall changes in response to the last review:

  * PATCH12:
    - Clarify some documentation (Eric Biggers)
    - Simplify (Eric Biggers) and cleanup code

  * PATCH11:
    - Remove dead code.

  * PATCH10:
    - Deduplicate the alignment code (Eric Biggers)

The series can be found in this repo:
    git://github.com/intel-staging/keylocker.git kl-v8

The overall diff was populated in:
    https://raw.githubusercontent.com/intel-staging/keylocker/diff/kl-v8-vs-v7.diff

The feature is already available on recent Intel client systems. The
V3 cover letter covered the usage, the threat model and other details:
    https://lore.kernel.org/lkml/20211124200700.15888-1-chang.seok.bae@intel.com/

And the V6 cover letter followed up with updated performance data:
    https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/

V7 posting:
    https://lore.kernel.org/lkml/20230524165717.14062-1-chang.seok.bae@intel.com/

Thanks,
Chang

Chang S. Bae (12):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot-time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/cpu: Add a configuration and command line option for Key Locker
  crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  crypto: x86/aes - Prepare for a new AES-XTS implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 .../admin-guide/kernel-parameters.txt         |   2 +
 Documentation/arch/x86/index.rst              |   1 +
 Documentation/arch/x86/keylocker.rst          |  97 +++
 arch/x86/Kconfig                              |   3 +
 arch/x86/crypto/Kconfig                       |  22 +
 arch/x86/crypto/Makefile                      |   3 +
 arch/x86/crypto/aes-helper_asm.S              |  22 +
 arch/x86/crypto/aes-helper_glue.h             | 161 +++++
 arch/x86/crypto/aeskl-intel_asm.S             | 552 ++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c            | 188 ++++++
 arch/x86/crypto/aesni-intel_asm.S             |  55 +-
 arch/x86/crypto/aesni-intel_glue.c            | 241 +++-----
 arch/x86/crypto/aesni-intel_glue.h            |  16 +
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   8 +-
 arch/x86/include/asm/keylocker.h              |  45 ++
 arch/x86/include/asm/msr-index.h              |   6 +
 arch/x86/include/asm/special_insns.h          |  32 +
 arch/x86/include/uapi/asm/processor-flags.h   |   2 +
 arch/x86/kernel/Makefile                      |   1 +
 arch/x86/kernel/cpu/common.c                  |  21 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/keylocker.c                   | 212 +++++++
 arch/x86/kernel/smpboot.c                     |   2 +
 arch/x86/lib/x86-opcode-map.txt               |  11 +-
 arch/x86/power/cpu.c                          |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt         |  11 +-
 27 files changed, 1507 insertions(+), 211 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 054377e4774eee812b7930933d7a354ed5a7ddd6
-- 
2.17.1


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v8 01/12] Documentation/x86: Document Key Locker
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Jonathan Corbet,
	linux-doc

Document the overview of the feature along with relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: x86@kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
  documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
  to process a 192-bit key.'
* Update the text for clarity and readability:
  - Clarify the error code and exemplify the backup failure
  - Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
 Documentation/arch/x86/index.rst     |  1 +
 Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index c73d133fd37c..256359c24669 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -42,3 +42,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..5557b8d0659a
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,97 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable after a system
+shutdown or reboot.
+
+The key may also be lost in the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+In the event of a wrapping key restore failure upon resume from suspend, all
+established key handles become invalid. In-flight dm-crypt operations
+receive errors for their pending requests. In the likely scenario that
+dm-crypt is hosting the root filesystem, the recovery is the same as if a
+storage controller had failed to resume from suspend: reboot. If the volume
+impacted by a wrapping key restore failure is a data-volume then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails and sets RFLAGS.ZF. The
+  crypto library implementation includes the flag check to return -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process a 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear-text in memory. This
+  is to meet the Linux crypto-cipher expectation that each implementation must
+  support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementations may have noticeable performance
+  overhead when compared with AES-NI instructions.
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 02/12] x86/cpufeature: Enumerate Key Locker feature
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used
to transform a user key into a key handle. On rarely unexpected
hardware failure, the key could be lost.

Here, enumerate this hardware capability. It will not show up in
/proc/cpuinfo, as userspace usage is not supported. This is because
there is no ABI to coordinate the wrapping-key failure.

The feature supports the Advanced Encryption Standard (AES) cipher
algorithm with a new SIMD instruction set, like its predecessor (AES-NI).
Mark the feature as depending on XMM2. The new AES implementation will be
in the crypto library.

Add X86_FEATURE_KEYLOCKER to the disabled features mask at the moment.
It will be enabled under a new config option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- re-organize the change descriptions

Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
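
Note for reviewers -- an illustrative sketch, not part of this patch:
with DISABLE_KEYLOCKER folded into DISABLED_MASK16, a later consumer is
expected to use the usual pattern below, which compiles away when
CONFIG_X86_KEYLOCKER is not set (the message string is made up here).

	/*
	 * cpu_feature_enabled() consults both the runtime CPUID bit and
	 * the build-time disabled-features mask.
	 */
	if (cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
		pr_info("x86/keylocker: enabled\n");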
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cb8ca46213be..4a12673df7e9 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -389,6 +389,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index fafe9be7a6f4..eb841d694ed9 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -126,7 +132,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK19	0
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index d898432947ff..262348aeaad1 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index f6748c8bd647..200c5e69f78c 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -81,6 +81,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_XFD,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_XFD,			X86_FEATURE_XGETBV1   },
 	{ X86_FEATURE_AMX_TILE,			X86_FEATURE_XFD       },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 03/12] x86/insn: Add Key Locker instructions to the opcode map
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin

The x86 instruction decoder needs to know about these new instructions,
which are going to be used in the crypto library as well as in the x86
core code. Add the following:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

The details can be found in the Intel Software Developer's Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 5168ee0360b2..87e9696c47d2 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Rafael J. Wysocki,
	Peter Zijlstra

Key Locker introduces a CPU-internal wrapping key to encode a user key
into a key handle. The key handle is then referenced instead of the
plain-text key.

LOADIWKEY loads a wrapping key in the software-inaccessible CPU state.
It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish a
wrapper to prepare for this use. Also, define struct iwkey to pass the
key value to it.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Note, Dan wondered if:
  WARN_ON(!irq_fpu_usable());
would be appropriate in the load_xmm_iwkey() function.
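
Also for reviewers, an illustrative sketch of the intended call sequence
(not part of this patch; the caller name is made up, but the FPU-context
rule matches the comment on the wrapper):

	/* Needs <asm/fpu/api.h> and <asm/special_insns.h>. */
	static void load_new_iwkey(struct iwkey *key)
	{
		kernel_fpu_begin();	/* the wrapper clobbers XMM0-2 */
		load_xmm_iwkey(key);
		kernel_fpu_end();
	}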
---
 arch/x86/include/asm/keylocker.h     | 25 ++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 32 ++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..9b3bec452b31
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key:	A 128-bit key to check that key handles have not
+ *			been tampered with.
+ * @encryption_key:	A 256-bit encryption key used in
+ *			wrapping/unwrapping a clear text key.
+ *
+ * This storage should be flushed immediately after loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index de48d1389936..dd2d8b40fce3 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <asm/processor-flags.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -283,6 +284,37 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key
+ * @key:	A struct iwkey pointer.
+ *
+ * Load @key to XMMs then do LOADIWKEY. After this, flush XMM
+ * registers. Caller is responsible for kernel_fpu_begin().
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * LOADIWKEY %xmm1,%xmm2
+	 *
+	 * EAX and XMM0 are implicit operands. Load a key value
+	 * from XMM0-2 to a software-invisible CPU state. With zero
+	 * in EAX, CPU does not do hardware randomization and the key
+	 * backup is allowed.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

The CPU state that holds the wrapping key is in the same power domain
as the cache. So any sleep state that invalidates the cache (like S3)
also invalidates the wrapping key.

Since that state is inaccessible to software, a special mechanism is
needed to save and restore the key across deep sleep.

A set of new MSRs is provided as an abstract interface to save and
restore the wrapping key and to check the key status. The wrapping
key is saved to platform-scoped, non-volatile storage. The backup
itself, and its path from the CPU, are encrypted and integrity
protected.

Define those MSRs to be used to save and restore the key for the S3/4
sleep states.

Note that the backup storage's non-volatility is not architecturally
guaranteed across off-states such as S5 and G3; in that case, the
kernel may generate a new key on the next boot.
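
For illustration, a minimal sketch of the intended backup/restore
sequence, condensed from the later wrapping-key patches in this
series (feature checks and error reporting omitted):

/*
 * Sketch only: mirrors the later keylocker patches in this series.
 * Assumes <asm/msr.h>, <linux/bits.h>, and <linux/errno.h>.
 */
static void backup_wrapping_key(void)
{
	/* Copy the CPU's wrapping key into the platform backup. */
	wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
}

static int restore_wrapping_key(void)
{
	u64 status;

	/* Copy the platform backup back into this CPU's state. */
	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);

	/* Bit 0 of the copy-status MSR reports a successful copy. */
	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
	return (status & BIT(0)) ? 0 : -EBUSY;
}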

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Tweak the changelog -- move the part about other sleep states to
  the end

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..cd8555c0f3c2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1117,4 +1117,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 06/12] x86/keylocker: Define Key Locker CPUID leaf
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin

Both the Key Locker enabling code in the x86 core and the AES Key
Locker code in the crypto library will need to reference some CPUID
bits. Define the feature-specific CPUID leaf and its bits.
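
For reference, a minimal sketch of how these defines are consumed,
mirroring the boot-time checks added later in this series:

/*
 * Sketch only: mirrors the CPUID checks in the later boot-time setup.
 * Assumes <asm/processor.h> for cpuid_count() and the defines added
 * below.
 */
static bool keylocker_cpu_ready(void)
{
	u32 eax, ebx, ecx, edx;

	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);

	/* AESKLE: the AES Key Locker instructions are enabled. */
	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE))
		return false;

	/* Supervisor-mode handle support, required by the kernel. */
	return !!(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR);
}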

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
  change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9b3bec452b31..1717e0866254 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

The wrapping key is used to encode a clear-text key into a key
handle. It is the pivot in protecting user keys, so its value has to
be randomized before being loaded into the software-invisible CPU
state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled when the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit beyond
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time. Ensure
the dynamic CPU bit (CPUID.AESKLE) is set before the load, and flush
the random bytes from memory afterwards.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt consumption, and that switching the wrapping key per
VM is untenable, explicitly skip this setup in the
X86_FEATURE_HYPERVISOR case.
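
A condensed sketch of the boot-time generate/load/wipe flow that the
diff below implements (the actual patch splits this across helpers so
that each CPU can load the key before the clear text is wiped):

/* Sketch only: condensed from the diff below. */
static struct iwkey wrapping_key __initdata;

static void __init generate_and_load_wrapping_key(void)
{
	get_random_bytes(&wrapping_key, sizeof(wrapping_key));

	kernel_fpu_begin();
	load_xmm_iwkey(&wrapping_key);	/* LOADIWKEY from XMM0-2 */
	kernel_fpu_end();
}

/* Called once every CPU has loaded the key. */
static void __init wipe_wrapping_key(void)
{
	memzero_explicit(&wrapping_key, sizeof(wrapping_key));
}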

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
  macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
  the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out that the feature is disabled when it is exposed on a
  virtual machine, which then turns off the feature flag.

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.

Note: Dan wondered whether, given that the only proposed Linux use
case for Key Locker is dm-crypt, the feature could be lazily enabled
when the first dm-crypt user arrives; but as Dave noted, there is no
precedent for late enabling of CPU features, and it adds maintenance
burden without demonstrable benefit beyond minimizing the visibility
of Key Locker to userspace.
---
 arch/x86/include/asm/keylocker.h |  9 ++++
 arch/x86/kernel/Makefile         |  1 +
 arch/x86/kernel/cpu/common.c     |  5 +-
 arch/x86/kernel/keylocker.c      | 82 ++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c        |  2 +
 5 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1717e0866254..9040e10ab648 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/processor.h>
 #include <linux/bits.h>
 #include <asm/fpu/types.h>
 
@@ -28,5 +29,13 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(struct cpuinfo_x86 *c);
+void destroy_keylocker_data(void);
+#else
+static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
+static inline void destroy_keylocker_data(void) { }
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4070a01c11b7..9e2647320dc6 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f284e185aea..5882ff6e3c6b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -58,6 +58,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
+#include <asm/keylocker.h>
+
 #include <asm/uv/uv.h>
 #include <asm/sigframe.h>
 #include <asm/traps.h>
@@ -1834,10 +1836,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker(c);
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..038e5268e6b8
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+#include <linux/poison.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/tlbflush.h>
+
+static __initdata struct keylocker_setup_data {
+	struct iwkey key;
+} kl_setup;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
+	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
+}
+
+void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+}
+
+static void __init load_keylocker(void)
+{
+	kernel_fpu_begin();
+	load_xmm_iwkey(&kl_setup.key);
+	kernel_fpu_end();
+}
+
+/**
+ * setup_keylocker - Enable the feature.
+ * @c:		A pointer to struct cpuinfo_x86
+ */
+void __ref setup_keylocker(struct cpuinfo_x86 *c)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto out;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto disable;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (c == &boot_cpu_data) {
+		u32 eax, ebx, ecx, edx;
+
+		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+		/*
+		 * Check the feature readiness via CPUID. Note that the
+		 * CPUID AESKLE bit is conditionally set only when CR4.KL
+		 * is set.
+		 */
+		if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+		    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+			pr_debug("x86/keylocker: Not fully supported.\n");
+			goto disable;
+		}
+
+		generate_keylocker_data();
+	}
+
+	load_keylocker();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return;
+
+disable:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+out:
+	/* Make sure the feature is disabled for kexec-reboot. */
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ea4f4c1c74cd..13ebdb5fb6c4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -86,6 +86,7 @@
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
 #include <asm/sev.h>
+#include <asm/keylocker.h>
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1371,6 +1372,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	nmi_selftest();
 	impress_friends();
 	cache_aps_init();
+	destroy_keylocker_data();
 }
 
 static int __initdata setup_possible_cpus = -1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Ingo Molnar, H. Peter Anvin, Rafael J. Wysocki,
	linux-pm

The primary use case for the feature is bare metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is used to
unlock the volume where the hibernation image is stored, dm-crypt
still expects to reuse the key handles inside the hibernation image
once it is loaded.

== Wrapping-key Restore ==

The goal is therefore to meet dm-crypt's expectation that the key
handles in the suspend image remain valid after resume from an
S-state. But when the system enters the ACPI S3 or S4 sleep states,
the wrapping key is discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. So, request a backup right after the key is
loaded at boot time, and copy it back to each CPU upon wakeup. If the
backup mechanism is not available, Key Locker has to be disabled
entirely, unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with an
initialized wrapping key state. This has the effect of invalidating
any key handles that might be present in a suspend image. When this
happens, dm-crypt will see I/O errors resulting from error returns
from crypto_skcipher_encrypt()/decrypt().

While this disrupts operations in the current boot, data is not at
risk, and access is restored with new handles created by the new
wrapping key on the next boot. Also, maintain a feature-specific flag
to communicate with the crypto implementation. This ensures that the
AES instructions stop being used after a key restore failure, without
turning the feature off entirely.

== Off-states ==

While the backup may be maintained in non-volatile media across the
S5 and G3 "off" states, this is neither architecturally guaranteed
nor expected by dm-crypt, which prompts for the key whenever the
volume is started. A reboot with a new wrapping key covers this case.
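
For context, a rough sketch of how the crypto-side consumer added
later in this series consults that flag; valid_keylocker() comes from
the diff below, while the surrounding skcipher glue here is
illustrative only:

/*
 * Sketch only: the skcipher wrapper and xts_crypt_sketch() are
 * hypothetical; only valid_keylocker() is introduced by this patch.
 */
static int xts_encrypt_sketch(struct skcipher_request *req)
{
	/* After a failed key restore, stop using the AES-KL instructions. */
	if (!valid_keylocker())
		return -ENODEV;

	return xts_crypt_sketch(req, true);
}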

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
  'if() { ... return; }', and tweak the comment along with that.
  (Eric Biggers)
* Improve the function prototype, instead of using a macro. (Eric
  Biggers and Dave Hansen)
* Update the documentation:
  - Massage the changelog to clarify the problem-and-solution by
    sections
  - Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
  (Reported by Marvin Hsu marvin.hsu@intel.com) Add the function
  comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through the end that disables the feature. Otherwise, all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebased on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h |   4 +
 arch/x86/kernel/keylocker.c      | 136 ++++++++++++++++++++++++++++++-
 arch/x86/power/cpu.c             |   2 +
 3 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 9040e10ab648..8621234c5897 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -32,9 +32,13 @@ struct iwkey {
 #ifdef CONFIG_X86_KEYLOCKER
 void setup_keylocker(struct cpuinfo_x86 *c);
 void destroy_keylocker_data(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
 #else
 static inline void setup_keylocker(struct cpuinfo_x86 *c) { }
 static inline void destroy_keylocker_data(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
 #endif
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 038e5268e6b8..8f3acc34b6da 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -11,20 +11,48 @@
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/tlbflush.h>
+#include <asm/msr.h>
 
 static __initdata struct keylocker_setup_data {
+	bool initialized;
 	struct iwkey key;
 } kl_setup;
 
+/*
+ * This flag is set when the wrapping key is loaded, and cleared when
+ * a key restore fails. The restore state is exported to the crypto
+ * library, which then stops using Key Locker. So, the feature is
+ * soft-disabled via this flag.
+ */
+static bool valid_kl;
+
+bool valid_keylocker(void)
+{
+	return valid_kl;
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(valid_keylocker);
+#endif
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&kl_setup.key.integrity_key,  sizeof(kl_setup.key.integrity_key));
 	get_random_bytes(&kl_setup.key.encryption_key, sizeof(kl_setup.key.encryption_key));
 }
 
+/*
+ * This is invoked once boot-up has finished, which means the wrapping
+ * key has been loaded. The 'valid_kl' flag is then set here to mark
+ * the feature as enabled.
+ */
 void __init destroy_keylocker_data(void)
 {
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return;
+
 	memzero_explicit(&kl_setup.key, sizeof(kl_setup.key));
+	kl_setup.initialized = true;
+	valid_kl = true;
 }
 
 static void __init load_keylocker(void)
@@ -34,6 +62,27 @@ static void __init load_keylocker(void)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Request hardware to copy the key in non-volatile storage to the CPU
+ * state.
+ *
+ * Returns:	-EBUSY if the copy fails, 0 if successful.
+ */
+static int copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	if (status & BIT(0))
+		return 0;
+	else
+		return -EBUSY;
+}
+
 /**
  * setup_keylocker - Enable the feature.
  * @c:		A pointer to struct cpuinfo_x86
@@ -52,6 +101,7 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 
 	if (c == &boot_cpu_data) {
 		u32 eax, ebx, ecx, edx;
+		bool backup_available;
 
 		cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
 		/*
@@ -65,13 +115,53 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 			goto disable;
 		}
 
+		backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+		/*
+		 * The wrapping key in CPU state is volatile in S3/4
+		 * states. So ensure the backup capability along with
+		 * S-states.
+		 */
+		if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+			pr_debug("x86/keylocker: No key backup support with possible S3/4.\n");
+			goto disable;
+		}
+
 		generate_keylocker_data();
+		load_keylocker();
+
+		/* Backup a wrapping key in non-volatile media. */
+		if (backup_available)
+			wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
+
+		pr_info("x86/keylocker: Enabled.\n");
+		return;
 	}
 
-	load_keylocker();
+	/*
+	 * At boot time, the key is loaded directly from memory.
+	 * Otherwise, this path performs key recovery on each CPU
+	 * wake-up: the backup in the platform-scoped state is
+	 * copied into the CPU state.
+	 */
+	if (!kl_setup.initialized) {
+		load_keylocker();
+		return;
+	} else if (valid_kl) {
+		int rc;
 
-	pr_info_once("x86/keylocker: Enabled.\n");
-	return;
+		rc = copy_keylocker();
+		if (!rc)
+			return;
+
+		/*
+		 * The key copy failed here. Subsequent use of the
+		 * feature would see inconsistent keys and failures.
+		 * So, invalidate the feature via the flag, just as
+		 * with a backup failure.
+		 */
+		valid_kl = false;
+		pr_err_once("x86/keylocker: Invalid copy status (rc: %d).\n", rc);
+	}
 
 disable:
 	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
@@ -80,3 +170,43 @@ void __ref setup_keylocker(struct cpuinfo_x86 *c)
 	/* Make sure the feature is disabled for kexec-reboot. */
 	cr4_clear_bits(X86_CR4_KEYLOCKER);
 }
+
+/**
+ * restore_keylocker - Restore the wrapping key.
+ *
+ * The boot CPU executes this while other CPUs restore it through the
+ * setup function.
+ */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+	int rc;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER) || !valid_kl)
+		return;
+
+	/*
+	 * The IA32_IWKEYBACKUP_STATUS MSR contains a bitmap: bit 0 set
+	 * indicates a valid backup, and bit 2 set indicates a read (or
+	 * write) error on the backup storage.
+	 */
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		rc = copy_keylocker();
+		if (rc)
+			pr_err("x86/keylocker: Invalid copy state (rc: %d).\n", rc);
+		else
+			return;
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Now the backup key is not available. Invalidate the feature
+	 * via the flag to avoid any subsequent use. But keep the
+	 * feature with zeroed wrapping key instead of disabling it.
+	 */
+	pr_err("x86/keylocker: Failed to restore wrapping key.\n");
+	valid_kl = false;
+}
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, Jonathan Corbet, Ingo Molnar, H. Peter Anvin,
	linux-doc

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Add a new command line option, "nokeylocker", so that the feature can
still be disabled at boot time when CONFIG_X86_KEYLOCKER is enabled.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
  kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 Documentation/admin-guide/kernel-parameters.txt |  2 ++
 arch/x86/Kconfig                                |  3 +++
 arch/x86/kernel/cpu/common.c                    | 16 ++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c1247ec4589a..b42fc53cbcf9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3749,6 +3749,8 @@
 			kernel and module base offset ASLR (Address Space
 			Layout Randomization).
 
+	nokeylocker     [X86] Disable Key Locker hardware feature.
+
 	no-kvmapf	[X86,KVM] Disable paravirtualized asynchronous page
 			fault handling.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a98c5f82be48..f9788b477db1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1879,6 +1879,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5882ff6e3c6b..718ff1b1d6dd 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -402,6 +402,22 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
 static const unsigned long cr4_pinned_mask =
 	X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP |
 	X86_CR4_FSGSBASE | X86_CR4_CET;
+
+static __init int x86_nokeylocker_setup(char *arg)
+{
+	/* Expect an exact match without trailing characters. */
+	if (strlen(arg))
+		return 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		return 1;
+
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info("x86/keylocker: Disabled by kernel command line.\n");
+	return 1;
+}
+__setup("nokeylocker", x86_nokeylocker_setup);
+
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Every field in struct aesni_xts_ctx is a byte array sized to hold a
struct crypto_aes_ctx. Each field can therefore be redefined as that
struct type instead of the obscure byte array.

Subsequently, the address of struct aesni_xts_ctx can be aligned once,
up front, rather than on every access to a field.

Thus, redefine struct aesni_xts_ctx and align its address at the
start. This requires refactoring the common alignment code, and the
refactored helper turns out to be useful for simplifying the runtime
alignment of the other modes as well. So, use it for them too.
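
To make the effect concrete, here is the tweak-encryption call before
and after this change, adapted from xts_crypt() in the diff below:

/* Before: re-align the raw byte array on every access. */
aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);

/* After: aes_xts_ctx() aligns the context once, so the field is a
 * plain struct crypto_aes_ctx and can be used directly. */
aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);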

Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v7:
* Massage the helper function to be usable for other alignment code
  such as aesni_rfc4106_gcm_ctx_get() and generic_gcmaes_ctx_get().
  (Eric Biggers)

Changes from v6:
* Add as a new patch. (Eric Biggers)

This fix was considered to be better addressed before the preparatory
AES-NI code rework.
---
 arch/x86/crypto/aesni-intel_glue.c | 51 +++++++++++++++---------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..589648142c17 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -61,8 +61,8 @@ struct generic_gcmaes_ctx {
 };
 
 struct aesni_xts_ctx {
-	u8 raw_tweak_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
-	u8 raw_crypt_ctx[sizeof(struct crypto_aes_ctx)] AESNI_ALIGN_ATTR;
+	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,6 +80,13 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
+static inline void *aes_align_addr(void *addr)
+{
+	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
+		return addr;
+	return PTR_ALIGN(addr, AESNI_ALIGN);
+}
+
 asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			     unsigned int key_len);
 asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
@@ -201,32 +208,24 @@ static __ro_after_init DEFINE_STATIC_KEY_FALSE(gcm_use_avx2);
 static inline struct
 aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
+	return (struct aesni_rfc4106_gcm_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
 }
 
 static inline struct
 generic_gcmaes_ctx *generic_gcmaes_ctx_get(struct crypto_aead *tfm)
 {
-	unsigned long align = AESNI_ALIGN;
-
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
+	return (struct generic_gcmaes_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
 }
 #endif
 
 static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 {
-	unsigned long addr = (unsigned long)raw_ctx;
-	unsigned long align = AESNI_ALIGN;
+	return (struct crypto_aes_ctx *)aes_align_addr(raw_ctx);
+}
 
-	if (align <= crypto_tfm_ctx_alignment())
-		align = 1;
-	return (struct crypto_aes_ctx *)ALIGN(addr, align);
+static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
 }
 
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
@@ -883,7 +882,7 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int err;
 
 	err = xts_verify_key(tfm, key, keylen);
@@ -893,20 +892,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 	keylen /= 2;
 
 	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
+	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
 				 key, keylen);
 	if (err)
 		return err;
 
 	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
+	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
 				  key + keylen, keylen);
 }
 
 static int xts_crypt(struct skcipher_request *req, bool encrypt)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
 	int tail = req->cryptlen % AES_BLOCK_SIZE;
 	struct skcipher_request subreq;
 	struct skcipher_walk walk;
@@ -942,7 +941,7 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 	kernel_fpu_begin();
 
 	/* calculate first value of T */
-	aesni_enc(aes_ctx(ctx->raw_tweak_ctx), walk.iv, walk.iv);
+	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
 
 	while (walk.nbytes > 0) {
 		int nbytes = walk.nbytes;
@@ -951,11 +950,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 			nbytes &= ~(AES_BLOCK_SIZE - 1);
 
 		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_encrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  nbytes, walk.iv);
 		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_decrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  nbytes, walk.iv);
 		kernel_fpu_end();
@@ -983,11 +982,11 @@ static int xts_crypt(struct skcipher_request *req, bool encrypt)
 
 		kernel_fpu_begin();
 		if (encrypt)
-			aesni_xts_encrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_encrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  walk.nbytes, walk.iv);
 		else
-			aesni_xts_decrypt(aes_ctx(ctx->raw_crypt_ctx),
+			aesni_xts_decrypt(&ctx->crypt_ctx,
 					  walk.dst.virt.addr, walk.src.virt.addr,
 					  walk.nbytes, walk.iv);
 		kernel_fpu_end();
-- 
2.17.1


* [PATCH v8 11/12] crypto: x86/aes - Prepare for a new AES-XTS implementation
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Key Locker's AES instruction set ('AES-KL') has a programming
interface similar to AES-NI's. The internal ABI in the assembly code
will use the same prototypes as AES-NI, so the glue code can be shared
with AES-NI's.

The new AES code will support only the XTS mode, as disk encryption is
the sole intended use case.

Refactor the XTS-related code to avoid duplication, and move some
constant values so they can be shared. Introduce wrappers around the
data transformation functions that return an error code, since AES-KL
may report one.

The refactored code needs to invoke implementation-specific functions.
To keep it reusable while avoiding the overhead of an indirect call,
inline it into each caller.

Neither a functional change nor a performance regression is intended.
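
As a rough, self-contained illustration of the inlining approach (all
names below are placeholders, not the symbols introduced by this
series): the shared helper is a static inline that takes a function
pointer, so each caller passes a compile-time-constant implementation
and the compiler can emit a direct call instead of a retpoline-bound
indirect one.

#include <stdio.h>

/* Shared helper: would live in a header and be inlined into callers. */
static inline int xts_crypt_sketch(unsigned char *buf, unsigned int len,
				   int (*crypt_fn)(unsigned char *buf,
						   unsigned int len))
{
	/* ...the common walk/tweak handling would go here... */
	return crypt_fn(buf, len);	/* becomes a direct call after inlining */
}

/* AES-NI-style wrapper: the underlying routine cannot fail, return 0. */
static int aesni_xts_encrypt_sketch(unsigned char *buf, unsigned int len)
{
	for (unsigned int i = 0; i < len; i++)	/* stand-in transform */
		buf[i] ^= 0xaa;
	return 0;
}

static int xts_encrypt_sketch(unsigned char *buf, unsigned int len)
{
	/* Constant function pointer + inlining => no indirect call left. */
	return xts_crypt_sketch(buf, len, aesni_xts_encrypt_sketch);
}

int main(void)
{
	unsigned char buf[16] = { 0 };

	return xts_encrypt_sketch(buf, sizeof(buf));
}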

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v7:
* Remove aesni_dec() as it is not referenced by the refactored helpers.
  But keep the ASM symbol '__aesni_dec' to stay consistent with its
  counterpart '__aesni_enc'.
* Call out 'AES-XTS' in the subject.

Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' caught by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some defines back to the AES-NI code, as they are not
  used by AES-KL

Changes from v5:
* Clean up the stale function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 161 ++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 209 +++++++----------------------
 4 files changed, 249 insertions(+), 190 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..fbb5a70f1e1b
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for a performance reason. With the mitigation
+ * for speculative executions like retpoline, indirect calls become very
+ * expensive at a cost of measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline void *aes_align_addr(void *addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? addr : PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aes_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c17..518f48f3bd6b 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,17 +72,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-static inline void *aes_align_addr(void *addr)
-{
-	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
-		return addr;
-	return PTR_ALIGN(addr, AESNI_ALIGN);
-}
-
 asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -104,14 +89,34 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
 
 #ifdef CONFIG_X86_64
 
@@ -223,11 +228,6 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 	return (struct crypto_aes_ctx *)aes_align_addr(raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -263,7 +263,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -276,7 +276,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -527,7 +527,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -672,8 +672,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -828,8 +828,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -856,8 +856,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -882,128 +882,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1159,8 +1048,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1176,8 +1065,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.17.1


* [dm-devel] [PATCH v8 11/12] crypto: x86/aes - Prepare for a new AES-XTS implementation
@ 2023-06-03 15:22         ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, David S. Miller, ardb, chang.seok.bae, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, charishma1.gairuboyina, luto, H. Peter Anvin,
	bernie.keany, tglx, nhuck, gmazyland, elliott

Key Locker's AES instruction set ('AES-KL') has a similar programming
interface to AES-NI. The internal ABI in the assembly code will have
the same prototype as AES-NI. Then, the glue code will be the same as
AES-NI's.

The new AES code will support the XTS mode alone as disk encryption is
the only intended use case.

Refactor the XTS-related code to avoid code duplication, and also move
some constant values to be shareable. Introduce wrappers for data
transformation functions to return an error code as AES-KL may
populate it.

The refactored code needs to invoke the implementation-specific
functions. Then, while reusable, the code has to be inlined to the
caller to avoid the indirect call for its possible overhead.

Neither functional change nor performance regression is intended.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v7:
* Remove aesni_dec() as not referenced by the refactored helpers. But,
  keep the ASM symbol '__aesni_dec' to make it appears consistent with
  its counterpart '__aesni_enc'.
* Call out 'AES-XTS' in the subject.

Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' catched by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the staled function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
  Link:
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 161 ++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 209 +++++++----------------------
 4 files changed, 249 insertions(+), 190 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..fbb5a70f1e1b
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,161 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for a performance reason. With the mitigation
+ * for speculative executions like retpoline, indirect calls become very
+ * expensive at a cost of measurable overhead.
+ */
+
+#ifndef _AES_INTEL_GLUE_H
+#define _AES_INTEL_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+struct aes_xts_ctx {
+	struct crypto_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	struct crypto_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline void *aes_align_addr(void *addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? addr : PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return (struct aes_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(struct crypto_tfm *tfm, void *ctx, const u8 *in_key,
+			    unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3ac7487ecad2..3922d24cae2b 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1935,9 +1936,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1956,7 +1957,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2124,9 +2125,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2146,7 +2147,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2689,22 +2690,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2819,12 +2812,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2844,10 +2831,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2996,13 +2983,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3158,4 +3145,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c17..518f48f3bd6b 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,17 +72,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-static inline void *aes_align_addr(void *addr)
-{
-	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
-		return addr;
-	return PTR_ALIGN(addr, AESNI_ALIGN);
-}
-
 asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			     unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -104,14 +89,34 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(ctx, out, in, len, iv);
+	return 0;
+}
+
+static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+			     unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(ctx, out, in, len, iv);
+	return 0;
+}
 
 #ifdef CONFIG_X86_64
 
@@ -223,11 +228,6 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 	return (struct crypto_aes_ctx *)aes_align_addr(raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -263,7 +263,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -276,7 +276,7 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -527,7 +527,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -672,8 +672,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -828,8 +828,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -856,8 +856,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -882,128 +882,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
-				 key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
-				  key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aes_set_key_common);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1159,8 +1048,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1176,8 +1065,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
@ 2023-06-03 15:22         ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	chang.seok.bae, David S. Miller, Ingo Molnar, H. Peter Anvin

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion, as well as all subsequent data transformations,
is provided by new AES instructions ('AES-KL'). AES-KL is analogous
to AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The
implementation has some details worth mentioning that differentiate
it from other AES implementations and that users may need to be
aware of:

== Key Handle Restriction ==

A key handle may be encoded with some restrictions. Restrict every
handle to kernel-mode use only via setkey().

Even so, a key handle could later be corrupted or rejected due to its
restrictions. In that case, encrypt()/decrypt() returns -EINVAL.
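
Concretely, the AES-KL instructions report such a failure through ZF,
which the assembly turns into an error code, e.g. (excerpted from
__aeskl_enc() later in this patch):

	aesenc256kl (HANDLEP), STATE
	jz .Lenc_err			/* handle rejected by the CPU */
	...
.Lenc_err:
	mov $(-EINVAL), AREG		/* reported to the caller as -EINVAL */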

== AES Compliance ==

Key Locker is not AES compliant, as it lacks 192-bit key support.
However, the Linux crypto API expects a software cipher
implementation to support all AES-compliant key sizes.

The AES-KL cipher implementation meets this requirement by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.
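
The fallback boils down to a key-length dispatch in the glue code
(excerpted from aeskl_setkey() later in this patch):

	if (unlikely(keylen == AES_KEYSIZE_192)) {
		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
		err = aesni_set_key(ctx, in_key, keylen);
	} else {
		if (!valid_keylocker())
			err = -ENODEV;
		else
			err = __aeskl_setkey(ctx, in_key, keylen);
	}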

== Wrapping Key Restore Failure ==

setkey() as well as encode()/decode() can also fail due to a
wrapping-key failure. In the event of a hardware failure, the
wrapping key is lost across deep sleep states. Those functions then
return -ENODEV, as the feature is disabled.
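
Every entry point in the glue code therefore re-checks the feature
before calling into the AES-KL assembly, e.g. (from aeskl_enc() later
in this patch):

	if (!valid_keylocker())
		return -ENODEV;

	return __aeskl_enc(ctx, out, in);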

== Userspace Exposition ==

Some hardware implementations may carry a performance penalty. E.g.,
the cryptsetup benchmark indicates the raw throughput is measurably
slower than AES-NI. But, for disk encryption, storage bandwidth may
become the bottleneck before encryption bandwidth does.

This, along with the points above, is an end-user consideration when
selecting AES-KL over AES-NI. Thus, advertise it under the unique
driver name 'xts-aes-aeskl' in /proc/crypto, and give it a lower
priority so that it does not replace AES-NI under the generic name
'xts(aes)'.
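
For illustration only, a minimal sketch of a hypothetical in-kernel
caller (not part of this patch) using the generic skcipher API; with
the lower priority, "xts(aes)" still resolves to AES-NI, while AES-KL
can be requested explicitly by its driver name:

	#include <linux/err.h>
	#include <linux/scatterlist.h>
	#include <crypto/skcipher.h>

	static int example_xts_encrypt(const u8 *key, unsigned int keylen,
				       u8 *iv, void *buf, unsigned int len)
	{
		/* Or "xts-aes-aeskl" to select the AES-KL driver directly. */
		struct crypto_skcipher *tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
		struct skcipher_request *req;
		struct scatterlist sg;
		DECLARE_CRYPTO_WAIT(wait);
		int err;

		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		/* With AES-KL, setkey() wraps the key into a key handle. */
		err = crypto_skcipher_setkey(tfm, key, keylen);
		if (err)
			goto out_free_tfm;

		req = skcipher_request_alloc(tfm, GFP_KERNEL);
		if (!req) {
			err = -ENOMEM;
			goto out_free_tfm;
		}

		sg_init_one(&sg, buf, len);
		skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
					      crypto_req_done, &wait);
		skcipher_request_set_crypt(req, &sg, &sg, len, iv);

		/* -EINVAL or -ENODEV surface here as described above. */
		err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

		skcipher_request_free(req);
	out_free_tfm:
		crypto_free_skcipher(tfm);
		return err;
	}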

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once,
which can boost AES performance. Leveraging those, the code needs to
clobber more than eight 128-bit registers.

But 32-bit mode does not have enough wide registers, so its
performance is unlikely to beat 64-bit mode, which already trails
AES-NI. So, simply make the implementation 64-bit only for now.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
  - Remove the key-length check and simplify the code. (Eric Biggers)
  - Remove aeskl_dec() and __aeskl_dec() as not needed.
  - Simplify the register-function return handling. (Eric)
  - Rename setkey functions for coherent naming:
    aeskl_setkey() -> __aeskl_setkey(),
    aeskl_setkey_common() -> aeskl_setkey(),
    aeskl_xts_setkey() -> xts_setkey()
  - Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys with the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlsta)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Kconfig            |  22 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 552 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 188 ++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |   8 +-
 arch/x86/crypto/aesni-intel_glue.c |  35 +-
 arch/x86/crypto/aesni-intel_glue.h |  16 +
 7 files changed, 812 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (x86)"
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config CRYPTO_CURVE25519_X86
 	tristate "Public key crypto: Curve25519 (ADX)"
 	depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..61addc61dd4e
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,552 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based on the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN1	%xmm8
+#define IN	IN1
+
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $(-EINVAL), AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	KEY:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
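+
+/*
+ * In C terms: treat IV as a 128-bit little-endian value, compute
+ * IV <<= 1, and XOR 0x87 into the lowest byte when the bit shifted
+ * out of bit 127 was set (multiplication by x in GF(2^128), XTS
+ * "ble" byte order).
+ */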
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+	FRAME_END
+	RET
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+	FRAME_END
+	RET
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..193a3a96eb09
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * In the event of hardware failure, the wrapping key can be lost
+ * from a sleep state. Then, it is not usable anymore. The feature
+ * state can be found via valid_keylocker().
+ *
+ * Such disabling can happen anywhere preemptible. So, to avoid the
+ * race condition, check the availability on every use along with
+ * kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			unsigned int keylen)
+{
+	struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+	    keylen != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, keylen);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = __aeskl_setkey(ctx, in_key, keylen);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                     unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)
 
 /*
  * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 518f48f3bd6b..774e3a78b662 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			       unsigned int key_len);
 asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,11 +90,23 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_enc(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif
 
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
@@ -104,19 +117,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 				    const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_encrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif
 
-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_decrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif
 
 #ifdef CONFIG_X86_64
 
@@ -242,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = __aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..5b1919f49efe
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced for other AES implementations
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [dm-devel] [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
@ 2023-06-03 15:22         ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-03 15:22 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, David S. Miller, ardb, chang.seok.bae, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, charishma1.gairuboyina, luto, H. Peter Anvin,
	bernie.keany, tglx, nhuck, gmazyland, elliott

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion as well as all subsequent data transformation are
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
that of AES-NI as maintains a similar programming interface.

Support the XTS mode as the primary use case is dm-crypt. The
implementation has some details worth mentioning, which differentiate
itself from others, that users may need to be aware of:

== Key Handle Restriction ==

A key handle may be encoded with some restrictions. Restrict every
handle only available in kernel mode via setkey().

Subsequently the key handle could be corrupted or fail with handle
restrictions. Then, encrypt()/decrypt() returns -EINVAL.

=== AES Compliance ===

Key Locker is not AES compliant as it lacks 192-bit key support.
However, per the expectations of Linux crypto-cipher implementations
the software cipher implementation must support all the AES-compliant
key sizes.

The AES-KL cipher implementation achieves this constraint by logging a
warning and falling back to AES-NI. In other words, the 192-bit
key-size limitation for what can be converted into a key handle is
only documented, not enforced.

== Wrapping Key Restore Failure ==

The failure of setkey() as well as encode()/decode() is also possible
with the wrapping key failure. In the event of hardware failure, the
wrapping key is lost from deep sleep states. Then, those functions
return -ENODEV as the feature is disabled.

== Userspace Exposition ==

Some hardware implementations may have some performance penalties.
E.g., the cryptsetup benchmark indicates the raw throughput is
measurably slower than AES-NI. But, for disk encryption, storage
bandwidth may be the bottleneck before encryption bandwidth.

This, along with the above points, is an end-user consideration for
selecting AES-KL over AES-NI. Thus, advertise it with a unique name
'xts-aes-aeskl' in /proc/crypto while not replacing AES-NI under the
generic name 'xts(aes)' with a lower priority.

== 64-bit Only ==

AES-KL provides wide instructions that process eight blocks at once
which can boost the AES performance. Leveraging those, the code needs
to clobber more than eight 128-bit registers.

But, the 32-bit does not have enough wide registers. Then, the
performance is unlikely better than 64-bit which has already a gap vs.
AES-NI. So, simply make it for the 64-bit mode only at the moment.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Milan Broz <gmazyland@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
  - Remove the key-length check and simplify the code. (Eric Biggers)
  - Remove aeskl_dec() and __aeskl_dec() as not needed.
  - Simplify the register-function return handling. (Eric)
  - Rename setkey functions for coherent naming:
    aeskl_setkey() -> __aeskl_setkey(),
    aeskl_setkey_common() -> aeskl_setkey(),
    aeskl_xts_setkey() -> xts_setkey()
  - Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys with the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlsta)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
Link: https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
---
 arch/x86/crypto/Kconfig            |  22 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aeskl-intel_asm.S  | 552 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 188 ++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |   8 +-
 arch/x86/crypto/aesni-intel_glue.c |  35 +-
 arch/x86/crypto/aesni-intel_glue.h |  16 +
 7 files changed, 812 insertions(+), 12 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index 9bbfd01cfa2f..658adfd7aebf 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -2,6 +2,11 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (x86)"
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config CRYPTO_CURVE25519_X86
 	tristate "Public key crypto: Curve25519 (ADX)"
 	depends on X86 && 64BIT
@@ -29,6 +34,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	tristate "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	depends on CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..61addc61dd4e
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,552 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based from the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN1	%xmm8
+#define IN	IN1
+
+#define AREG	%rax
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	xor AREG, AREG
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	jmp .Lenc_noerr
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+
+.Lenc_noerr:
+	xor AREG, AREG
+	jmp .Lenc_end
+.Lenc_err:
+	mov $(-EINVAL), AREG
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+/*
+ * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.Lxts_enc8:
+	sub $128, LEN
+	jl .Lxts_enc1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_enc8_128
+	aesencwide256kl (%rdi)
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc8_end
+.Lxts_enc8_128:
+	aesencwide128kl (%rdi)
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_enc8
+
+.Lxts_enc_ret_iv:
+	movups IV, (IVP)
+.Lxts_enc_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_enc_ret
+.Lxts_enc_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_enc_ret:
+	FRAME_END
+	RET
+
+.Lxts_enc1_pre:
+	add $128, LEN
+	jz .Lxts_enc_ret_iv
+	sub $16, LEN
+	jl .Lxts_enc_cts4
+
+.Lxts_enc1:
+	movdqu (INP), STATE1
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_end
+.Lxts_enc1_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_enc1_out
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_enc_cts1
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_enc1
+
+.Lxts_enc1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_enc_ret_iv
+
+.Lxts_enc_cts4:
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+
+.Lxts_enc_cts1:
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_enc1_cts_128
+	aesenc256kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+	jmp .Lxts_enc1_cts_end
+.Lxts_enc1_cts_128:
+	aesenc128kl (HANDLEP), STATE1
+	jz .Lxts_enc_ret_err
+
+.Lxts_enc1_cts_end:
+	pxor IV, STATE1
+	movups STATE1, (OUTP)
+	jmp .Lxts_enc_ret_noerr
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			   const u8 *src, unsigned int len, le128 *iv)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+	test $15, LEN
+	jz .Lxts_dec8
+	sub $16, LEN
+
+.Lxts_dec8:
+	sub $128, LEN
+	jl .Lxts_dec1_pre
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_dec8_128
+	aesdecwide256kl (%rdi)
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec8_end
+.Lxts_dec8_128:
+	aesdecwide128kl (%rdi)
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec8_end:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_dec8
+
+.Lxts_dec_ret_iv:
+	movups IV, (IVP)
+.Lxts_dec_ret_noerr:
+	xor AREG, AREG
+	jmp .Lxts_dec_ret
+.Lxts_dec_ret_err:
+	mov $(-EINVAL), AREG
+.Lxts_dec_ret:
+	FRAME_END
+	RET
+
+.Lxts_dec1_pre:
+	add $128, LEN
+	jz .Lxts_dec_ret_iv
+
+.Lxts_dec1:
+	movdqu (INP), STATE1
+
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_dec_cts1
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_end
+.Lxts_dec1_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_end:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_dec1_out
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_dec1
+
+.Lxts_dec1_out:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_dec_ret_iv
+
+.Lxts_dec_cts1:
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_pre_end
+.Lxts_dec1_cts_pre_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_pre_end:
+	pxor IV, STATE1
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN1
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN1
+	pblendvb STATE3, IN1
+	movaps IN1, STATE1
+
+	pxor STATE5, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_128
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+	jmp .Lxts_dec1_cts_end
+.Lxts_dec1_cts_128:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_dec_ret_err
+
+.Lxts_dec1_cts_end:
+	pxor STATE5, STATE1
+
+	movups STATE1, (OUTP)
+	jmp .Lxts_dec_ret_noerr
+
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..193a3a96eb09
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains glue
+ * code and the real AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most code is based on AES-NI glue code, aesni-intel_glue.c
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * In the event of hardware failure, the wrapping key can be lost
+ * from a sleep state. Then, it is not usable anymore. The feature
+ * state can be found via valid_keylocker().
+ *
+ * Such disabling can happen anywhere preemptible. So, to avoid the
+ * race condition, check the availability on every use along with
+ * kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
+			unsigned int keylen)
+{
+	struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
+	    keylen != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	kernel_fpu_begin();
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
+		err = aesni_set_key(ctx, in_key, keylen);
+	} else {
+		if (!valid_keylocker())
+			err = -ENODEV;
+		else
+			err = __aeskl_setkey(ctx, in_key, keylen);
+	}
+	kernel_fpu_end();
+
+	return err;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(ctx, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(ctx, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+			    unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not depend on AES-NI. But AES-KL does not
+	 * support 192-bit keys. To make itself AES-compliant, it falls
+	 * back to AES-NI.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 3922d24cae2b..d38abcc69d9e 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1821,10 +1821,10 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                     unsigned int key_len)
  */
-SYM_FUNC_START(aesni_set_key)
+SYM_FUNC_START(__aesni_set_key)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1933,7 +1933,7 @@ SYM_FUNC_START(aesni_set_key)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_set_key)
+SYM_FUNC_END(__aesni_set_key)
 
 /*
  * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 518f48f3bd6b..774e3a78b662 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,8 +73,8 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage int __aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			       unsigned int key_len);
 asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -89,11 +90,23 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+		  unsigned int key_len)
+{
+	return __aesni_set_key(ctx, in_key, key_len);
+}
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_set_key);
+#endif
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in)
 {
 	__aesni_enc(ctx, out, in);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_enc);
+#endif
 
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
@@ -104,19 +117,25 @@ asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
 				    const u8 *in, unsigned int len, u8 *iv);
 
-static int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_encrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_encrypt);
+#endif
 
-static int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
-			     unsigned int len, u8 *iv)
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv)
 {
 	__aesni_xts_decrypt(ctx, out, in, len, iv);
 	return 0;
 }
+#if IS_MODULE(CONFIG_CRYPTO_AES_KL)
+EXPORT_SYMBOL_GPL(aesni_xts_decrypt);
+#endif
 
 #ifdef CONFIG_X86_64
 
@@ -242,7 +261,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		err = __aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..5b1919f49efe
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Support for Intel AES-NI instructions. This file contains function
+ * prototypes to be referenced by other AES implementations.
+ */
+
+int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len);
+
+int aesni_enc(const void *ctx, u8 *out, const u8 *in);
+
+int aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+int aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out, const u8 *in,
+		      unsigned int len, u8 *iv);
+
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
@ 2023-06-03 16:37           ` Borislav Petkov
  -1 siblings, 0 replies; 247+ messages in thread
From: Borislav Petkov @ 2023-06-03 16:37 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, ebiggers, elliott,
	gmazyland, luto, dave.hansen, tglx, mingo, x86, herbert, ardb,
	dan.j.williams, bernie.keany, charishma1.gairuboyina,
	lalithambika.krishnakumar, nhuck, Jonathan Corbet, Ingo Molnar,
	H. Peter Anvin, linux-doc

On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
> +static __init int x86_nokeylocker_setup(char *arg)
> +{
> +	/* Expect an exact match without trailing characters. */
> +	if (strlen(arg))
> +		return 0;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
> +		return 1;
> +
> +	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
> +	pr_info("x86/keylocker: Disabled by kernel command line.\n");
> +	return 1;
> +}
> +__setup("nokeylocker", x86_nokeylocker_setup);

Can we stop adding those just to remove them at some point later but
simply do:

clearcpuid=keylocker

?
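
(As an illustration only, assuming the flag name is accepted exactly as
spelled above: on a GRUB-based setup that would mean appending it to
GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.

GRUB_CMDLINE_LINUX="clearcpuid=keylocker"

and then regenerating the GRUB configuration, e.g. with update-grub.)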

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker
@ 2023-06-03 16:37           ` Borislav Petkov
  0 siblings, 0 replies; 247+ messages in thread
From: Borislav Petkov @ 2023-06-03 16:37 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-doc, dave.hansen, lalithambika.krishnakumar, dm-devel,
	H. Peter Anvin, nhuck, ardb, herbert, Jonathan Corbet, x86,
	mingo, ebiggers, Ingo Molnar, gmazyland, elliott, luto,
	dan.j.williams, charishma1.gairuboyina, linux-kernel, tglx,
	linux-crypto, bernie.keany

On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
> +static __init int x86_nokeylocker_setup(char *arg)
> +{
> +	/* Expect an exact match without trailing characters. */
> +	if (strlen(arg))
> +		return 0;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
> +		return 1;
> +
> +	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
> +	pr_info("x86/keylocker: Disabled by kernel command line.\n");
> +	return 1;
> +}
> +__setup("nokeylocker", x86_nokeylocker_setup);

Can we stop adding those just to remove them at some point later but
simply do:

clearcpuid=keylocker

?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
@ 2023-06-04 15:34           ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-06-04 15:34 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On Sat, Jun 03, 2023 at 08:22:25AM -0700, Chang S. Bae wrote:
> Every field in struct aesni_xts_ctx is a pointer to a byte array.

A byte array.  Not a pointer to a byte array.

> Then, the field can be redefined as that struct type instead of the obscure
> pointer.

There's no pointer.

>  static inline struct
>  aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
>  {
> -	unsigned long align = AESNI_ALIGN;
> -
> -	if (align <= crypto_tfm_ctx_alignment())
> -		align = 1;
> -	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
> +	return (struct aesni_rfc4106_gcm_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
>  }

Explicit casts from 'void *' are unnecessary.

>  static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
>  			    unsigned int keylen)
>  {
> -	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
>  	int err;
>  
>  	err = xts_verify_key(tfm, key, keylen);
> @@ -893,20 +892,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
>  	keylen /= 2;
>  
>  	/* first half of xts-key is for crypt */
> -	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
> +	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
>  				 key, keylen);
>  	if (err)
>  		return err;
>  
>  	/* second half of xts-key is for tweak */
> -	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
> +	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
>  				  key + keylen, keylen);
>  }

To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
should happen when translating the raw crypto_skcipher_ctx() into the pointer to
the aes_xts_ctx.  It should not happen when accessing each individual field in
the aes_xts_ctx.

Yet, this code is still doing runtime alignment when accessing each individual
field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
aes_set_key_common() runtime-aligns to crypto_aes_ctx.

We should keep everything consistent, which means making aes_set_key_common()
take a pointer to crypto_aes_ctx and not do the runtime alignment.
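
Concretely, a minimal sketch of the intended shape (assuming the
aes_align_addr()/aes_xts_ctx() helpers this series already adds; not
the exact final code):

/* Do the 16-byte runtime alignment once, in the typed accessor ... */
static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
{
	return aes_align_addr(crypto_skcipher_ctx(tfm));
}

/* ... make the common setkey helper take the already-aligned context,
 * with no aes_ctx()/PTR_ALIGN() inside ...
 */
static int aes_set_key_common(struct crypto_tfm *tfm,
			      struct crypto_aes_ctx *ctx,
			      const u8 *in_key, unsigned int key_len);

/* ... and let the XTS setkey path hand over the aligned sub-contexts. */
err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
			 key, keylen);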

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
@ 2023-06-04 15:34           ` Eric Biggers
  0 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-06-04 15:34 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, charishma1.gairuboyina, mingo,
	lalithambika.krishnakumar, dm-devel, Ingo Molnar, bp,
	linux-crypto, luto, H. Peter Anvin, bernie.keany, tglx, nhuck,
	gmazyland, elliott

On Sat, Jun 03, 2023 at 08:22:25AM -0700, Chang S. Bae wrote:
> Every field in struct aesni_xts_ctx is a pointer to a byte array.

A byte array.  Not a pointer to a byte array.

> Then, the field can be redefined as that struct type instead of the obscure
> pointer.

There's no pointer.

>  static inline struct
>  aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
>  {
> -	unsigned long align = AESNI_ALIGN;
> -
> -	if (align <= crypto_tfm_ctx_alignment())
> -		align = 1;
> -	return PTR_ALIGN(crypto_aead_ctx(tfm), align);
> +	return (struct aesni_rfc4106_gcm_ctx *)aes_align_addr(crypto_aead_ctx(tfm));
>  }

Explicit casts from 'void *' are unnecessary.

>  static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
>  			    unsigned int keylen)
>  {
> -	struct aesni_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
>  	int err;
>  
>  	err = xts_verify_key(tfm, key, keylen);
> @@ -893,20 +892,20 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
>  	keylen /= 2;
>  
>  	/* first half of xts-key is for crypt */
> -	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
> +	err = aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->crypt_ctx,
>  				 key, keylen);
>  	if (err)
>  		return err;
>  
>  	/* second half of xts-key is for tweak */
> -	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
> +	return aes_set_key_common(crypto_skcipher_tfm(tfm), &ctx->tweak_ctx,
>  				  key + keylen, keylen);
>  }

To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
should happen when translating the raw crypto_skcipher_ctx() into the pointer to
the aes_xts_ctx.  It should not happen when accessing each individual field in
the aes_xts_ctx.

Yet, this code is still doing runtime alignment when accessing each individual
field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
aes_set_key_common() runtime-aligns to crypto_aes_ctx.

We should keep everything consistent, which means making aes_set_key_common()
take a pointer to crypto_aes_ctx and not do the runtime alignment.

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-06-04 15:34           ` [dm-devel] " Eric Biggers
@ 2023-06-04 22:02             ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-04 22:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On 6/4/2023 8:34 AM, Eric Biggers wrote:
> 
> To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> the aes_xts_ctx.  It should not happen when accessing each individual field in
> the aes_xts_ctx.
> 
> Yet, this code is still doing runtime alignment when accessing each individual
> field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> aes_set_key_common() runtime-aligns to crypto_aes_ctx.
> 
> We should keep everything consistent, which means making aes_set_key_common()
> take a pointer to crypto_aes_ctx and not do the runtime alignment.

Let me clarify what is the problem this patch tried to solve here. The 
current struct aesni_xts_ctx is ugly. So, the main story is let's fix it 
before using the code for AES-KL.

Then, the rework part may be applicable for code re-usability. That 
seems to be okay to do here.

Fixing the runtime alignment entirely seems to be touching other code 
than AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems 
to be less relevant in this series.  I'd be happy to follow up on that 
improvement though.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
@ 2023-06-04 22:02             ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-04 22:02 UTC (permalink / raw)
  To: Eric Biggers
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, charishma1.gairuboyina, mingo,
	lalithambika.krishnakumar, dm-devel, Ingo Molnar, bp,
	linux-crypto, luto, H. Peter Anvin, bernie.keany, tglx, nhuck,
	gmazyland, elliott

On 6/4/2023 8:34 AM, Eric Biggers wrote:
> 
> To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> the aes_xts_ctx.  It should not happen when accessing each individual field in
> the aes_xts_ctx.
> 
> Yet, this code is still doing runtime alignment when accessing each individual
> field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> aes_set_key_common() runtime-aligns to crypto_aes_ctx.
> 
> We should keep everything consistent, which means making aes_set_key_common()
> take a pointer to crypto_aes_ctx and not do the runtime alignment.

Let me clarify what is the problem this patch tried to solve here. The 
current struct aesni_xts_ctx is ugly. So, the main story is let's fix it 
before using the code for AES-KL.

Then, the rework part may be applicable for code re-usability. That 
seems to be okay to do here.

Fixing the runtime alignment entirely seems to be touching other code 
than AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems 
to be less relevant in this series.  I'd be happy to follow up on that 
improvement though.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker
  2023-06-03 16:37           ` [dm-devel] " Borislav Petkov
@ 2023-06-04 22:13             ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-04 22:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-crypto, dm-devel, ebiggers, elliott,
	gmazyland, luto, dave.hansen, tglx, mingo, x86, herbert, ardb,
	dan.j.williams, bernie.keany, charishma1.gairuboyina,
	lalithambika.krishnakumar, nhuck, Jonathan Corbet, Ingo Molnar,
	H. Peter Anvin, linux-doc

On 6/3/2023 9:37 AM, Borislav Petkov wrote:
> On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
>> +static __init int x86_nokeylocker_setup(char *arg)
>> +{
>> +	/* Expect an exact match without trailing characters. */
>> +	if (strlen(arg))
>> +		return 0;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
>> +		return 1;
>> +
>> +	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
>> +	pr_info("x86/keylocker: Disabled by kernel command line.\n");
>> +	return 1;
>> +}
>> +__setup("nokeylocker", x86_nokeylocker_setup);
> 
> Can we stop adding those just to remove them at some point later but
> simply do:
> 
> clearcpuid=keylocker
> 
> ?

Oh, I was not sure about this policy. Thanks, now I'm confident about
removing this.

Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker
@ 2023-06-04 22:13             ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-04 22:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-doc, dave.hansen, lalithambika.krishnakumar, dm-devel,
	H. Peter Anvin, nhuck, ardb, herbert, Jonathan Corbet, x86,
	mingo, ebiggers, Ingo Molnar, gmazyland, elliott, luto,
	dan.j.williams, charishma1.gairuboyina, linux-kernel, tglx,
	linux-crypto, bernie.keany

On 6/3/2023 9:37 AM, Borislav Petkov wrote:
> On Sat, Jun 03, 2023 at 08:22:24AM -0700, Chang S. Bae wrote:
>> +static __init int x86_nokeylocker_setup(char *arg)
>> +{
>> +	/* Expect an exact match without trailing characters. */
>> +	if (strlen(arg))
>> +		return 0;
>> +
>> +	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
>> +		return 1;
>> +
>> +	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
>> +	pr_info("x86/keylocker: Disabled by kernel command line.\n");
>> +	return 1;
>> +}
>> +__setup("nokeylocker", x86_nokeylocker_setup);
> 
> Can we stop adding those just to remove them at some point later but
> simply do:
> 
> clearcpuid=keylocker
> 
> ?

Oh, I was not sure about this policy. Thanks, now I'm confident about
removing this.

Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-06-04 22:02             ` [dm-devel] " Chang S. Bae
@ 2023-06-05  2:46               ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-06-05  2:46 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On Sun, Jun 04, 2023 at 03:02:32PM -0700, Chang S. Bae wrote:
> On 6/4/2023 8:34 AM, Eric Biggers wrote:
> > 
> > To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> > should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> > the aes_xts_ctx.  It should not happen when accessing each individual field in
> > the aes_xts_ctx.
> > 
> > Yet, this code is still doing runtime alignment when accessing each individual
> > field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> > aes_set_key_common() runtime-aligns to crypto_aes_ctx.
> > 
> > We should keep everything consistent, which means making aes_set_key_common()
> > take a pointer to crypto_aes_ctx and not do the runtime alignment.
> 
> Let me clarify what is the problem this patch tried to solve here. The
> current struct aesni_xts_ctx is ugly. So, the main story is let's fix it
> before using the code for AES-KL.
> 
> Then, the rework part may be applicable for code re-usability. That seems to
> be okay to do here.
> 
> Fixing the runtime alignment entirely seems to be touching other code than
> AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems to be less
> relevant in this series.  I'd be happy to follow up on that improvement
> though.

IMO the issue is that your patch makes the code (including the XTS code)
inconsistent because it makes it use a mix of both approaches: it aligns each
field individually, *and* it aligns the ctx up-front.  I was hoping to switch
fully from the former approach to the latter approach, instead of switching from
the former approach to a mix of the two approaches as you are proposing.

The following on top of this patch is what I am asking for.  I think it would be
appropriate to fold into this patch.

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c173..ad1ae7a88b59d 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -228,10 +228,10 @@ static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
 	return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
 }
 
-static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
+static int aes_set_key_common(struct crypto_tfm *tfm,
+			      struct crypto_aes_ctx *ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
-	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
 	int err;
 
 	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
@@ -252,7 +252,8 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
 		       unsigned int key_len)
 {
-	return aes_set_key_common(tfm, crypto_tfm_ctx(tfm), in_key, key_len);
+	return aes_set_key_common(tfm, aes_ctx(crypto_tfm_ctx(tfm)),
+				  in_key, key_len);
 }
 
 static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
@@ -285,7 +286,7 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
 	return aes_set_key_common(crypto_skcipher_tfm(tfm),
-				  crypto_skcipher_ctx(tfm), key, len);
+				  aes_ctx(crypto_skcipher_ctx(tfm)), key, len);
 }
 
 static int ecb_encrypt(struct skcipher_request *req)

^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
@ 2023-06-05  2:46               ` Eric Biggers
  0 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-06-05  2:46 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, charishma1.gairuboyina, mingo,
	lalithambika.krishnakumar, dm-devel, Ingo Molnar, bp,
	linux-crypto, luto, H. Peter Anvin, bernie.keany, tglx, nhuck,
	gmazyland, elliott

On Sun, Jun 04, 2023 at 03:02:32PM -0700, Chang S. Bae wrote:
> On 6/4/2023 8:34 AM, Eric Biggers wrote:
> > 
> > To re-iterate what I said on v6, the runtime alignment to a 16-byte boundary
> > should happen when translating the raw crypto_skcipher_ctx() into the pointer to
> > the aes_xts_ctx.  It should not happen when accessing each individual field in
> > the aes_xts_ctx.
> > 
> > Yet, this code is still doing runtime alignment when accessing each individual
> > field, as the second argument to aes_set_key_common() is 'void *raw_ctx' which
> > aes_set_key_common() runtime-aligns to crypto_aes_ctx.
> > 
> > We should keep everything consistent, which means making aes_set_key_common()
> > take a pointer to crypto_aes_ctx and not do the runtime alignment.
> 
> Let me clarify what is the problem this patch tried to solve here. The
> current struct aesni_xts_ctx is ugly. So, the main story is let's fix it
> before using the code for AES-KL.
> 
> Then, the rework part may be applicable for code re-usability. That seems to
> be okay to do here.
> 
> Fixing the runtime alignment entirely seems to be touching other code than
> AES-XTS. Yes, that's ideal cleanup for consistency. But, it seems to be less
> relevant in this series.  I'd be happy to follow up on that improvement
> though.

IMO the issue is that your patch makes the code (including the XTS code)
inconsistent because it makes it use a mix of both approaches: it aligns each
field individually, *and* it aligns the ctx up-front.  I was hoping to switch
fully from the former approach to the latter approach, instead of switching from
the former approach to a mix of the two approaches as you are proposing.

The following on top of this patch is what I am asking for.  I think it would be
appropriate to fold into this patch.

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 589648142c173..ad1ae7a88b59d 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -228,10 +228,10 @@ static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
 	return (struct aesni_xts_ctx *)aes_align_addr(crypto_skcipher_ctx(tfm));
 }
 
-static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
+static int aes_set_key_common(struct crypto_tfm *tfm,
+			      struct crypto_aes_ctx *ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
-	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
 	int err;
 
 	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
@@ -252,7 +252,8 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
 		       unsigned int key_len)
 {
-	return aes_set_key_common(tfm, crypto_tfm_ctx(tfm), in_key, key_len);
+	return aes_set_key_common(tfm, aes_ctx(crypto_tfm_ctx(tfm)),
+				  in_key, key_len);
 }
 
 static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
@@ -285,7 +286,7 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
 	return aes_set_key_common(crypto_skcipher_tfm(tfm),
-				  crypto_skcipher_ctx(tfm), key, len);
+				  aes_ctx(crypto_skcipher_ctx(tfm)), key, len);
 }
 
 static int ecb_encrypt(struct skcipher_request *req)

^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
  2023-06-05  2:46               ` [dm-devel] " Eric Biggers
@ 2023-06-05  4:41                 ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-05  4:41 UTC (permalink / raw)
  To: Eric Biggers
  Cc: x86, herbert, David S. Miller, ardb, dave.hansen, dan.j.williams,
	linux-kernel, charishma1.gairuboyina, mingo,
	lalithambika.krishnakumar, dm-devel, Ingo Molnar, bp,
	linux-crypto, luto, H. Peter Anvin, bernie.keany, tglx, nhuck,
	gmazyland, elliott

On 6/4/2023 7:46 PM, Eric Biggers wrote:
> 
> IMO the issue is that your patch makes the code (including the XTS code)
> inconsistent because it makes it use a mix of both approaches: it aligns each
> field individually, *and* it aligns the ctx up-front. 

But, the code did this already:

common_rfc4106_set_key()
-> aesni_rfc4106_gcm_ctx_get(aead) 	// align ->ctx
-> aes_set_key_common()
    -> aes_ctx()				// here, this aligns again

And generic_gcmaes_set_key() as well.

Hmm, maybe a separate patch can spell out this issue more clearly for 
the record, no?

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx
@ 2023-06-05  4:41                 ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-05  4:41 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On 6/4/2023 7:46 PM, Eric Biggers wrote:
> 
> IMO the issue is that your patch makes the code (including the XTS code)
> inconsistent because it makes it use a mix of both approaches: it aligns each
> field individually, *and* it aligns the ctx up-front. 

But, the code did this already:

common_rfc4106_set_key()
-> aesni_rfc4106_gcm_ctx_get(aead) 	// align ->ctx
-> aes_set_key_common()
    -> aes_ctx()				// here, this aligns again

And generic_gcmaes_set_key() as well.

Hmm, maybe a separate patch can spell out this issue more clearly for 
the record, no?

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v5 01/12] Documentation/x86: Document Key Locker
  2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
@ 2023-06-05 10:52     ` Bagas Sanjaya
  -1 siblings, 0 replies; 247+ messages in thread
From: Bagas Sanjaya @ 2023-06-05 10:52 UTC (permalink / raw)
  To: Chang S. Bae, linux-crypto, dm-devel, herbert, ebiggers, ardb,
	x86, luto, tglx, bp, dave.hansen, mingo
  Cc: linux-kernel, dan.j.williams, charishma1.gairuboyina,
	kumar.n.dwarakanath, ravi.v.shankar, linux-doc

On 1/13/22 04:12, Chang S. Bae wrote:
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration
> +opportunities while maintaining a programming interface similar to AES-NI.
> +It converts the AES key into an encoded form, called the 'key handle'. The
> +key handle is a wrapped version of the clear-text key where the wrapping
> +key has limited exposure. Once converted, all subsequent data encryption
> +using new AES instructions (AES-KL) uses this key handle, reducing the
> +exposure of private key material in memory.
> +
> +Internal Wrapping Key (IWKey)
> +=============================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +IWKey Restore Failure
> +---------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of an IWKey restore failure upon resume from suspend, all
> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
                                            "suspend and reboot"?
> +impacted by an IWKey restore failure is a data-volume then it is possible
> +that I/O errors on that volume do not bring down the rest of the system.
> +However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new IWKey at initial boot, not any time after like resume from
> +suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private IWKey state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per VM IWKeys in memory, defeats the purpose of Key Locker. The
> +backup / restore facility is also not performant enough to be suitable for
> +guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return an error
> +  code. Note that the flag is also set when the internal wrapping key is
> +  changed because of missing backup.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher implementation
> +  logs a warning message with a 192-bit key and then falls back to AES-NI.
> +  So, this 192-bit key-size limitation is only documented, not enforced. It
> +  means the key will remain in clear-text in memory. This is to meet Linux
> +  crypto-cipher expectation that each implementation must support all the
> +  AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> +  overhead when compared with AES-NI instructions.
> +

The rest is LGTM.

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v5 01/12] Documentation/x86: Document Key Locker
@ 2023-06-05 10:52     ` Bagas Sanjaya
  0 siblings, 0 replies; 247+ messages in thread
From: Bagas Sanjaya @ 2023-06-05 10:52 UTC (permalink / raw)
  To: Chang S. Bae, linux-crypto, dm-devel, herbert, ebiggers, ardb,
	x86, luto, tglx, bp, dave.hansen, mingo
  Cc: ravi.v.shankar, linux-doc, linux-kernel, kumar.n.dwarakanath,
	dan.j.williams, charishma1.gairuboyina

On 1/13/22 04:12, Chang S. Bae wrote:
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration
> +opportunities while maintaining a programming interface similar to AES-NI.
> +It converts the AES key into an encoded form, called the 'key handle'. The
> +key handle is a wrapped version of the clear-text key where the wrapping
> +key has limited exposure. Once converted, all subsequent data encryption
> +using new AES instructions (AES-KL) uses this key handle, reducing the
> +exposure of private key material in memory.
> +
> +Internal Wrapping Key (IWKey)
> +=============================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +IWKey Restore Failure
> +---------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of an IWKey restore failure upon resume from suspend, all
> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
                                            "suspend and reboot"?
> +impacted by an IWKey restore failure is a data-volume then it is possible
> +that I/O errors on that volume do not bring down the rest of the system.
> +However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new IWKey at initial boot, not any time after like resume from
> +suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private IWKey state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per VM IWKeys in memory, defeats the purpose of Key Locker. The
> +backup / restore facility is also not performant enough to be suitable for
> +guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return an error
> +  code. Note that the flag is also set when the internal wrapping key is
> +  changed because of missing backup.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher implementation
> +  logs a warning message with a 192-bit key and then falls back to AES-NI.
> +  So, this 192-bit key-size limitation is only documented, not enforced. It
> +  means the key will remain in clear-text in memory. This is to meet Linux
> +  crypto-cipher expectation that each implementation must support all the
> +  AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> +  overhead when compared with AES-NI instructions.
> +

The rest is LGTM.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker
  2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
@ 2023-06-05 10:54           ` Bagas Sanjaya
  -1 siblings, 0 replies; 247+ messages in thread
From: Bagas Sanjaya @ 2023-06-05 10:54 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	Ingo Molnar, H. Peter Anvin, Jonathan Corbet, linux-doc

On 6/3/23 22:22, Chang S. Bae wrote:
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of a wrapping key restore failure upon resume from suspend, all
> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
"... resume from suspend or reboot."
> +impacted by a wrapping key restore failure is a data-volume then it is
> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return -EINVAL.
> +  Note that this flag is also set if the wrapping key is changed, e.g.,
> +  because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher
> +  implementation logs a warning message with a 192-bit key and then falls
> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,
> +  not enforced. It means the key will remain in clear-text in memory. This
> +  is to meet Linux crypto-cipher expectation that each implementation must
> +  support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> +  overhead when compared with AES-NI instructions.
> +

The rest is LGTM, thanks!

Anyway,

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [dm-devel] [PATCH v8 01/12] Documentation/x86: Document Key Locker
@ 2023-06-05 10:54           ` Bagas Sanjaya
  0 siblings, 0 replies; 247+ messages in thread
From: Bagas Sanjaya @ 2023-06-05 10:54 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: x86, herbert, linux-doc, Jonathan Corbet, ardb, dave.hansen,
	dan.j.williams, mingo, ebiggers, lalithambika.krishnakumar,
	Ingo Molnar, bp, charishma1.gairuboyina, luto, H. Peter Anvin,
	bernie.keany, tglx, nhuck, gmazyland, elliott

On 6/3/23 22:22, Chang S. Bae wrote:
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of a wrapping key restore failure upon resume from suspend, all
> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
"... resume from suspend or reboot."
> +impacted by a wrapping key restore failure is a data-volume then it is
> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return -EINVAL.
> +  Note that this flag is also set if the wrapping key is changed, e.g.,
> +  because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher
> +  implementation logs a warning message with a 192-bit key and then falls
> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,
> +  not enforced. It means the key will remain in clear-text in memory. This
> +  is to meet Linux crypto-cipher expectation that each implementation must
> +  support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> +  overhead when compared with AES-NI instructions.
> +

The rest is LGTM, thanks!

Anyway,

Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker
  2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
@ 2023-06-06  2:17           ` Randy Dunlap
  -1 siblings, 0 replies; 247+ messages in thread
From: Randy Dunlap @ 2023-06-06  2:17 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	Ingo Molnar, H. Peter Anvin, Jonathan Corbet, linux-doc



On 6/3/23 08:22, Chang S. Bae wrote:
> Document the overview of the feature along with relevant consideration
> when provisioning dm-crypt volumes with AES-KL instead of AES-NI.
> 
> ---
> ---
>  Documentation/arch/x86/index.rst     |  1 +
>  Documentation/arch/x86/keylocker.rst | 97 ++++++++++++++++++++++++++++
>  2 files changed, 98 insertions(+)
>  create mode 100644 Documentation/arch/x86/keylocker.rst
> 

> diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
> new file mode 100644
> index 000000000000..5557b8d0659a
> --- /dev/null
> +++ b/Documentation/arch/x86/keylocker.rst
> @@ -0,0 +1,97 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost on the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +The event of a wrapping key restore failure upon resume from suspend, all

   Upon the event of a ...

> +established key handles become invalid. In flight dm-crypt operations
> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend, reboot. If the volume
> +impacted by a wrapping key restore failure is a data-volume then it is

                                                   data volume

> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return -EINVAL.
> +  Note that this flag is also set if the wrapping key is changed, e.g.,
> +  because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process an 192-bit key. The AES-KL cipher
> +  implementation logs a warning message with a 192-bit key and then falls
> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,

Is it logged anywhere?  i.e., a kernel log message?

> +  not enforced. It means the key will remain in clear-text in memory. This
> +  is to meet Linux crypto-cipher expectation that each implementation must
> +  support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementation may have noticeable performance
> +  overhead when compared with AES-NI instructions.
> +

-- 
~Randy

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 01/12] Documentation/x86: Document Key Locker
  2023-06-06  2:17           ` [dm-devel] " Randy Dunlap
@ 2023-06-06  4:18             ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-06  4:18 UTC (permalink / raw)
  To: Randy Dunlap, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, elliott, gmazyland, luto, dave.hansen, tglx, bp, mingo,
	x86, herbert, ardb, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, lalithambika.krishnakumar, nhuck,
	Ingo Molnar, H. Peter Anvin, Jonathan Corbet, linux-doc

On 6/5/2023 7:17 PM, Randy Dunlap wrote:
> On 6/3/23 08:22, Chang S. Bae wrote:
>> +
>> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
>> +  AES-KL instruction to process an 192-bit key. The AES-KL cipher
>> +  implementation logs a warning message with a 192-bit key and then falls
>> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,
> 
> Is it logged anywhere?  i.e., a kernel log message?

Yes, this is the relevant change in the last patch:

 > +static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
 > +			unsigned int keylen)
 > +{
...
 > +	if (unlikely(keylen == AES_KEYSIZE_192)) {
 > +		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
...
 > +}

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
@ 2023-06-07  5:35           ` Eric Biggers
  -1 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2023-06-07  5:35 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On Sat, Jun 03, 2023 at 08:22:27AM -0700, Chang S. Bae wrote:
> == Key Handle Restriction ==
> 
> A key handle may be encoded with some restrictions.

It's unclear what this means.  Please avoid passive tense and the word "may"
like this.  I think you mean something like "The AES-KL instruction set supports
selecting key usage restrictions at key handle creation time."

> Restrict every handle only available in kernel mode via setkey().

I think you mean something like "Restrict all key handles created by the kernel
to kernel mode use only."

Can you also mention why you are doing this?  I suppose it might as well be
done, but I'm not seeing how it would actually matter.

What other sorts of key usage restrictions does AES-KL support?  Are any other
ones useful here?

> Subsequently the key handle could be corrupted or fail with handle
> restrictions. Then, encrypt()/decrypt() returns -EINVAL.

Aren't these scenarios actually impossible?  At least without memory corruption.

> == Userspace Exposition ==
> 
> Some hardware implementations may have some performance penalties.

Likewise, please avoid vague statements like this.  This makes it unclear
whether this is something that happens in the real world or whether it's just
theoretical.  You indeed have actual benchmark results that show that AES-KL is
much slower than AES-NI on current CPUs, right?

> But, for disk encryption, storage bandwidth may be the bottleneck before
> encryption bandwidth.

Again, please try to be less vague.  E.g. "With a slow storage device, storage
bandwidth is the bottleneck, even if disk encryption is enabled..."

> Thus, advertise it with a unique name 'xts-aes-aeskl' in /proc/crypto while
> not replacing AES-NI under the generic name 'xts(aes)' with a lower priority.

The above sentence seems to say that xts-aes-aeskl does *not* have a lower
priority than xts-aes-aesni.  But actually it does.

> Then, the performance is unlikely better than 64-bit which has already a gap
> vs. AES-NI.

I don't understand what this sentence is trying to say.

> diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
> index 9bbfd01cfa2f..658adfd7aebf 100644
> --- a/arch/x86/crypto/Kconfig
> +++ b/arch/x86/crypto/Kconfig
> @@ -2,6 +2,11 @@
>  
>  menu "Accelerated Cryptographic Algorithms for CPU (x86)"
>  
> +config AS_HAS_KEYLOCKER
> +	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
> +	help
> +	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12

It looks like arch/x86/Kconfig.assembler would be a better place for this.

> diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
> new file mode 100644
> index 000000000000..61addc61dd4e
> --- /dev/null
> +++ b/arch/x86/crypto/aeskl-intel_asm.S
> @@ -0,0 +1,552 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Implement AES algorithm using AES Key Locker instructions.
> + *
> + * Most code is based from the AES-NI implementation, aesni-intel_asm.S
> + *
> + */
> +
> +#include <linux/linkage.h>
> +#include <linux/cfi_types.h>
> +#include <asm/errno.h>
> +#include <asm/inst.h>
> +#include <asm/frame.h>
> +#include "aes-helper_asm.S"
> +
> +.text
> +
> +#define STATE1	%xmm0
> +#define STATE2	%xmm1
> +#define STATE3	%xmm2
> +#define STATE4	%xmm3
> +#define STATE5	%xmm4
> +#define STATE6	%xmm5
> +#define STATE7	%xmm6
> +#define STATE8	%xmm7
> +#define STATE	STATE1
> +
> +#define IV	%xmm9
> +#define KEY	%xmm10
> +#define INC	%xmm13
> +
> +#define IN1	%xmm8
> +#define IN	IN1

Why do both IN1 and IN exist?  Shouldn't there just be IN?

> +
> +#define AREG	%rax

Shouldn't %rax just be hardcoded?

> +#define HANDLEP	%rdi

This should be called CTX, to match the function prototypes.

> +#define UKEYP	OUTP

This should be called IN_KEY, to match the function prototypes.

> +#define GF128MUL_MASK %xmm11
> +
> +/*
> + * int __aeskl_setkey(struct crypto_aes_ctx *ctx, const u8 *in_key, unsigned int key_len)
> + */
> +SYM_FUNC_START(__aeskl_setkey)
> +	FRAME_BEGIN
> +	movl %edx, 480(HANDLEP)
> +	movdqu (UKEYP), STATE1
> +	mov $1, %eax
> +	cmp $16, %dl
> +	je .Lsetkey_128
> +
> +	movdqu 0x10(UKEYP), STATE2
> +	encodekey256 %eax, %eax
> +	movdqu STATE4, 0x30(HANDLEP)
> +	jmp .Lsetkey_end
> +.Lsetkey_128:
> +	encodekey128 %eax, %eax
> +
> +.Lsetkey_end:
> +	movdqu STATE1, (HANDLEP)
> +	movdqu STATE2, 0x10(HANDLEP)
> +	movdqu STATE3, 0x20(HANDLEP)

The moves to the ctx should use movdqa, since it is aligned.

> +
> +	xor AREG, AREG
> +	FRAME_END
> +	RET
> +SYM_FUNC_END(__aeskl_setkey)

This function always returns 0, so it really should return void.

> +/*
> + * int __aeskl_enc(const void *ctx, u8 *dst, const u8 *src)
> + */
> +SYM_FUNC_START(__aeskl_enc)
> +	FRAME_BEGIN
> +	movdqu (INP), STATE
> +	movl 480(HANDLEP), KLEN
> +
> +	cmp $16, KLEN
> +	je .Lenc_128
> +	aesenc256kl (HANDLEP), STATE
> +	jz .Lenc_err
> +	jmp .Lenc_noerr
> +.Lenc_128:
> +	aesenc128kl (HANDLEP), STATE
> +	jz .Lenc_err
> +
> +.Lenc_noerr:
> +	xor AREG, AREG
> +	jmp .Lenc_end
> +.Lenc_err:
> +	mov $(-EINVAL), AREG
> +.Lenc_end:
> +	movdqu STATE, (OUTP)
> +	FRAME_END
> +	RET
> +SYM_FUNC_END(__aeskl_enc)

In the common case (successful AES-256 encryption) this is executing 'jmp'
twice.  I think the code should be rearranged to eliminate these jmps.

> +/*
> + * int __aeskl_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
> + *			   const u8 *src, unsigned int len, le128 *iv)
> + */
> +SYM_FUNC_START(__aeskl_xts_encrypt)

__aeskl_xts_encrypt() and __aeskl_xts_decrypt() are very similar.  To reduce
code duplication, can you consider generating them from a macro that takes an
argument that indicates whether it is encrypt or decrypt?

> +static int aeskl_setkey(struct crypto_tfm *tfm, void *raw_ctx, const u8 *in_key,
> +			unsigned int keylen)
> +{
> +	struct crypto_aes_ctx *ctx = (struct crypto_aes_ctx *)raw_ctx;
> +	int err;
> +
> +	if (!crypto_simd_usable())
> +		return -EBUSY;
> +
> +	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
> +	    keylen != AES_KEYSIZE_256)
> +		return -EINVAL;
> +
> +	kernel_fpu_begin();
> +	if (unlikely(keylen == AES_KEYSIZE_192)) {
> +		pr_warn_once("AES-KL does not support 192-bit key. Use AES-NI.\n");
> +		err = aesni_set_key(ctx, in_key, keylen);
> +	} else {
> +		if (!valid_keylocker())
> +			err = -ENODEV;
> +		else
> +			err = __aeskl_setkey(ctx, in_key, keylen);
> +	}
> +	kernel_fpu_end();
> +
> +	return err;
> +}
[...]
> +			.cra_ctxsize		= XTS_AES_CTX_SIZE,
[...]

Something that your AES-KL code does that's a bit ugly is that it abuses
'struct crypto_aes_ctx' to store a Keylocker key handle instead
of the actual AES key schedule which the struct is supposed to be for.

The proper way to represent that would be to make the tfm context for
xts-aes-aeskl be a union of crypto_aes_ctx and a Keylocker specific context.

If you don't do that and instead keep the proposed workaround, then please add a
comment somewhere that very clearly explains how the struct is being used.
Above aeskl_setkey() or above .cra_ctxsize might be a good place.
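For illustration, a minimal sketch of what such a union-based tfm context could look like (all names, and the exact handle size, are assumptions for this sketch rather than something taken from the patch):

#include <linux/types.h>
#include <crypto/aes.h>

/* Hypothetical Key Locker handle container. */
struct aeskl_handle_ctx {
	u8 handle[64];		/* up to 512 bits, as produced by ENCODEKEY256 */
	u32 key_length;		/* original key size: 16 or 32 bytes */
};

/*
 * Either a real AES key schedule (used for the 192-bit AES-NI fallback)
 * or a Key Locker handle, instead of overloading struct crypto_aes_ctx
 * for both purposes.
 */
union aeskl_crypt_ctx {
	struct crypto_aes_ctx aesni;
	struct aeskl_handle_ctx aeskl;
};

setkey() would then pick the member based on the key length, and .cra_ctxsize would cover the union.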

> diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
> new file mode 100644
> index 000000000000..5b1919f49efe
> --- /dev/null
> +++ b/arch/x86/crypto/aesni-intel_glue.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +/*
> + * Support for Intel AES-NI instructions. This file contains function
> + * prototypes to be referenced for other AES implementations
> + */

It would be helpful if this comment was more concrete, like "These are AES-NI
functions that are used by the AES-KL code as a fallback when it is given a
192-bit key.  Key Locker does not support 192-bit keys."

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2023-06-07  5:35           ` [dm-devel] " Eric Biggers
@ 2023-06-07 22:06             ` Chang S. Bae
  -1 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2023-06-07 22:06 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, elliott, gmazyland, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, lalithambika.krishnakumar,
	nhuck, David S. Miller, Ingo Molnar, H. Peter Anvin

On 6/6/2023 10:35 PM, Eric Biggers wrote:
> On Sat, Jun 03, 2023 at 08:22:27AM -0700, Chang S. Bae wrote:
> 
> Can you also mention why you are doing this?  I suppose it might as well be
> done, but I'm not seeing how it would actually matter.

While this crypto implementation is in kernel mode, userspace can still 
call it:
     https://docs.kernel.org/crypto/userspace-if.html

And those AES instructions are executable in userspace.

Say someone takes a key handle out of the kernel and then tries to decrypt 
some disk image from userspace. The kernel-mode-only restriction at least 
prevents that.
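
For context, a minimal AF_ALG sketch of the userspace crypto API path 
mentioned above (illustrative only; error handling is trimmed and the 
algorithm name and key are placeholders):

#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_alg.h>

#ifndef AF_ALG
#define AF_ALG 38
#endif
#ifndef SOL_ALG
#define SOL_ALG 279
#endif

int main(void)
{
	struct sockaddr_alg sa = {
		.salg_family = AF_ALG,
		.salg_type   = "skcipher",
		.salg_name   = "xts(aes)",	/* placeholder algorithm name */
	};
	unsigned char key[32] = { 0 };		/* dummy 2 x 128-bit XTS key */
	int tfmfd, opfd;

	tfmfd = socket(AF_ALG, SOCK_SEQPACKET, 0);
	if (tfmfd < 0 || bind(tfmfd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
		return 1;

	/* The key set here is a clear-text key, not a key handle. */
	if (setsockopt(tfmfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)) < 0)
		return 1;

	opfd = accept(tfmfd, NULL, 0);	/* operation fd for encrypt/decrypt */
	if (opfd >= 0)
		close(opfd);
	close(tfmfd);
	return 0;
}

The point is only that the in-kernel skcipher is reachable from userspace; 
the handle restriction matters because userspace could also execute the 
AES-KL instructions directly on a handle it obtained from the kernel.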

> What other sorts of key usage restrictions does AES-KL support?  Are any other
> ones useful here?

Besides this, there are additional bits that restrict encryption use and 
decryption use, respectively.

This can be found in Section 1.1.1.1 'Handle Restrictions' in its 
whitepaper:
 
https://www.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html

>> Subsequently the key handle could be corrupted or fail with handle
>> restrictions. Then, encrypt()/decrypt() returns -EINVAL.
> 
> Aren't these scenarios actually impossible?  At least without memory corruption.

Yes, in the dm-crypt path, I think. But, the key handle can be tainted 
in the userspace -> API path.

I think this may help users, as the feature can perform an integrity check 
up front and return an error right away if something goes wrong.

>> Thus, advertise it with a unique name 'xts-aes-aeskl' in /proc/crypto while
>> not replacing AES-NI under the generic name 'xts(aes)' with a lower priority.
> 
> The above sentence seems to say that xts-aes-aeskl does *not* have a lower
> priority than xts-aes-aesni.  But actually it does.

No, it does not say that. This needs to call out the latter part more 
clearly.

>> Then, the performance is unlikely better than 64-bit which has already a gap
>> vs. AES-NI.
> 
> I don't understand what this sentence is trying to say.

This is in another section that explains why the implementation is 64-bit 
only. I added it there as another reason to avoid 32-bit code. But anyway, 
32-bit kernel mode is known to be on its way out, so the 128-bit register 
argument seems to be enough there.

>> +config AS_HAS_KEYLOCKER
>> +	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
>> +	help
>> +	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
> 
> It looks like arch/x86/Kconfig.assembler would be a better place for this.

Yeah, the commit 5e8ebd841a44 ("x86: probe assembler capabilities via 
kconfig instead of makefile") moved those over there.

>> +
>> +#define IN1	%xmm8
>> +#define IN	IN1
> 
> Why do both IN1 and IN exist?  Shouldn't there just be IN?

Oh, this is a silly leftover from the CBC code as it has multiple inputs.

#define IN %xmm8 then, s/IN1/IN/g

>> +
>> +#define AREG	%rax
> 
> Shouldn't %rax just be hardcoded?

I thought this (or any other) renaming helps readability. Maybe I'm missing 
something. Could you share your thoughts on this?

>> +#define HANDLEP	%rdi
> 
> This should be called CTX, to match the function prototypes.
> 
>> +#define UKEYP	OUTP
> 
> This should be called IN_KEY, to match the function prototypes.

Okay. But, OTOH, the prototype itself is somewhat generic, so its argument 
names do not always match what is actually meant in the code. Thus, AES-NI 
renamed those like

     ctx    -> KEYP
     in_key -> UKEY
     ...

So, another option can be leaving some comments there, e.g. '# ctx is 
renamed to KEYP'.

>> +
>> +.Lsetkey_end:
>> +	movdqu STATE1, (HANDLEP)
>> +	movdqu STATE2, 0x10(HANDLEP)
>> +	movdqu STATE3, 0x20(HANDLEP)
> 
> The moves to the ctx should use movdqa, since it is aligned.

Reading the manual, the difference is whether a #GP is generated when a 
misaligned memory operand is used. Using MOVDQA everywhere here would be 
asking the hardware to check the alignment every time.

But HANDLEP is known to hold an aligned address, so the plain move seems 
sufficient and coherent with the glue code, avoiding unnecessary sanity 
checks.
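
For reference, the same aligned-vs-unaligned distinction shows up with the 
SSE2 C intrinsics (illustrative, not from the patch):

#include <emmintrin.h>	/* SSE2 intrinsics */

/*
 * MOVDQA-style load: the address must be 16-byte aligned, otherwise the
 * access faults. This is the hardware alignment check discussed above.
 */
static __m128i load_aligned(const void *p)
{
	return _mm_load_si128((const __m128i *)p);
}

/* MOVDQU-style load: accepts any address, aligned or not. */
static __m128i load_unaligned(const void *p)
{
	return _mm_loadu_si128((const __m128i *)p);
}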

>> +
>> +	xor AREG, AREG
>> +	FRAME_END
>> +	RET
>> +SYM_FUNC_END(__aeskl_setkey)
> 
> This function always returns 0, so it really should return void.

Yeah, fair enough.

> In the common case (successful AES-256 encryption) this is executing 'jmp'
> twice.  I think the code should be rearranged to eliminate these jmps.

Ah, right. That is a good point! Let me tweak this for the most likely 
cases.

> __aeskl_xts_encrypt() and __aeskl_xts_decrypt() are very similar.  To reduce
> code duplication, can you consider generating them from a macro that takes an
> argument that indicates whether it is encrypt or decrypt?

Yeah, I can see the code that prepares operands is common between them. 
But, I'm not sure folding them together can make it more readable.

> Something that your AES-KL code does that's a bit ugly is that it abuses
> 'struct crypto_aes_ctx' to store a Keylocker key handle instead
> of the actual AES key schedule which the struct is supposed to be for.
> 
> The proper way to represent that would be to make the tfm context for
> xts-aes-aeskl be a union of crypto_aes_ctx and a Keylocker specific context.

Agreed. I think this is likely fallout from that struct aesni_xts_ctx 
fix. Previously, the field was a byte array, which does not necessarily 
represent the expanded-key format. Now that the fix has made it more 
specific, Key Locker has to specify its own context accordingly.

Thanks,
Chang


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH] crypto: x86/aesni: Align the address before aes_set_key_common()
  2023-06-05  4:41                 ` Chang S. Bae
  (?)
@ 2023-06-21 12:06                 ` Chang S. Bae
  2023-07-14  8:51                   ` Herbert Xu
  -1 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2023-06-21 12:06 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: herbert, davem, ebiggers, x86, tglx, mingo, bp, dave.hansen, hpa,
	chang.seok.bae

aes_set_key_common() performs runtime alignment of the void *raw_ctx
pointer. This facilitates consistent access to the 16-byte-aligned
address during key expansion.

However, the alignment is already handled in the GCM-related setkey
functions before invoking the common function. Consequently, the
alignment in the common function is unnecessary for those functions.

To establish a consistent approach throughout the glue code, remove
the aes_ctx() call from its current location. Instead, place it at
each call site where the runtime alignment is currently absent.

Link: https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
The need for this fix was discovered during Eric's review of the Key
Locker series [1]. Since the upstream code also requires this
improvement, it is applicable regardless of the Key Locker enabling
work [2].

[1] https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/
[2] https://lore.kernel.org/lkml/f1093780-cdda-35ec-3ef1-e5fab4139bef@intel.com/
---
 arch/x86/crypto/aesni-intel_glue.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index a5b0cb3efeba..c4eea7e746e7 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -229,10 +229,10 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 	return (struct crypto_aes_ctx *)ALIGN(addr, align);
 }
 
-static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
+static int aes_set_key_common(struct crypto_tfm *tfm,
+			      struct crypto_aes_ctx *ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
-	struct crypto_aes_ctx *ctx = aes_ctx(raw_ctx);
 	int err;
 
 	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
@@ -253,7 +253,7 @@ static int aes_set_key_common(struct crypto_tfm *tfm, void *raw_ctx,
 static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
 		       unsigned int key_len)
 {
-	return aes_set_key_common(tfm, crypto_tfm_ctx(tfm), in_key, key_len);
+	return aes_set_key_common(tfm, aes_ctx(crypto_tfm_ctx(tfm)), in_key, key_len);
 }
 
 static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
@@ -286,7 +286,7 @@ static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
 	return aes_set_key_common(crypto_skcipher_tfm(tfm),
-				  crypto_skcipher_ctx(tfm), key, len);
+				  aes_ctx(crypto_skcipher_ctx(tfm)), key, len);
 }
 
 static int ecb_encrypt(struct skcipher_request *req)
@@ -893,13 +893,13 @@ static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 	keylen /= 2;
 
 	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_crypt_ctx,
+	err = aes_set_key_common(crypto_skcipher_tfm(tfm), aes_ctx(ctx->raw_crypt_ctx),
 				 key, keylen);
 	if (err)
 		return err;
 
 	/* second half of xts-key is for tweak */
-	return aes_set_key_common(crypto_skcipher_tfm(tfm), ctx->raw_tweak_ctx,
+	return aes_set_key_common(crypto_skcipher_tfm(tfm), aes_ctx(ctx->raw_tweak_ctx),
 				  key + keylen, keylen);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni: Align the address before aes_set_key_common()
  2023-06-21 12:06                 ` [PATCH] crypto: x86/aesni: Align the address before aes_set_key_common() Chang S. Bae
@ 2023-07-14  8:51                   ` Herbert Xu
  0 siblings, 0 replies; 247+ messages in thread
From: Herbert Xu @ 2023-07-14  8:51 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, davem, ebiggers, x86, tglx, mingo,
	bp, dave.hansen, hpa

On Wed, Jun 21, 2023 at 05:06:53AM -0700, Chang S. Bae wrote:
> aes_set_key_common() performs runtime alignment to the void *raw_ctx
> pointer. This facilitates consistent access to the 16byte-aligned
> address during key extension.
> 
> However, the alignment is already handlded in the GCM-related setkey
> functions before invoking the common function. Consequently, the
> alignment in the common function is unnecessary for those functions.
> 
> To establish a consistent approach throughout the glue code, remove
> the aes_ctx() call from its current location. Instead, place it at
> each call site where the runtime alignment is currently absent.
> 
> Link: https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/
> Suggested-by: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: linux-crypto@vger.kernel.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> The need for this fix was discovered during Eric's review of the Key
> Locker series [1]. Considering the upstream code also requires this
> improvement, this is applicable regardless of the Key Locker enabling
> [2].
> 
> [1] https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/
> [2] https://lore.kernel.org/lkml/f1093780-cdda-35ec-3ef1-e5fab4139bef@intel.com/
> ---
>  arch/x86/crypto/aesni-intel_glue.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2023-06-07  5:35           ` [dm-devel] " Eric Biggers
  (?)
  (?)
@ 2024-03-11 21:32           ` Chang S. Bae
  2024-03-12  2:15             ` Eric Biggers
                               ` (2 more replies)
  -1 siblings, 3 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-11 21:32 UTC (permalink / raw)
  To: linux-kernel, linux-crypto; +Cc: herbert, davem, ebiggers, x86, chang.seok.bae

The aesni_set_key() implementation has no error case, yet its prototype
specifies to return an error code.

Modify the function prototype to return void and adjust the related code.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
Previously, Eric identified a similar case in my AES-KL code [1]. Then,
this parallel issue was realized.

[1]: https://lore.kernel.org/lkml/20230607053558.GC941@sol.localdomain/
---
 arch/x86/crypto/aesni-intel_asm.S  | 5 ++---
 arch/x86/crypto/aesni-intel_glue.c | 6 +++---
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 411d8c83e88a..7ecb55cae3d6 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1820,8 +1820,8 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
 SYM_FUNC_START(aesni_set_key)
 	FRAME_BEGIN
@@ -1926,7 +1926,6 @@ SYM_FUNC_START(aesni_set_key)
 	sub $0x10, UKEYP
 	cmp TKEYP, KEYP
 	jb .Ldec_key_loop
-	xor AREG, AREG
 #ifndef __x86_64__
 	popl KEYP
 #endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index b1d90c25975a..c807b2f48ea3 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -87,8 +87,8 @@ static inline void *aes_align_addr(void *addr)
 	return PTR_ALIGN(addr, AESNI_ALIGN);
 }
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
 asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
 		err = aes_expandkey(ctx, in_key, key_len);
 	else {
 		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
+		aesni_set_key(ctx, in_key, key_len);
 		kernel_fpu_end();
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-11 21:32           ` [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
@ 2024-03-12  2:15             ` Eric Biggers
  2024-03-12  7:46             ` Ard Biesheuvel
  2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
  2 siblings, 0 replies; 247+ messages in thread
From: Eric Biggers @ 2024-03-12  2:15 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, linux-crypto, herbert, davem, x86

On Mon, Mar 11, 2024 at 02:32:32PM -0700, Chang S. Bae wrote:
> The aesni_set_key() implementation has no error case, yet its prototype
> specifies to return an error code.
> 
> Modify the function prototype to return void and adjust the related code.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Previously, Eric identified a similar case in my AES-KL code [1]. Then,
> this parallel issue was realized.
> 
> [1]: https://lore.kernel.org/lkml/20230607053558.GC941@sol.localdomain/
> ---
>  arch/x86/crypto/aesni-intel_asm.S  | 5 ++---
>  arch/x86/crypto/aesni-intel_glue.c | 6 +++---
>  2 files changed, 5 insertions(+), 6 deletions(-)

Reviewed-by: Eric Biggers <ebiggers@google.com>

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-11 21:32           ` [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
  2024-03-12  2:15             ` Eric Biggers
@ 2024-03-12  7:46             ` Ard Biesheuvel
  2024-03-12 15:03               ` Chang S. Bae
  2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
  2 siblings, 1 reply; 247+ messages in thread
From: Ard Biesheuvel @ 2024-03-12  7:46 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, linux-crypto, herbert, davem, ebiggers, x86

On Mon, 11 Mar 2024 at 22:48, Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> The aesni_set_key() implementation has no error case, yet its prototype
> specifies to return an error code.
>
> Modify the function prototype to return void and adjust the related code.
>
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: linux-crypto@vger.kernel.org
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Previously, Eric identified a similar case in my AES-KL code [1]. Then,
> this parallel issue was realized.
>
> [1]: https://lore.kernel.org/lkml/20230607053558.GC941@sol.localdomain/
> ---
>  arch/x86/crypto/aesni-intel_asm.S  | 5 ++---
>  arch/x86/crypto/aesni-intel_glue.c | 6 +++---
>  2 files changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
> index 411d8c83e88a..7ecb55cae3d6 100644
> --- a/arch/x86/crypto/aesni-intel_asm.S
> +++ b/arch/x86/crypto/aesni-intel_asm.S
> @@ -1820,8 +1820,8 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
>  SYM_FUNC_END(_key_expansion_256b)
>
>  /*
> - * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> - *                   unsigned int key_len)
> + * void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> + *                    unsigned int key_len)
>   */
>  SYM_FUNC_START(aesni_set_key)
>         FRAME_BEGIN
> @@ -1926,7 +1926,6 @@ SYM_FUNC_START(aesni_set_key)
>         sub $0x10, UKEYP
>         cmp TKEYP, KEYP
>         jb .Ldec_key_loop
> -       xor AREG, AREG
>  #ifndef __x86_64__
>         popl KEYP
>  #endif
> diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> index b1d90c25975a..c807b2f48ea3 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -87,8 +87,8 @@ static inline void *aes_align_addr(void *addr)
>         return PTR_ALIGN(addr, AESNI_ALIGN);
>  }
>
> -asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> -                            unsigned int key_len);
> +asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
> +                             unsigned int key_len);
>  asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
>  asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
>  asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
> @@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
>                 err = aes_expandkey(ctx, in_key, key_len);
>         else {
>                 kernel_fpu_begin();
> -               err = aesni_set_key(ctx, in_key, key_len);
> +               aesni_set_key(ctx, in_key, key_len);

This will leave 'err' uninitialized.

>                 kernel_fpu_end();
>         }
>
> --
> 2.34.1
>
>

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-12  7:46             ` Ard Biesheuvel
@ 2024-03-12 15:03               ` Chang S. Bae
  2024-03-12 15:18                 ` Ard Biesheuvel
  0 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-03-12 15:03 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-kernel, linux-crypto, herbert, davem, ebiggers, x86

On 3/12/2024 12:46 AM, Ard Biesheuvel wrote:
> On Mon, 11 Mar 2024 at 22:48, Chang S. Bae <chang.seok.bae@intel.com> wrote:
>>
>> @@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
>>                  err = aes_expandkey(ctx, in_key, key_len);
>>          else {
>>                  kernel_fpu_begin();
>> -               err = aesni_set_key(ctx, in_key, key_len);
>> +               aesni_set_key(ctx, in_key, key_len);
> 
> This will leave 'err' uninitialized.

Ah, right. Thanks for catching it.

Also, upon reviewing aes_expandkey(), I noticed there's no error case, 
except for the key length sanity check.

While addressing this, perhaps some additional cleanup is worth considering, 
like:

@@ -233,19 +233,20 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
  {
         int err;

-       if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
-           key_len != AES_KEYSIZE_256)
-               return -EINVAL;
+       err = aes_check_keylen(key_len);
+       if (err)
+               return err;

         if (!crypto_simd_usable())
-               err = aes_expandkey(ctx, in_key, key_len);
+               /* no error with a valid key length  */
+               aes_expandkey(ctx, in_key, key_len);
         else {
                 kernel_fpu_begin();
-               err = aesni_set_key(ctx, in_key, key_len);
+               aesni_set_key(ctx, in_key, key_len);
                 kernel_fpu_end();
         }

-       return err;
+       return 0;
  }

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-12 15:03               ` Chang S. Bae
@ 2024-03-12 15:18                 ` Ard Biesheuvel
  2024-03-12 15:37                   ` Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Ard Biesheuvel @ 2024-03-12 15:18 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, linux-crypto, herbert, davem, ebiggers, x86

On Tue, 12 Mar 2024 at 16:03, Chang S. Bae <chang.seok.bae@intel.com> wrote:
>
> On 3/12/2024 12:46 AM, Ard Biesheuvel wrote:
> > On Mon, 11 Mar 2024 at 22:48, Chang S. Bae <chang.seok.bae@intel.com> wrote:
> >>
> >> @@ -241,7 +241,7 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
> >>                  err = aes_expandkey(ctx, in_key, key_len);
> >>          else {
> >>                  kernel_fpu_begin();
> >> -               err = aesni_set_key(ctx, in_key, key_len);
> >> +               aesni_set_key(ctx, in_key, key_len);
> >
> > This will leave 'err' uninitialized.
>
> Ah, right. Thanks for catching it.
>
> Also, upon reviewing aes_expandkey(), I noticed there's no error case,
> except for the key length sanity check.
>
> While addressing this, perhaps additional cleanup is considerable like:
>
> @@ -233,19 +233,20 @@ static int aes_set_key_common(struct
> crypto_aes_ctx *ctx,
>   {
>          int err;
>
> -       if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
> -           key_len != AES_KEYSIZE_256)
> -               return -EINVAL;
> +       err = aes_check_keylen(key_len);
> +       if (err)
> +               return err;
>
>          if (!crypto_simd_usable())
> -               err = aes_expandkey(ctx, in_key, key_len);
> +               /* no error with a valid key length  */
> +               aes_expandkey(ctx, in_key, key_len);
>          else {
>                  kernel_fpu_begin();
> -               err = aesni_set_key(ctx, in_key, key_len);
> +               aesni_set_key(ctx, in_key, key_len);
>                  kernel_fpu_end();
>          }
>
> -       return err;
> +       return 0;
>   }
>

I wonder whether we need aesni_set_key() at all.

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-12 15:18                 ` Ard Biesheuvel
@ 2024-03-12 15:37                   ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-12 15:37 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: linux-kernel, linux-crypto, herbert, davem, ebiggers, x86

On 3/12/2024 8:18 AM, Ard Biesheuvel wrote:
> 
> I wonder whether we need aesni_set_key() at all.

The following looks to be relevant from the AES-NI whitepaper [1]:

The Relative Cost of the Key Expansion
     ...
     Some less frequent applications require frequent key scheduling. For
     example, some random number generators may rekey frequently to
     achieve forward secrecy. One extreme example is a Davies-Meyer
     hashing construction, which uses a block cipher primitive as a
     compression function, and the cipher is re-keyed for each processed
     data block.

     Although these are not the mainstream usage models of the AES
     instructions, we point out that the AESKEYGENASSIST and AESIMC
     instructions facilitate Key Expansion procedure which is lookup
     tables free, and faster than software only key expansion. In
     addition, we point out that unrolling of the key expansion code,
     which is provided in the previous sections, improves the key
     expansion performance. The AES256 case can also utilize the
     instruction AESENCLAST, for the sbox transformation, that is faster
     than using AESKEYGENASSIST.

[1] https://www.intel.com/content/dam/doc/white-paper/advanced-encryption-standard-new-instructions-set-paper.pdf
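
To make the re-keying cost concrete, here is a rough Davies-Meyer
sketch using the kernel's generic AES helpers (signatures recalled
from memory; illustration only, not proposed code):

/* H_i = E_{M_i}(H_{i-1}) XOR H_{i-1} */
static void davies_meyer(u8 h[AES_BLOCK_SIZE], const u8 *msg,
			 unsigned int nblocks)
{
	u8 tmp[AES_BLOCK_SIZE];
	unsigned int i, j;

	for (i = 0; i < nblocks; i++) {
		struct crypto_aes_ctx ctx;

		/* Each 16-byte message block becomes an AES-128 key,
		 * so key expansion runs once per processed block.
		 */
		aes_expandkey(&ctx, msg + i * AES_BLOCK_SIZE, AES_KEYSIZE_128);
		aes_encrypt(&ctx, tmp, h);
		for (j = 0; j < AES_BLOCK_SIZE; j++)
			h[j] ^= tmp[j];
	}
}

That is the kind of caller where the key-schedule cost dominates.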

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code
  2024-03-11 21:32           ` [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
  2024-03-12  2:15             ` Eric Biggers
  2024-03-12  7:46             ` Ard Biesheuvel
@ 2024-03-22 23:04             ` Chang S. Bae
  2024-03-22 23:04               ` [PATCH v2 1/2] crypto: x86/aesni - Rearrange AES key size check Chang S. Bae
                                 ` (2 more replies)
  2 siblings, 3 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-22 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: herbert, davem, ebiggers, ardb, x86, chang.seok.bae

Previously, V1 [1] was posted to update the aesni_set_key() prototype to
remove an unnecessary error code return type.

During the review process, it was discovered that both aes_expandkey()
and the AES-NI glue code redundantly check the key size. V2 includes a
cleanup to invoke aes_expandkey() immediately if AES-NI is unusable.

Thanks,
Chang

[1]: https://lore.kernel.org/lkml/20240311213232.128240-1-chang.seok.bae@intel.com/

Chang S. Bae (2):
  crypto: x86/aesni - Rearrange AES key size check
  crypto: x86/aesni - Update aesni_set_key() to return void

 arch/x86/crypto/aesni-intel_asm.S  |  5 ++---
 arch/x86/crypto/aesni-intel_glue.c | 24 +++++++++++-------------
 2 files changed, 13 insertions(+), 16 deletions(-)


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
-- 
2.34.1


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v2 1/2] crypto: x86/aesni - Rearrange AES key size check
  2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
@ 2024-03-22 23:04               ` Chang S. Bae
  2024-03-22 23:04               ` [PATCH v2 2/2] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
  2024-03-28 10:57               ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Herbert Xu
  2 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-22 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: herbert, davem, ebiggers, ardb, x86, chang.seok.bae

aes_expandkey() already includes an AES key size check. If AES-NI is
unusable, invoke the function without a separate size check.

Also, use aes_check_keylen() instead of open-coding the check.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
V2 <- V1:
* Add as a new patch.
---
 arch/x86/crypto/aesni-intel_glue.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index b1d90c25975a..8b3b17b065ad 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -233,18 +233,16 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
 {
 	int err;
 
-	if (key_len != AES_KEYSIZE_128 && key_len != AES_KEYSIZE_192 &&
-	    key_len != AES_KEYSIZE_256)
-		return -EINVAL;
-
 	if (!crypto_simd_usable())
-		err = aes_expandkey(ctx, in_key, key_len);
-	else {
-		kernel_fpu_begin();
-		err = aesni_set_key(ctx, in_key, key_len);
-		kernel_fpu_end();
-	}
+		return aes_expandkey(ctx, in_key, key_len);
 
+	err = aes_check_keylen(key_len);
+	if (err)
+		return err;
+
+	kernel_fpu_begin();
+	err = aesni_set_key(ctx, in_key, key_len);
+	kernel_fpu_end();
 	return err;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v2 2/2] crypto: x86/aesni - Update aesni_set_key() to return void
  2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
  2024-03-22 23:04               ` [PATCH v2 1/2] crypto: x86/aesni - Rearrange AES key size check Chang S. Bae
@ 2024-03-22 23:04               ` Chang S. Bae
  2024-03-28 10:57               ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Herbert Xu
  2 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-22 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: herbert, davem, ebiggers, ardb, x86, chang.seok.bae, Eric Biggers

The aesni_set_key() implementation has no error case, yet its prototype
specifies an error code return type.

Modify the function prototype to return void and adjust the related code.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: linux-crypto@vger.kernel.org
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
V2 <- V1:
* Ensure not to return an uninitialized value. (Ard Biesheuvel)

Previously, Eric identified a similar case in my AES-KL code [1], which
is how this parallel issue was noticed.
[1]: https://lore.kernel.org/lkml/20230607053558.GC941@sol.localdomain/
---
 arch/x86/crypto/aesni-intel_asm.S  | 5 ++---
 arch/x86/crypto/aesni-intel_glue.c | 8 ++++----
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 411d8c83e88a..7ecb55cae3d6 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -1820,8 +1820,8 @@ SYM_FUNC_START_LOCAL(_key_expansion_256b)
 SYM_FUNC_END(_key_expansion_256b)
 
 /*
- * int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
- *                   unsigned int key_len)
+ * void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+ *                    unsigned int key_len)
  */
 SYM_FUNC_START(aesni_set_key)
 	FRAME_BEGIN
@@ -1926,7 +1926,6 @@ SYM_FUNC_START(aesni_set_key)
 	sub $0x10, UKEYP
 	cmp TKEYP, KEYP
 	jb .Ldec_key_loop
-	xor AREG, AREG
 #ifndef __x86_64__
 	popl KEYP
 #endif
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 8b3b17b065ad..0ea3abaaa645 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -87,8 +87,8 @@ static inline void *aes_align_addr(void *addr)
 	return PTR_ALIGN(addr, AESNI_ALIGN);
 }
 
-asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			     unsigned int key_len);
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
 asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -241,9 +241,9 @@ static int aes_set_key_common(struct crypto_aes_ctx *ctx,
 		return err;
 
 	kernel_fpu_begin();
-	err = aesni_set_key(ctx, in_key, key_len);
+	aesni_set_key(ctx, in_key, key_len);
 	kernel_fpu_end();
-	return err;
+	return 0;
 }
 
 static int aes_set_key(struct crypto_tfm *tfm, const u8 *in_key,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code
  2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
  2024-03-22 23:04               ` [PATCH v2 1/2] crypto: x86/aesni - Rearrange AES key size check Chang S. Bae
  2024-03-22 23:04               ` [PATCH v2 2/2] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
@ 2024-03-28 10:57               ` Herbert Xu
  2 siblings, 0 replies; 247+ messages in thread
From: Herbert Xu @ 2024-03-28 10:57 UTC (permalink / raw)
  To: Chang S. Bae; +Cc: linux-kernel, linux-crypto, davem, ebiggers, ardb, x86

On Fri, Mar 22, 2024 at 04:04:57PM -0700, Chang S. Bae wrote:
> Previously, V1 [1] was posted to update the aesni_set_key() prototype to
> remove an unnecessary error code return type.
> 
> During the review process, it was discovered that both aes_expandkey()
> and the AES-NI glue code redundantly check the key size. V2 includes a
> cleanup to invoke aes_expandkey() immediately if AES-NI is unusable.
> 
> Thanks,
> Chang
> 
> [1]: https://lore.kernel.org/lkml/20240311213232.128240-1-chang.seok.bae@intel.com/
> 
> Chang S. Bae (2):
>   crypto: x86/aesni - Rearrange AES key size check
>   crypto: x86/aesni - Update aesni_set_key() to return void
> 
>  arch/x86/crypto/aesni-intel_asm.S  |  5 ++---
>  arch/x86/crypto/aesni-intel_glue.c | 24 +++++++++++-------------
>  2 files changed, 13 insertions(+), 16 deletions(-)
> 
> 
> base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
> -- 
> 2.34.1

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v9 00/14] x86: Support Key Locker
  2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
                         ` (12 preceding siblings ...)
  (?)
@ 2024-03-29  1:53       ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 01/14] Documentation/x86: Document " Chang S. Bae
                           ` (14 more replies)
  -1 siblings, 15 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Hi all,

In posting this version, I wanted to make sure these code changes are
acknowledged first:

The previous enabling process has been paused to address vulnerabilities
[1][2] that could compromise Key Locker's ability to protect AES keys.
Now, with the mitigations mainlined [3][4], patches 10-11 were added to
ensure those mitigations are applied.

During this period, there was a significant change in the mainline commit
b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"). This affected Key Locker's initialization
code, which clobbers XMM registers for loading a wrapping key, as it
depends on FPU initialization.

In this revision, the setup code was split so that the initialization
part is invoked during arch_initcall(). The remaining code, which
copies the wrapping key from the backup, resides in the
identify_cpu() -> setup_keylocker() path. This separation simplifies
the code and resolves an issue with hotplug.
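
Roughly, the resulting flow is (simplified from patches 7-8 below):

    /* Boot, once, after FPU init: generate a random wrapping key,
     * load it with LOADIWKEY on every CPU, then wipe the in-memory
     * copy. */
    arch_initcall(init_keylocker);

    /* identify_cpu() -> setup_keylocker(): AP bring-up, hotplug, and
     * resume paths only copy the wrapping key back from the backup. */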

The remaining changes mainly focus on the AES crypto driver, addressing
feedback from Eric. Notably, while doing so, it became clear that it is
better to disallow a module build. Key Locker's AES instructions do not
support 192-bit keys. Supporting a module build would require exporting
some AES-NI functions, leading to performance-impacting indirect calls.
I think we can revisit module support later if necessary.

Then, the following is a summary of changes per patch since v8 [6]:

PATCH7-8:
* Invoke the setup code via arch_initcall() due to upstream changes
  delaying the FPU setup.

PATCH9-11:
* Add new patches for security and hotplug support clarification

PATCH12:
* Drop the "nokeylocker" option. (Borislav Petkov)

PATCH13:
* Introduce 'union x86_aes_ctx'. (Eric Biggers)
* Ensure 'inline' for wrapper functions.

PATCH14:
* Combine the XTS enc/dec assembly code in a macro.  (Eric Biggers)
* Define setkey() as void instead of returning 'int'.  (Eric Biggers)
* Rearrange the assembly code to reduce jumps, especially for success
  cases.  (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

This series is based on my AES-NI setkey() cleanup [7], which has
recently been merged into the crypto repository [8] and which I thought
was better to go in first. You can also find this series here:
    git://github.com/intel-staging/keylocker.git kl-v9

Thanks,
Chang

[1] Gather Data Sampling (GDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
[2] Register File Data Sampling (RFDS)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
[3] Mainlining of GDS mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64094e7e3118aff4b0be8ff713c242303e139834
[4] Mainlining of RFDS Mitigation
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0e33cf955f07e3991e45109cb3e29fbc9ca51d06
[5]  Initialize FPU late
    https://lore.kernel.org/lkml/168778151512.3634408.11432553576702911909.tglx@vps.praguecc.cz/
[6] V8: https://lore.kernel.org/lkml/20230603152227.12335-1-chang.seok.bae@intel.com/
[7] https://lore.kernel.org/lkml/20240322230459.456606-1-chang.seok.bae@intel.com/
[8] git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git

Chang S. Bae (14):
  Documentation/x86: Document Key Locker
  x86/cpufeature: Enumerate Key Locker feature
  x86/insn: Add Key Locker instructions to the opcode map
  x86/asm: Add a wrapper function for the LOADIWKEY instruction
  x86/msr-index: Add MSRs for Key Locker wrapping key
  x86/keylocker: Define Key Locker CPUID leaf
  x86/cpu/keylocker: Load a wrapping key at boot time
  x86/PM/keylocker: Restore the wrapping key on the resume from ACPI
    S3/4
  x86/hotplug/keylocker: Ensure wrapping key backup capability
  x86/cpu/keylocker: Check Gather Data Sampling mitigation
  x86/cpu/keylocker: Check Register File Data Sampling mitigation
  x86/Kconfig: Add a configuration for Key Locker
  crypto: x86/aes - Prepare for new AES-XTS implementation
  crypto: x86/aes-kl - Implement the AES-XTS algorithm

 Documentation/arch/x86/index.rst            |   1 +
 Documentation/arch/x86/keylocker.rst        |  96 +++++
 arch/x86/Kconfig                            |   3 +
 arch/x86/Kconfig.assembler                  |   5 +
 arch/x86/crypto/Kconfig                     |  17 +
 arch/x86/crypto/Makefile                    |   3 +
 arch/x86/crypto/aes-helper_asm.S            |  22 ++
 arch/x86/crypto/aes-helper_glue.h           | 168 ++++++++
 arch/x86/crypto/aeskl-intel_asm.S           | 412 ++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c          | 187 +++++++++
 arch/x86/crypto/aeskl-intel_glue.h          |  35 ++
 arch/x86/crypto/aesni-intel_asm.S           |  47 +--
 arch/x86/crypto/aesni-intel_glue.c          | 193 ++-------
 arch/x86/crypto/aesni-intel_glue.h          |  40 ++
 arch/x86/include/asm/cpufeatures.h          |   1 +
 arch/x86/include/asm/disabled-features.h    |   8 +-
 arch/x86/include/asm/keylocker.h            |  42 ++
 arch/x86/include/asm/msr-index.h            |   6 +
 arch/x86/include/asm/special_insns.h        |  28 ++
 arch/x86/include/uapi/asm/processor-flags.h |   2 +
 arch/x86/kernel/Makefile                    |   1 +
 arch/x86/kernel/cpu/common.c                |   4 +-
 arch/x86/kernel/cpu/cpuid-deps.c            |   1 +
 arch/x86/kernel/keylocker.c                 | 219 +++++++++++
 arch/x86/lib/x86-opcode-map.txt             |  11 +-
 arch/x86/power/cpu.c                        |   2 +
 tools/arch/x86/lib/x86-opcode-map.txt       |  11 +-
 27 files changed, 1363 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/arch/x86/keylocker.rst
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h
 create mode 100644 arch/x86/include/asm/keylocker.h
 create mode 100644 arch/x86/kernel/keylocker.c


base-commit: 3a447c31d337bdec7fbc605a7a1e00aff4c492d0
-- 
2.34.1


^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v9 01/14] Documentation/x86: Document Key Locker
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-31 15:48           ` Randy Dunlap
  2024-03-29  1:53         ` [PATCH v9 02/14] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
                           ` (13 subsequent siblings)
  14 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Bagas Sanjaya, Randy Dunlap

Document an overview of the feature along with relevant considerations
when provisioning dm-crypt volumes with AES-KL instead of AES-NI.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
---
Changes from v8:
* Change wording of documentation slightly. (Randy Dunlap and Bagas
  Sanjaya)

Changes from v6:
* Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
  documentation into Documentation/arch/"). (Nathan Huckleberry)
* Remove a duplicated sentence -- 'But there is no AES-KL instruction
  to process a 192-bit key.'
* Update the text for clarity and readability:
  - Clarify the error code and exemplify the backup failure
  - Use 'wrapping key' instead of less readable 'IWKey'

Changes from v5:
* Fix a typo: 'feature feature' -> 'feature'

Changes from RFC v2:
* Add as a new patch.

The preview is available here:
  https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
---
 Documentation/arch/x86/index.rst     |  1 +
 Documentation/arch/x86/keylocker.rst | 96 ++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 Documentation/arch/x86/keylocker.rst

diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
index 8ac64d7de4dc..669c239c009f 100644
--- a/Documentation/arch/x86/index.rst
+++ b/Documentation/arch/x86/index.rst
@@ -43,3 +43,4 @@ x86-specific Documentation
    features
    elf_auxvec
    xstate
+   keylocker
diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
new file mode 100644
index 000000000000..b28addb8eaf4
--- /dev/null
+++ b/Documentation/arch/x86/keylocker.rst
@@ -0,0 +1,96 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+x86 Key Locker
+==============
+
+Introduction
+============
+
+Key Locker is a CPU feature to reduce key exfiltration opportunities
+while maintaining a programming interface similar to AES-NI. It
+converts the AES key into an encoded form, called the 'key handle'.
+The key handle is a wrapped version of the clear-text key where the
+wrapping key has limited exposure. Once converted, all subsequent data
+encryption using new AES instructions (AES-KL) uses this key handle,
+reducing the exposure of private key material in memory.
+
+CPU-internal Wrapping Key
+=========================
+
+The CPU-internal wrapping key is an entity in a software-invisible CPU
+state. On every system boot, a new key is loaded. So the key handle that
+was encoded by the old wrapping key is no longer usable on system shutdown
+or reboot.
+
+And the key may be lost on the following exceptional situation upon wakeup:
+
+Wrapping Key Restore Failure
+----------------------------
+
+The CPU state is volatile with the ACPI S3/4 sleep states. When the system
+supports those states, the key has to be backed up so that it is restored
+on wake up. The kernel saves the key in non-volatile media.
+
+Upon the event of a wrapping key restore failure upon resume from suspend,
+all established key handles become invalid. In flight dm-crypt operations
+receive error results from pending operations. In the likely scenario that
+dm-crypt is hosting the root filesystem the recovery is identical to if a
+storage controller failed to resume from suspend or reboot. If the volume
+impacted by a wrapping key restore failure is a data volume then it is
+possible that I/O errors on that volume do not bring down the rest of the
+system. However, a reboot is still required because the kernel will have
+soft-disabled Key Locker. Upon the failure, the crypto library code will
+return -ENODEV on every AES-KL function call. The Key Locker implementation
+only loads a new wrapping key at initial boot, not any time after like
+resume from suspend.
+
+Use Case and Non-use Cases
+==========================
+
+Bare metal disk encryption is the only intended use case.
+
+Userspace usage is not supported because there is no ABI provided to
+communicate and coordinate wrapping-key restore failure to userspace. For
+now, key restore failures are only coordinated with kernel users. But the
+kernel can not prevent userspace from using the feature's AES instructions
+('AES-KL') when the feature has been enabled. So, the lack of userspace
+support is only documented, not actively enforced.
+
+Key Locker is not expected to be advertised to guest VMs and the kernel
+implementation ignores it even if the VMM enumerates the capability. The
+expectation is that a guest VM wants private wrapping key state, but the
+architecture does not provide that. An emulation of that capability, by
+caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
+The backup / restore facility is also not performant enough to be suitable
+for guest VM context switches.
+
+AES Instruction Set
+===================
+
+The feature accompanies a new AES instruction set. This instruction set is
+analogous to AES-NI. A set of AES-NI instructions can be mapped to an
+AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
+of transformation, which is equivalent to nine times AESENC and one
+AESENCLAST in AES-NI.
+
+But they have some notable differences:
+
+* AES-KL provides a secure data transformation using an encrypted key.
+
+* If an invalid key handle is provided, e.g. a corrupted one or a handle
+  restriction failure, the instruction fails with setting RFLAGS.ZF. The
+  crypto library implementation includes the flag check to return -EINVAL.
+  Note that this flag is also set if the wrapping key is changed, e.g.,
+  because of the backup error.
+
+* AES-KL implements support for 128-bit and 256-bit keys, but there is no
+  AES-KL instruction to process an 192-bit key. The AES-KL cipher
+  implementation logs a warning message with a 192-bit key and then falls
+  back to AES-NI. So, this 192-bit key-size limitation is only documented,
+  not enforced. It means the key will remain in clear-text in memory. This
+  is to meet Linux crypto-cipher expectation that each implementation must
+  support all the AES-compliant key sizes.
+
+* Some AES-KL hardware implementation may have noticeable performance
+  overhead when compared with AES-NI instructions.
-- 
2.34.1
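
As an aside, the RFLAGS.ZF convention described in the document above
is what the glue code maps to -EINVAL. A hypothetical sketch, not the
code from this series (__aesenc128kl() is an assumed wrapper that runs
AESENC128KL and returns the resulting ZF value):

static int aeskl_encrypt_block(const void *handle, u8 *out, const u8 *in)
{
	bool zf;

	kernel_fpu_begin();
	zf = __aesenc128kl(handle, out, in);
	kernel_fpu_end();

	/* ZF set: invalid/corrupt handle or a changed wrapping key */
	return zf ? -EINVAL : 0;
}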


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 02/14] x86/cpufeature: Enumerate Key Locker feature
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 01/14] Documentation/x86: Document " Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 03/14] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
                           ` (12 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Key Locker is a CPU feature to minimize exposure of clear-text key
material. An encoded form, called 'key handle', is referenced for data
encryption or decryption instead of accessing the clear text key.

A wrapping key loaded in the CPU's software-inaccessible state is used
to transform a user key into a key handle. In the rare event of an
unexpected hardware failure, the key could be lost.

Enumerate this hardware capability here. It will not show up in
/proc/cpuinfo, as userspace usage is not supported. This is because
there is no ABI to coordinate wrapping-key failures.

The feature supports the Advanced Encryption Standard (AES) cipher
algorithm with a new SIMD instruction set, like its predecessor
(AES-NI). Mark the feature as depending on XMM2, like AES-NI does. The
new AES implementation will be in the crypto library.

Add X86_FEATURE_KEYLOCKER to the disabled feature list. It will be
enabled by a new Kconfig option.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- re-organize the change descriptions

Changes from RFC v2:
* Do not publish the feature flag to userspace.
* Update the changelog.

Changes from RFC v1:
* Updated the changelog.
---
 arch/x86/include/asm/cpufeatures.h          | 1 +
 arch/x86/include/asm/disabled-features.h    | 8 +++++++-
 arch/x86/include/uapi/asm/processor-flags.h | 2 ++
 arch/x86/kernel/cpu/cpuid-deps.c            | 1 +
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f0337f7bcf16..dd30435af487 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -399,6 +399,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_KEYLOCKER		(16*32+23) /* "" Key Locker */
 #define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index da4054fbf533..14aa6dc3b846 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -38,6 +38,12 @@
 # define DISABLE_OSPKE		(1<<(X86_FEATURE_OSPKE & 31))
 #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */
 
+#ifdef CONFIG_X86_KEYLOCKER
+# define DISABLE_KEYLOCKER	0
+#else
+# define DISABLE_KEYLOCKER	(1<<(X86_FEATURE_KEYLOCKER & 31))
+#endif /* CONFIG_X86_KEYLOCKER */
+
 #ifdef CONFIG_X86_5LEVEL
 # define DISABLE_LA57	0
 #else
@@ -150,7 +156,7 @@
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
 #define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
-			 DISABLE_ENQCMD)
+			 DISABLE_ENQCMD|DISABLE_KEYLOCKER)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	(DISABLE_IBT)
 #define DISABLED_MASK19	(DISABLE_SEV_SNP)
diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h
index f1a4adc78272..a24f7cb2cd68 100644
--- a/arch/x86/include/uapi/asm/processor-flags.h
+++ b/arch/x86/include/uapi/asm/processor-flags.h
@@ -128,6 +128,8 @@
 #define X86_CR4_PCIDE		_BITUL(X86_CR4_PCIDE_BIT)
 #define X86_CR4_OSXSAVE_BIT	18 /* enable xsave and xrestore */
 #define X86_CR4_OSXSAVE		_BITUL(X86_CR4_OSXSAVE_BIT)
+#define X86_CR4_KEYLOCKER_BIT	19 /* enable Key Locker */
+#define X86_CR4_KEYLOCKER	_BITUL(X86_CR4_KEYLOCKER_BIT)
 #define X86_CR4_SMEP_BIT	20 /* enable SMEP support */
 #define X86_CR4_SMEP		_BITUL(X86_CR4_SMEP_BIT)
 #define X86_CR4_SMAP_BIT	21 /* enable SMAP support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7174209d855..820dcf35eca9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -84,6 +84,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_SHSTK,			X86_FEATURE_XSAVES    },
 	{ X86_FEATURE_FRED,			X86_FEATURE_LKGS      },
 	{ X86_FEATURE_FRED,			X86_FEATURE_WRMSRNS   },
+	{ X86_FEATURE_KEYLOCKER,		X86_FEATURE_XMM2      },
 	{}
 };
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 03/14] x86/insn: Add Key Locker instructions to the opcode map
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 01/14] Documentation/x86: Document " Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 02/14] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 04/14] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
                           ` (11 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

The x86 instruction decoder needs to know these new instructions that
are going to be used in the crypto library as well as the x86 core
code. Add the following:

LOADIWKEY:
	Load a CPU-internal wrapping key.

ENCODEKEY128:
	Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
	Wrap a 256-bit AES key to a key handle.

AESENC128KL:
	Encrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESENC256KL:
	Encrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESDEC128KL:
	Decrypt a 128-bit block of data using a 128-bit AES key
	indicated by a key handle.

AESDEC256KL:
	Decrypt a 128-bit block of data using a 256-bit AES key
	indicated by a key handle.

AESENCWIDE128KL:
	Encrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESENCWIDE256KL:
	Encrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

AESDECWIDE128KL:
	Decrypt 8 128-bit blocks of data using a 128-bit AES key
	indicated by a key handle.

AESDECWIDE256KL:
	Decrypt 8 128-bit blocks of data using a 256-bit AES key
	indicated by a key handle.

The details can be found in the Intel Software Developer's Manual.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- add the reason a bit.

Changes from RFC v1:
* Separated out the LOADIWKEY addition in a new patch.
* Included AES instructions to avoid warning messages when the AES Key
  Locker module is built.
---
 arch/x86/lib/x86-opcode-map.txt       | 11 +++++++----
 tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
 cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
 cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
 cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
 db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
 f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
 f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
 f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
 f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
 f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
 f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
 EndTable
 
 Table: 3-byte opcode 2 (0x0f 0x3a)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 04/14] x86/asm: Add a wrapper function for the LOADIWKEY instruction
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (2 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 03/14] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 05/14] x86/msr-index: Add MSRs for Key Locker wrapping key Chang S. Bae
                           ` (10 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Key Locker introduces a CPU-internal wrapping key to encode a user key
to a key handle. Then a key handle is referenced instead of the plain
text key.

LOADIWKEY loads a wrapping key in the software-inaccessible CPU state.
It operates only in kernel mode.

The kernel will use this to load a new key at boot time. Establish an
accessor for the feature setup, and define struct iwkey to pass a key
value.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Massage the changelog -- clarify the reason and the changes a bit.

Changes from v5:
* Fix a typo: kernel_cpu_begin() -> kernel_fpu_begin()

Changes from RFC v2:
* Separate out the code as a new patch.
* Improve the usability with the new struct as an argument. (Dan
  Williams)

Previously, Dan questioned the necessity of 'WARN_ON(!irq_fpu_usable())'
in the load_xmm_iwkey() function. However, it's worth noting that the
function comment emphasizes the caller's responsibility for invoking
kernel_fpu_begin(), which effectively performs the sanity check through
kernel_fpu_begin_mask().
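
For reference, kernel_fpu_begin() funnels into kernel_fpu_begin_mask(),
which (recalled from arch/x86/kernel/fpu/core.c, simplified) already
warns on misuse:

void kernel_fpu_begin_mask(unsigned int kfpu_mask)
{
	preempt_disable();
	WARN_ON_FPU(!irq_fpu_usable());
	/* ... save the current in-kernel FPU state as needed ... */
}

So an extra WARN_ON(!irq_fpu_usable()) in load_xmm_iwkey() would be
redundant.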
---
 arch/x86/include/asm/keylocker.h     | 25 +++++++++++++++++++++++++
 arch/x86/include/asm/special_insns.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)
 create mode 100644 arch/x86/include/asm/keylocker.h

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
new file mode 100644
index 000000000000..4e731f577c50
--- /dev/null
+++ b/arch/x86/include/asm/keylocker.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef _ASM_KEYLOCKER_H
+#define _ASM_KEYLOCKER_H
+
+#ifndef __ASSEMBLY__
+
+#include <asm/fpu/types.h>
+
+/**
+ * struct iwkey - A temporary wrapping key storage.
+ * @integrity_key:	A 128-bit key used to verify the integrity of
+ *			key handles
+ * @encryption_key:	A 256-bit encryption key used for wrapping and
+ *			unwrapping clear text keys.
+ *
+ * This storage should be flushed immediately after being loaded.
+ */
+struct iwkey {
+	struct reg_128_bit integrity_key;
+	struct reg_128_bit encryption_key[2];
+};
+
+#endif /*__ASSEMBLY__ */
+#endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 2e9fc5c400cd..65267013f1e1 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -9,6 +9,7 @@
 #include <linux/errno.h>
 #include <linux/irqflags.h>
 #include <linux/jump_label.h>
+#include <asm/keylocker.h>
 
 /*
  * The compiler should not reorder volatile asm statements with respect to each
@@ -301,6 +302,33 @@ static __always_inline void tile_release(void)
 	asm volatile(".byte 0xc4, 0xe2, 0x78, 0x49, 0xc0");
 }
 
+/**
+ * load_xmm_iwkey - Load a CPU-internal wrapping key into XMM registers.
+ * @key:	A pointer to a struct iwkey containing the key data.
+ *
+ * The caller is responsible for invoking kernel_fpu_begin() before.
+ */
+static inline void load_xmm_iwkey(struct iwkey *key)
+{
+	struct reg_128_bit zeros = { 0 };
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %1, %%xmm1; movdqu %2, %%xmm2;"
+		      :: "m"(key->integrity_key), "m"(key->encryption_key[0]),
+			 "m"(key->encryption_key[1]));
+
+	/*
+	 * 'LOADIWKEY %xmm1,%xmm2' loads a key from XMM0-2 into a
+	 * software-invisible CPU state. With zero in EAX, CPU does not
+	 * perform hardware randomization and allows key backup.
+	 *
+	 * This instruction is supported by binutils >= 2.36.
+	 */
+	asm volatile (".byte 0xf3,0x0f,0x38,0xdc,0xd1" :: "a"(0));
+
+	asm volatile ("movdqu %0, %%xmm0; movdqu %0, %%xmm1; movdqu %0, %%xmm2;"
+		      :: "m"(zeros));
+}
+
 #endif /* __KERNEL__ */
 
 #endif /* _ASM_X86_SPECIAL_INSNS_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 05/14] x86/msr-index: Add MSRs for Key Locker wrapping key
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (3 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 04/14] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 06/14] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
                           ` (9 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

The wrapping key resides in the same power domain as the CPU cache.
Consequently, any sleep state that invalidates the cache, such as S3,
also affects the wrapping key's state.

However, as the wrapping key's state is inaccessible to software, a
specialized mechanism is necessary to save and restore the key during
deep sleep.

A set of new MSRs is provided as an abstract interface for saving,
restoring, and checking the wrapping key's status. The wrapping key
is securely saved in a platform-scoped state using non-volatile media.
Both the backup storage and its path from the CPU are encrypted and
integrity-protected to ensure security.

Define those MSRs for saving and restoring the key during S3/4 sleep
states.

Note that the non-volatility of the backup storage is not architecturally
guaranteed across off-states such as S5 and G3. In such cases, the kernel
may generate a new key during the next boot.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Tweak the changelog.

Changes from v6:
* Tweak the changelog -- put the last for those about other sleep states

Changes from RFC v2:
* Update the changelog. (Dan Williams)
* Rename the MSRs. (Dan Williams)
---
 arch/x86/include/asm/msr-index.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 05956bd8bacf..a451fa1e2cd9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1192,4 +1192,10 @@
 						* a #GP
 						*/
 
+/* MSRs for managing a CPU-internal wrapping key for Key Locker. */
+#define MSR_IA32_IWKEY_COPY_STATUS		0x00000990
+#define MSR_IA32_IWKEY_BACKUP_STATUS		0x00000991
+#define MSR_IA32_BACKUP_IWKEY_TO_PLATFORM	0x00000d91
+#define MSR_IA32_COPY_IWKEY_TO_LOCAL		0x00000d92
+
 #endif /* _ASM_X86_MSR_INDEX_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 06/14] x86/keylocker: Define Key Locker CPUID leaf
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (4 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 05/14] x86/msr-index: Add MSRs for Key Locker wrapping key Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time Chang S. Bae
                           ` (8 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Both Key Locker enabling code in the x86 core and AES Key Locker code
in the crypto library will need to reference feature-specific CPUID bits.
Define this CPUID leaf and bits.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
Changes from v6:
* Tweak the changelog -- comment the reason first and then brief the
  change.

Changes from RFC v2:
* Separate out the code as a new patch.
---
 arch/x86/include/asm/keylocker.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 4e731f577c50..1213d273c369 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -5,6 +5,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <linux/bits.h>
 #include <asm/fpu/types.h>
 
 /**
@@ -21,5 +22,11 @@ struct iwkey {
 	struct reg_128_bit encryption_key[2];
 };
 
+#define KEYLOCKER_CPUID			0x019
+#define KEYLOCKER_CPUID_EAX_SUPERVISOR	BIT(0)
+#define KEYLOCKER_CPUID_EBX_AESKLE	BIT(0)
+#define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
+#define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (5 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 06/14] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-04-07 23:04           ` [PATCH v9a " Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4 Chang S. Bae
                           ` (7 subsequent siblings)
  14 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen

The wrapping key is used to encode a clear-text key into a key handle.
It is pivotal in protecting user keys, so its value has to be
randomized before being loaded into the software-invisible CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled before the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit beyond minimizing
the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time, which
involves clobbering XMM registers. Perform this process under
arch_initcall(), ensuring that it occurs after FPU initialization.
Finally, flush out the random bytes after loading.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt use, and that switching the wrapping key per virtual
machine is impractical, explicitly skip this setup in the
X86_FEATURE_HYPERVISOR case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Elliott, Robert (Servers)" <elliott@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Invoke the setup code via arch_initcall(). The move was due to the
  upstream changes. Commit b81fac906a8f ("x86/fpu: Move FPU
  initialization into arch_cpu_finalize_init()") delays the FPU setup.
* Tweak code comments and the changelog.
* Revoke the review tag as the code change is significant.

Changes from v6:
* Switch to use 'static inline' for the empty functions, instead of
  macro that disallows type checks. (Eric Biggers and Dave Hansen)
* Use memzero_explicit() to wipe out the key data instead of writing
  the poison value over there. (Robert Elliott)
* Massage the changelog for the better readability.

Changes from v5:
* Call out the disabling when the feature is available on a virtual
  machine. Then, it will turn off the feature flag

Changes from RFC v2:
* Make bare metal only.
* Clean up the code (e.g. dynamically allocate the key cache).
  (Dan Williams)
* Massage the changelog.
* Move out the LOADIWKEY wrapper and the Key Locker CPUID defines.
---
 arch/x86/kernel/Makefile    |  1 +
 arch/x86/kernel/keylocker.c | 77 +++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 74077694da7d..d105e5785b90 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -137,6 +137,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..0d6b715baf1e
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Setup Key Locker feature and support the wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/processor.h>
+
+static struct iwkey wrapping_key __initdata;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
+	get_random_bytes(&wrapping_key.encryption_key, sizeof(wrapping_key.encryption_key));
+}
+
+static void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&wrapping_key, sizeof(wrapping_key));
+}
+
+/*
+ * For loading the wrapping key into each CPU, the feature bit is set
+ * in the control register and FPU context management is performed.
+ */
+static void __init load_keylocker(struct work_struct *unused)
+{
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	kernel_fpu_begin();
+	load_xmm_iwkey(&wrapping_key);
+	kernel_fpu_end();
+}
+
+static int __init init_keylocker(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto disable;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto clear_cap;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	/* AESKLE depends on CR4.KEYLOCKER */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+	    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+		pr_debug("x86/keylocker: Not fully supported.\n");
+		goto clear_cap;
+	}
+
+	generate_keylocker_data();
+	schedule_on_each_cpu(load_keylocker);
+	destroy_keylocker_data();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return 0;
+
+clear_cap:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+disable:
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+	return -ENODEV;
+}
+
+arch_initcall(init_keylocker);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (6 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 09/14] x86/hotplug/keylocker: Ensure wrapping key backup capability Chang S. Bae
                           ` (6 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Rafael J . Wysocki, Dave Hansen, Sangwhan Moon

The primary use case for the feature is bare metal dm-crypt. The key
needs to be restored properly on wakeup, as dm-crypt does not prompt
for the key on resume from suspend. Even if a prompt is used to unlock
the volume where the hibernation image is stored, dm-crypt still
expects to reuse the key handles within the hibernation image once it
is loaded.

== Wrapping-key Restore ==

To meet dm-crypt's expectations, the key handles in the suspend-image
have to remain valid after resuming from an S-state. However, when the
system enters the ACPI S3 or S4 sleep states, the wrapping key is
discarded.

Key Locker provides a mechanism to back up the wrapping key in
non-volatile storage. Therefore, upon boot, request a backup of the
wrapping key and copy it back to each CPU upon wakeup. If the backup
mechanism is unavailable, disable the feature unless CONFIG_SUSPEND=n.

== Restore Failure ==

In the event of a key restore failure, the kernel proceeds with an
initialized wrapping key state. This action invalidates any key handles
present in the suspend-image, leading to I/O errors in dm-crypt
operations.

However, data integrity remains intact, and access is restored with new
handles created by the new wrapping key at the next boot. At a minimum,
manage a feature-specific flag to communicate with the crypto
implementation, ensuring that AES instructions stop being used upon a
key restore failure, instead of abruptly disabling the feature.

== Off-states ==

While the backup may persist in non-volatile media across S5 and G3 "off"
states, it is neither architecturally guaranteed nor expected by
dm-crypt. Therefore, a reboot can address this scenario with a new
wrapping key, as dm-crypt prompts for the key whenever the volume is
started.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Sangwhan Moon <sxm@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v8:
* Rebase on the previous patch (patch7) changes, separating the wrapping
  key restoration code from the initial load. Previously, the
  identify_cpu() -> setup_keylocker() sequence in the hotplug path could
  hit __init code, leading to an explosion. This change removes the
  initialization code from the hotplug path. (Sangwhan Moon)
* Turn copy_keylocker() to return bool for simplification.
* Rename the flag for clarity: 'valid_kl' -> 'valid_wrapping_key'.
* Don't export symbol for valid_keylocker(), as AES-KL will be built-in.
  (see patch14 for detail).
* Tweak code comments and the changelog.
* Revoke the review tag as the code change is significant.

Changes from v6:
* Limit the symbol export only when needed.
* Improve the coding style -- reduce an indent after
  'if() { ... return; }'. (Eric Biggers)
  Tweak the comment along with that.
* Improve the function prototype, instead of using a macro. (Eric
  Biggers and Dave Hansen)
* Update the documentation:
  - Massage the changelog to clarify the problem-and-solution by
    sections
  - Clarify the comment about the key restore failure.

Changes from v5:
* Fix the 'valid_kl' flag not to be set when the feature is disabled.
  (Reported by Marvin Hsu marvin.hsu@intel.com) Add the function
  comment about this.
* Improve the error handling in setup_keylocker(). All the error cases
  fall through the end that disables the feature. Otherwise, all the
  successful cases return immediately.

Changes from v4:
* Update the changelog and title. (Rafael Wysocki)

Changes from v3:
* Fix the build issue with !X86_KEYLOCKER. (Eric Biggers)

Changes from RFC v2:
* Change the backup key failure handling. (Dan Williams)

Changes from RFC v1:
* Folded the warning message into the if condition check. (Rafael
  Wysocki)
* Rebase on the changes of the previous patches.
* Added error code for key restoration failures.
* Moved the restore helper.
* Added function descriptions.
---
 arch/x86/include/asm/keylocker.h | 10 ++++
 arch/x86/kernel/cpu/common.c     |  4 +-
 arch/x86/kernel/keylocker.c      | 88 ++++++++++++++++++++++++++++++++
 arch/x86/power/cpu.c             |  2 +
 4 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/keylocker.h b/arch/x86/include/asm/keylocker.h
index 1213d273c369..c93102101c41 100644
--- a/arch/x86/include/asm/keylocker.h
+++ b/arch/x86/include/asm/keylocker.h
@@ -28,5 +28,15 @@ struct iwkey {
 #define KEYLOCKER_CPUID_EBX_WIDE	BIT(2)
 #define KEYLOCKER_CPUID_EBX_BACKUP	BIT(4)
 
+#ifdef CONFIG_X86_KEYLOCKER
+void setup_keylocker(void);
+void restore_keylocker(void);
+extern bool valid_keylocker(void);
+#else
+static inline void setup_keylocker(void) { }
+static inline void restore_keylocker(void) { }
+static inline bool valid_keylocker(void) { return false; }
+#endif
+
 #endif /*__ASSEMBLY__ */
 #endif /* _ASM_KEYLOCKER_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 5c1e6d6be267..bfbb1ca64664 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -62,6 +62,7 @@
 #include <asm/intel-family.h>
 #include <asm/cpu_device_id.h>
 #include <asm/fred.h>
+#include <asm/keylocker.h>
 #include <asm/uv/uv.h>
 #include <asm/ia32.h>
 #include <asm/set_memory.h>
@@ -1826,10 +1827,11 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Disable the PN if appropriate */
 	squash_the_stupid_serial_number(c);
 
-	/* Set up SMEP/SMAP/UMIP */
+	/* Setup various Intel-specific CPU security features */
 	setup_smep(c);
 	setup_smap(c);
 	setup_umip(c);
+	setup_keylocker();
 
 	/* Enable FSGSBASE instructions if available. */
 	if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 0d6b715baf1e..d5d11d0263b7 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -9,10 +9,24 @@
 
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
+#include <asm/msr.h>
 #include <asm/processor.h>
 
 static struct iwkey wrapping_key __initdata;
 
+/*
+ * This flag is set when a wrapping key is successfully loaded. If a key
+ * restoration fails, it is reset. This state is exported to the crypto
+ * library, indicating whether Key Locker is usable. Thus, the feature
+ * can be soft-disabled based on this flag.
+ */
+static bool valid_wrapping_key;
+
+bool valid_keylocker(void)
+{
+	return valid_wrapping_key;
+}
+
 static void __init generate_keylocker_data(void)
 {
 	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
@@ -37,9 +51,69 @@ static void __init load_keylocker(struct work_struct *unused)
 	kernel_fpu_end();
 }
 
+/**
+ * copy_keylocker - Copy the wrapping key from the backup.
+ *
+ * Returns:	true if successful, otherwise false.
+ */
+static bool copy_keylocker(void)
+{
+	u64 status;
+
+	wrmsrl(MSR_IA32_COPY_IWKEY_TO_LOCAL, 1);
+	rdmsrl(MSR_IA32_IWKEY_COPY_STATUS, status);
+	return !!(status & BIT(0));
+}
+
+/*
+ * On wakeup, APs copy a wrapping key after the boot CPU verifies a valid
+ * backup status through restore_keylocker(). Subsequently, they adhere
+ * to the error handling protocol by invalidating the flag.
+ */
+void setup_keylocker(void)
+{
+	if (!valid_wrapping_key)
+		return;
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	if (copy_keylocker())
+		return;
+
+	pr_err_once("x86/keylocker: Invalid copy status.\n");
+	valid_wrapping_key = false;
+}
+
+/* The boot CPU restores the wrapping key in the first place on wakeup. */
+void restore_keylocker(void)
+{
+	u64 backup_status;
+
+	if (!valid_wrapping_key)
+		return;
+
+	rdmsrl(MSR_IA32_IWKEY_BACKUP_STATUS, backup_status);
+	if (backup_status & BIT(0)) {
+		if (copy_keylocker())
+			return;
+		pr_err("x86/keylocker: Invalid copy state.\n");
+	} else {
+		pr_err("x86/keylocker: The key backup access failed with %s.\n",
+		       (backup_status & BIT(2)) ? "read error" : "invalid status");
+	}
+
+	/*
+	 * Invalidate the feature via this flag to indicate that the
+	 * crypto code should voluntarily stop using the feature, rather
+	 * than abruptly disabling it.
+	 */
+	valid_wrapping_key = false;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
+	bool backup_available;
 
 	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
 		goto disable;
@@ -59,9 +133,23 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	/*
+	 * The backup is critical for restoring the wrapping key upon
+	 * wakeup.
+	 */
+	backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
+	if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
+		pr_debug("x86/keylocker: No key backup with possible S3/4.\n");
+		goto clear_cap;
+	}
+
 	generate_keylocker_data();
 	schedule_on_each_cpu(load_keylocker);
 	destroy_keylocker_data();
+	valid_wrapping_key = true;
+
+	if (backup_available)
+		wrmsrl(MSR_IA32_BACKUP_IWKEY_TO_PLATFORM, 1);
 
 	pr_info_once("x86/keylocker: Enabled.\n");
 	return 0;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 63230ff8cf4f..e99be45354cd 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -27,6 +27,7 @@
 #include <asm/mmu_context.h>
 #include <asm/cpu_device_id.h>
 #include <asm/microcode.h>
+#include <asm/keylocker.h>
 
 #ifdef CONFIG_X86_32
 __visible unsigned long saved_context_ebx;
@@ -264,6 +265,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	x86_platform.restore_sched_clock_state();
 	cache_bp_restore();
 	perf_restore_debug_store();
+	restore_keylocker();
 
 	c = &cpu_data(smp_processor_id());
 	if (cpu_has(c, X86_FEATURE_MSR_IA32_FEAT_CTL))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 09/14] x86/hotplug/keylocker: Ensure wrapping key backup capability
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (7 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4 Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation Chang S. Bae
                           ` (5 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Sangwhan Moon

To facilitate CPU hotplug, the wrapping key needs to be loaded during CPU
hotplug bringup. setup_keylocker() already establishes the routine
for the wakeup path by copying the key from the backup state.

Disable the feature if the backup capability is missing and
CONFIG_HOTPLUG_CPU=y. Also, update the code comment to indicate support
for CPU hotplug.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Sangwhan Moon <sxm@google.com>
---
Changes from v8:
* Add as a new patch.
---
 arch/x86/kernel/keylocker.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index d5d11d0263b7..1b57e11d93ad 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -69,6 +69,8 @@ static bool copy_keylocker(void)
  * On wakeup, APs copy a wrapping key after the boot CPU verifies a valid
  * backup status through restore_keylocker(). Subsequently, they adhere
  * to the error handling protocol by invalidating the flag.
+ *
+ * This setup routine is also invoked in the hotplug bringup path.
  */
 void setup_keylocker(void)
 {
@@ -135,11 +137,11 @@ static int __init init_keylocker(void)
 
 	/*
 	 * The backup is critical for restoring the wrapping key upon
-	 * wakeup.
+	 * wakeup or during hotplug bringup.
 	 */
 	backup_available = !!(ebx & KEYLOCKER_CPUID_EBX_BACKUP);
-	if (!backup_available && IS_ENABLED(CONFIG_SUSPEND)) {
-		pr_debug("x86/keylocker: No key backup with possible S3/4.\n");
+	if (!backup_available && (IS_ENABLED(CONFIG_SUSPEND) || IS_ENABLED(CONFIG_HOTPLUG_CPU))) {
+		pr_debug("x86/keylocker: No key backup with possible S3/4 or CPU hotplug.\n");
 		goto clear_cap;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (8 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 09/14] x86/hotplug/keylocker: Ensure wrapping key backup capability Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  6:57           ` Pawan Gupta
  2024-03-29  1:53         ` [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation Chang S. Bae
                           ` (4 subsequent siblings)
  14 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen, Pawan Gupta

Gather Data Sampling (GDS) is a transient execution side-channel issue
in some CPU models. Stale data in registers is not guaranteed to be
secure while this vulnerability is unaddressed.

When Key Locker is used for AES transformations, the temporary storage
of the original key in registers poses a risk. The key material can
become stale in some implementations, making the AES key susceptible to
leakage.

To mitigate this vulnerability, a qualified microcode image must be
applied. Software then verifies the mitigation state via MSRs. Add code
to ensure that the mitigation is installed and securely locked, and
disable the feature otherwise.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v8:
* Add as a new patch.

Note that the code follows the guidance from [1]:
  "Intel recommends that system software does not enable Key Locker (by
   setting CR4.KL) unless the GDS mitigation is enabled
   (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
   [GDS_MITG_LOCK](bit 5) is 1)."

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
---
 arch/x86/kernel/keylocker.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1b57e11d93ad..d4f3aa65ea8a 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -7,6 +7,7 @@
 #include <linux/random.h>
 #include <linux/string.h>
 
+#include <asm/cpu.h>
 #include <asm/fpu/api.h>
 #include <asm/keylocker.h>
 #include <asm/msr.h>
@@ -112,6 +113,37 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/*
+ * The mitigation is implemented at a microcode level. Ensure that the
+ * microcode update is applied and the mitigation is locked.
+ */
+static bool __init have_gds_mitigation(void)
+{
+	u64 mcu_ctrl;
+
+	/* GDS_CTRL is set if new microcode is loaded. */
+	if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL))
+		goto vulnerable;
+
+	/* If GDS_MITG_LOCKED is set, GDS_MITG_DIS is forced to 0. */
+	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
+	if (mcu_ctrl & GDS_MITG_LOCKED)
+		return true;
+
+vulnerable:
+	pr_warn("x86/keylocker: Susceptible to the GDS vulnerability.\n");
+	return false;
+}
+
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -125,6 +157,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (9 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  6:20           ` Pawan Gupta
  2024-03-29  1:53         ` [PATCH v9 12/14] x86/Kconfig: Add a configuration for Key Locker Chang S. Bae
                           ` (3 subsequent siblings)
  14 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen, Pawan Gupta

The Register File Data Sampling vulnerability may allow malicious
userspace programs to infer stale kernel register data, potentially
exposing sensitive key values, including AES keys.

To address this vulnerability, a microcode update needs to be applied to
the CPU, which modifies the VERW instruction to flush the affected CPU
buffers.

The kernel already has a facility to flush CPU buffers before returning
to userspace, which is indicated by the X86_FEATURE_CLEAR_CPU_BUF flag.

Ensure the mitigation is in place before enabling Key Locker. Do not
enable the feature on CPUs that are affected by the vulnerability but
lack the mitigation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Change from v8:
* Add as a new patch.

Note that the code change follows the mitigation guidance [1]:
  "Software loading Key Locker keys using LOADIWKEY should execute a VERW
   to clear registers before transitioning to untrusted code to prevent
   later software from inferring the loaded key."

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
---
 arch/x86/kernel/keylocker.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index d4f3aa65ea8a..6e805c4da76d 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -135,12 +135,29 @@ static bool __init have_gds_mitigation(void)
 	return false;
 }
 
+/*
+ * IA32_ARCH_CAPABILITIES MSR is retrieved during the setting of
+ * X86_BUG_RFDS. Ensure that the mitigation is applied to flush CPU
+ * buffers by checking the flag.
+ */
+static bool __init have_rfds_mitigation(void)
+{
+	if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
+		return true;
+
+	pr_warn("x86/keylocker: Susceptible to the RFDS vulnerability.\n");
+	return false;
+}
+
 /* Check if Key Locker is secure enough to be used. */
 static bool __init secure_keylocker(void)
 {
 	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
 		return false;
 
+	if (boot_cpu_has_bug(X86_BUG_RFDS) && !have_rfds_mitigation())
+		return false;
+
 	return true;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 12/14] x86/Kconfig: Add a configuration for Key Locker
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (10 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation Chang S. Bae
                           ` (2 subsequent siblings)
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Add CONFIG_X86_KEYLOCKER to gate whether Key Locker is initialized at
boot. The option is selected by the Key Locker cipher module
CRYPTO_AES_KL (to be added in a later patch).

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
---
Changes from v8:
* Drop the "nokeylocker" option. (Borislav Petkov)

Changes from v6:
* Rebase on the upstream: commit a894a8a56b57 ("Documentation:
  kernel-parameters: sort all "no..." parameters")

Changes from RFC v2:
* Make the option selected by CRYPTO_AES_KL. (Dan Williams)
* Massage the changelog and the config option description.
---
 arch/x86/Kconfig | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..41eb88dcfb62 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1878,6 +1878,9 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 
 	  If unsure, say y.
 
+config X86_KEYLOCKER
+	bool
+
 choice
 	prompt "TSX enable mode"
 	depends on CPU_SUP_INTEL
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (11 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 12/14] x86/Kconfig: Add a configuration for Key Locker Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-03-29  1:53         ` [PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm Chang S. Bae
  2024-04-07 23:24         ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Key Locker's AES instruction set ('AES-KL') shares a similar
programming interface with AES-NI. The internal ABI in the assembly code
will have the same prototype as AES-NI, and the glue code will also be
identical.

The upcoming AES code will exclusively support the XTS mode as disk
encryption is the only intended use case.

Refactor the XTS-related code to eliminate code duplication and relocate
certain constant values to make them shareable. Also, introduce wrappers
for data transformation functions to return an error code, as AES-KL may
populate it.

Introduce union x86_aes_ctx as AES-KL will reference an encoded form
instead of an expanded AES key. This allows different AES context formats
in the shared code.

Inline the refactored code into the caller to avoid the potential
overhead of indirect calls.

No functional changes or performance regressions are intended.
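
As an aside, a minimal sketch of how a second implementation slots into
the shared code (the commented-out 'aeskl' member and the example_*()
names are placeholders; the actual AES-KL glue comes in the next patch):

  /* Each implementation adds its own context format to the union ... */
  union x86_aes_ctx {
          struct crypto_aes_ctx aesni;
          /* struct aeskl_ctx aeskl;  -- added by the AES-KL patch */
  };

  /* ... and passes thin wrappers into the shared, inlined helpers. */
  static int __example_xts_setkey(union x86_aes_ctx *ctx, const u8 *in_key,
                                  unsigned int key_len)
  {
          return aes_set_key_common(&ctx->aesni, in_key, key_len);
  }

  static int example_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
                                unsigned int keylen)
  {
          return xts_setkey_common(tfm, key, keylen, __example_xts_setkey);
  }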

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
Changes from v8:
* Rebase on the AES-NI changes in the mainline -- mostly cleanup works.
* Introduce 'union x86_aes_ctx'. (Eric Biggers)
* Ensure 'inline' for wrapper functions.
* Tweak the changelog.

Changes from v7:
* Remove aesni_dec() as not referenced by the refactored helpers. But,
  keep the ASM symbol '__aesni_dec' to make it appears consistent with
  its counterpart '__aesni_enc'.
* Call out 'AES-XTS' in the subject.

Changes from v6:
* Inline the helper code to avoid the indirect call. (Eric Biggers)
* Rename the filename: aes-intel* -> aes-helper*. (Eric Biggers)
* Don't export symbols yet here. Instead, do it when needed later.
* Improve the coding style:
  - Follow the symbol convention: '_' -> '__' (Eric Biggers)
  - Fix a style issue -- 'dst = src = ...' catched by checkpatch.pl:
    "CHECK: multiple assignments should be avoided"
* Cleanup: move some define back to AES-NI code as not used by AES-KL

Changes from v5:
* Clean up the staled function definition -- cbc_crypt_common().
* Ensure kernel_fpu_end() for the possible error return from
  xts_crypt_common()->crypt1_fn().

Changes from v4:
* Drop CBC mode changes. (Eric Biggers)

Changes from v3:
* Drop ECB and CTR mode changes. (Eric Biggers)
* Export symbols. (Eric Biggers)

Changes from RFC v2:
* Massage the changelog. (Dan Williams)

Changes from RFC v1:
* Added as a new patch. (Ard Biesheuvel)
---
 arch/x86/crypto/aes-helper_asm.S   |  22 +++
 arch/x86/crypto/aes-helper_glue.h  | 167 ++++++++++++++++++++++
 arch/x86/crypto/aesni-intel_asm.S  |  47 +++----
 arch/x86/crypto/aesni-intel_glue.c | 213 ++++++++---------------------
 4 files changed, 261 insertions(+), 188 deletions(-)
 create mode 100644 arch/x86/crypto/aes-helper_asm.S
 create mode 100644 arch/x86/crypto/aes-helper_glue.h

diff --git a/arch/x86/crypto/aes-helper_asm.S b/arch/x86/crypto/aes-helper_asm.S
new file mode 100644
index 000000000000..b31abcdf63cb
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_asm.S
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+/*
+ * Constant values shared between AES implementations:
+ */
+
+.pushsection .rodata
+.align 16
+.Lcts_permute_table:
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
+	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
+.popsection
+
+.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
+.align 16
+.Lgf128mul_x_ble_mask:
+	.octa 0x00000000000000010000000000000087
+.previous
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
new file mode 100644
index 000000000000..52ba1fe5cf71
--- /dev/null
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Shared glue code between AES implementations, refactored from the AES-NI's.
+ *
+ * The helper code is inlined for a performance reason. With the mitigation
+ * for speculative executions like retpoline, indirect calls become very
+ * expensive at a cost of measurable overhead.
+ */
+
+#ifndef _AES_HELPER_GLUE_H
+#define _AES_HELPER_GLUE_H
+
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/simd.h>
+
+#define AES_ALIGN		16
+#define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
+#define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
+#define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
+
+/*
+ * Preserve data types for various AES implementations available in x86
+ */
+union x86_aes_ctx {
+	struct crypto_aes_ctx aesni;
+};
+
+struct aes_xts_ctx {
+	union x86_aes_ctx tweak_ctx AES_ALIGN_ATTR;
+	union x86_aes_ctx crypt_ctx AES_ALIGN_ATTR;
+};
+
+static inline void *aes_align_addr(void *addr)
+{
+	return (crypto_tfm_ctx_alignment() >= AES_ALIGN) ? addr : PTR_ALIGN(addr, AES_ALIGN);
+}
+
+static inline struct aes_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
+{
+	return aes_align_addr(crypto_skcipher_ctx(tfm));
+}
+
+static inline int
+xts_setkey_common(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen,
+		  int (*fn)(union x86_aes_ctx *ctx, const u8 *in_key, unsigned int key_len))
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int err;
+
+	err = xts_verify_key(tfm, key, keylen);
+	if (err)
+		return err;
+
+	keylen /= 2;
+
+	/* first half of xts-key is for crypt */
+	err = fn(&ctx->crypt_ctx, key, keylen);
+	if (err)
+		return err;
+
+	/* second half of xts-key is for tweak */
+	return fn(&ctx->tweak_ctx, key + keylen, keylen);
+}
+
+static inline int
+xts_crypt_common(struct skcipher_request *req,
+		 int (*crypt_fn)(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				 unsigned int len, u8 *iv),
+		 int (*crypt1_fn)(const void *ctx, u8 *out, const u8 *in))
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_xts_ctx *ctx = aes_xts_ctx(tfm);
+	int tail = req->cryptlen % AES_BLOCK_SIZE;
+	struct skcipher_request subreq;
+	struct skcipher_walk walk;
+	int err;
+
+	if (req->cryptlen < AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	err = skcipher_walk_virt(&walk, req, false);
+	if (!walk.nbytes)
+		return err;
+
+	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
+		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
+
+		skcipher_walk_abort(&walk);
+
+		skcipher_request_set_tfm(&subreq, tfm);
+		skcipher_request_set_callback(&subreq,
+					      skcipher_request_flags(req),
+					      NULL, NULL);
+		skcipher_request_set_crypt(&subreq, req->src, req->dst,
+					   blocks * AES_BLOCK_SIZE, req->iv);
+		req = &subreq;
+
+		err = skcipher_walk_virt(&walk, req, false);
+		if (!walk.nbytes)
+			return err;
+	} else {
+		tail = 0;
+	}
+
+	kernel_fpu_begin();
+
+	/* calculate first value of T */
+	err = crypt1_fn(&ctx->tweak_ctx, walk.iv, walk.iv);
+	if (err) {
+		kernel_fpu_end();
+		return err;
+	}
+
+	while (walk.nbytes > 0) {
+		int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes &= ~(AES_BLOCK_SIZE - 1);
+
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+
+		if (walk.nbytes > 0)
+			kernel_fpu_begin();
+	}
+
+	if (unlikely(tail > 0 && !err)) {
+		struct scatterlist sg_src[2], sg_dst[2];
+		struct scatterlist *src, *dst;
+
+		src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
+		if (req->dst != req->src)
+			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
+		else
+			dst = src;
+
+		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
+					   req->iv);
+
+		err = skcipher_walk_virt(&walk, &subreq, false);
+		if (err)
+			return err;
+
+		kernel_fpu_begin();
+		err = crypt_fn(&ctx->crypt_ctx, walk.dst.virt.addr, walk.src.virt.addr,
+			       walk.nbytes, walk.iv);
+		kernel_fpu_end();
+		if (err)
+			return err;
+
+		err = skcipher_walk_done(&walk, 0);
+	}
+	return err;
+}
+
+#endif /* _AES_HELPER_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S
index 7ecb55cae3d6..1015a36a73a0 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -28,6 +28,7 @@
 #include <linux/linkage.h>
 #include <asm/frame.h>
 #include <asm/nospec-branch.h>
+#include "aes-helper_asm.S"
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -1934,9 +1935,9 @@ SYM_FUNC_START(aesni_set_key)
 SYM_FUNC_END(aesni_set_key)
 
 /*
- * void aesni_enc(const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_enc(const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_enc)
+SYM_FUNC_START(__aesni_enc)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -1955,7 +1956,7 @@ SYM_FUNC_START(aesni_enc)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_enc)
+SYM_FUNC_END(__aesni_enc)
 
 /*
  * _aesni_enc1:		internal ABI
@@ -2123,9 +2124,9 @@ SYM_FUNC_START_LOCAL(_aesni_enc4)
 SYM_FUNC_END(_aesni_enc4)
 
 /*
- * void aesni_dec (const void *ctx, u8 *dst, const u8 *src)
+ * void __aesni_dec (const void *ctx, u8 *dst, const u8 *src)
  */
-SYM_FUNC_START(aesni_dec)
+SYM_FUNC_START(__aesni_dec)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl KEYP
@@ -2145,7 +2146,7 @@ SYM_FUNC_START(aesni_dec)
 #endif
 	FRAME_END
 	RET
-SYM_FUNC_END(aesni_dec)
+SYM_FUNC_END(__aesni_dec)
 
 /*
  * _aesni_dec1:		internal ABI
@@ -2688,22 +2689,14 @@ SYM_FUNC_START(aesni_cts_cbc_dec)
 	RET
 SYM_FUNC_END(aesni_cts_cbc_dec)
 
+#ifdef __x86_64__
+
 .pushsection .rodata
 .align 16
-.Lcts_permute_table:
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
-	.byte		0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-	.byte		0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80
-#ifdef __x86_64__
 .Lbswap_mask:
 	.byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
-#endif
 .popsection
 
-#ifdef __x86_64__
 /*
  * _aesni_inc_init:	internal ABI
  *	setup registers used by _aesni_inc
@@ -2818,12 +2811,6 @@ SYM_FUNC_END(aesni_ctr_enc)
 
 #endif
 
-.section	.rodata.cst16.gf128mul_x_ble_mask, "aM", @progbits, 16
-.align 16
-.Lgf128mul_x_ble_mask:
-	.octa 0x00000000000000010000000000000087
-.previous
-
 /*
  * _aesni_gf128mul_x_ble:		internal ABI
  *	Multiply in GF(2^128) for XTS IVs
@@ -2843,10 +2830,10 @@ SYM_FUNC_END(aesni_ctr_enc)
 	pxor KEY, IV;
 
 /*
- * void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_encrypt)
+SYM_FUNC_START(__aesni_xts_encrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -2995,13 +2982,13 @@ SYM_FUNC_START(aesni_xts_encrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_enc_ret
-SYM_FUNC_END(aesni_xts_encrypt)
+SYM_FUNC_END(__aesni_xts_encrypt)
 
 /*
- * void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
- *			  const u8 *src, unsigned int len, le128 *iv)
+ * void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *dst,
+ *			    const u8 *src, unsigned int len, le128 *iv)
  */
-SYM_FUNC_START(aesni_xts_decrypt)
+SYM_FUNC_START(__aesni_xts_decrypt)
 	FRAME_BEGIN
 #ifndef __x86_64__
 	pushl IVP
@@ -3157,4 +3144,4 @@ SYM_FUNC_START(aesni_xts_decrypt)
 
 	movups STATE, (OUTP)
 	jmp .Lxts_dec_ret
-SYM_FUNC_END(aesni_xts_decrypt)
+SYM_FUNC_END(__aesni_xts_decrypt)
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 0ea3abaaa645..4ac7b9a28967 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -36,33 +36,25 @@
 #include <linux/spinlock.h>
 #include <linux/static_call.h>
 
+#include "aes-helper_glue.h"
 
-#define AESNI_ALIGN	16
-#define AESNI_ALIGN_ATTR __attribute__ ((__aligned__(AESNI_ALIGN)))
-#define AES_BLOCK_MASK	(~(AES_BLOCK_SIZE - 1))
 #define RFC4106_HASH_SUBKEY_SIZE 16
-#define AESNI_ALIGN_EXTRA ((AESNI_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
-#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
-#define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
+#define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
+#define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AES_ALIGN_EXTRA)
 
 /* This data is stored at the end of the crypto_tfm struct.
  * It's a type of per "session" data storage location.
  * This needs to be 16 byte aligned.
  */
 struct aesni_rfc4106_gcm_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 	u8 nonce[4];
 };
 
 struct generic_gcmaes_ctx {
-	u8 hash_subkey[16] AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx aes_key_expanded AESNI_ALIGN_ATTR;
-};
-
-struct aesni_xts_ctx {
-	struct crypto_aes_ctx tweak_ctx AESNI_ALIGN_ATTR;
-	struct crypto_aes_ctx crypt_ctx AESNI_ALIGN_ATTR;
+	u8 hash_subkey[16] AES_ALIGN_ATTR;
+	struct crypto_aes_ctx aes_key_expanded AES_ALIGN_ATTR;
 };
 
 #define GCM_BLOCK_LEN 16
@@ -80,17 +72,10 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-static inline void *aes_align_addr(void *addr)
-{
-	if (crypto_tfm_ctx_alignment() >= AESNI_ALIGN)
-		return addr;
-	return PTR_ALIGN(addr, AESNI_ALIGN);
-}
-
 asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
 			      unsigned int key_len);
-asmlinkage void aesni_enc(const void *ctx, u8 *out, const u8 *in);
-asmlinkage void aesni_dec(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
 asmlinkage void aesni_ecb_dec(struct crypto_aes_ctx *ctx, u8 *out,
@@ -104,14 +89,20 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
-asmlinkage void aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				  const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
 
 #ifdef CONFIG_X86_64
 
@@ -223,11 +214,6 @@ static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
 	return aes_align_addr(raw_ctx);
 }
 
-static inline struct aesni_xts_ctx *aes_xts_ctx(struct crypto_skcipher *tfm)
-{
-	return aes_align_addr(crypto_skcipher_ctx(tfm));
-}
-
 static int aes_set_key_common(struct crypto_aes_ctx *ctx,
 			      const u8 *in_key, unsigned int key_len)
 {
@@ -261,7 +247,7 @@ static void aesni_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_encrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_enc(ctx, dst, src);
+		__aesni_enc(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
@@ -274,11 +260,31 @@ static void aesni_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
 		aes_decrypt(ctx, dst, src);
 	} else {
 		kernel_fpu_begin();
-		aesni_dec(ctx, dst, src);
+		__aesni_dec(ctx, dst, src);
 		kernel_fpu_end();
 	}
 }
 
+static inline int aesni_xts_setkey(union x86_aes_ctx *ctx,
+				   const u8 *in_key, unsigned int key_len)
+{
+	return aes_set_key_common(&ctx->aesni, in_key, key_len);
+}
+
+static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
 static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
@@ -524,7 +530,7 @@ static int ctr_crypt(struct skcipher_request *req)
 		nbytes &= ~AES_BLOCK_MASK;
 
 		if (walk.nbytes == walk.total && nbytes > 0) {
-			aesni_enc(ctx, keystream, walk.iv);
+			__aesni_enc(ctx, keystream, walk.iv);
 			crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - nbytes,
 				       walk.src.virt.addr + walk.nbytes - nbytes,
 				       keystream, nbytes);
@@ -668,8 +674,8 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
 			      u8 *iv, void *aes_ctx, u8 *auth_tag,
 			      unsigned long auth_tag_len)
 {
-	u8 databuf[sizeof(struct gcm_context_data) + (AESNI_ALIGN - 8)] __aligned(8);
-	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AESNI_ALIGN);
+	u8 databuf[sizeof(struct gcm_context_data) + (AES_ALIGN - 8)] __aligned(8);
+	struct gcm_context_data *data = PTR_ALIGN((void *)databuf, AES_ALIGN);
 	unsigned long left = req->cryptlen;
 	struct scatter_walk assoc_sg_walk;
 	struct skcipher_walk walk;
@@ -824,8 +830,8 @@ static int helper_rfc4106_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 	__be32 counter = cpu_to_be32(1);
 
@@ -852,8 +858,8 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct aesni_rfc4106_gcm_ctx *ctx = aesni_rfc4106_gcm_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	unsigned int i;
 
 	if (unlikely(req->assoclen != 16 && req->assoclen != 20))
@@ -878,126 +884,17 @@ static int helper_rfc4106_decrypt(struct aead_request *req)
 static int xts_aesni_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			    unsigned int keylen)
 {
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int err;
-
-	err = xts_verify_key(tfm, key, keylen);
-	if (err)
-		return err;
-
-	keylen /= 2;
-
-	/* first half of xts-key is for crypt */
-	err = aes_set_key_common(&ctx->crypt_ctx, key, keylen);
-	if (err)
-		return err;
-
-	/* second half of xts-key is for tweak */
-	return aes_set_key_common(&ctx->tweak_ctx, key + keylen, keylen);
-}
-
-static int xts_crypt(struct skcipher_request *req, bool encrypt)
-{
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
-	struct aesni_xts_ctx *ctx = aes_xts_ctx(tfm);
-	int tail = req->cryptlen % AES_BLOCK_SIZE;
-	struct skcipher_request subreq;
-	struct skcipher_walk walk;
-	int err;
-
-	if (req->cryptlen < AES_BLOCK_SIZE)
-		return -EINVAL;
-
-	err = skcipher_walk_virt(&walk, req, false);
-	if (!walk.nbytes)
-		return err;
-
-	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
-		int blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2;
-
-		skcipher_walk_abort(&walk);
-
-		skcipher_request_set_tfm(&subreq, tfm);
-		skcipher_request_set_callback(&subreq,
-					      skcipher_request_flags(req),
-					      NULL, NULL);
-		skcipher_request_set_crypt(&subreq, req->src, req->dst,
-					   blocks * AES_BLOCK_SIZE, req->iv);
-		req = &subreq;
-
-		err = skcipher_walk_virt(&walk, req, false);
-		if (!walk.nbytes)
-			return err;
-	} else {
-		tail = 0;
-	}
-
-	kernel_fpu_begin();
-
-	/* calculate first value of T */
-	aesni_enc(&ctx->tweak_ctx, walk.iv, walk.iv);
-
-	while (walk.nbytes > 0) {
-		int nbytes = walk.nbytes;
-
-		if (nbytes < walk.total)
-			nbytes &= ~(AES_BLOCK_SIZE - 1);
-
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
-
-		if (walk.nbytes > 0)
-			kernel_fpu_begin();
-	}
-
-	if (unlikely(tail > 0 && !err)) {
-		struct scatterlist sg_src[2], sg_dst[2];
-		struct scatterlist *src, *dst;
-
-		dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
-		if (req->dst != req->src)
-			dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);
-
-		skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
-					   req->iv);
-
-		err = skcipher_walk_virt(&walk, &subreq, false);
-		if (err)
-			return err;
-
-		kernel_fpu_begin();
-		if (encrypt)
-			aesni_xts_encrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		else
-			aesni_xts_decrypt(&ctx->crypt_ctx,
-					  walk.dst.virt.addr, walk.src.virt.addr,
-					  walk.nbytes, walk.iv);
-		kernel_fpu_end();
-
-		err = skcipher_walk_done(&walk, 0);
-	}
-	return err;
+	return xts_setkey_common(tfm, key, keylen, aesni_xts_setkey);
 }
 
 static int xts_encrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, true);
+	return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
 }
 
 static int xts_decrypt(struct skcipher_request *req)
 {
-	return xts_crypt(req, false);
+	return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
 }
 
 static struct crypto_alg aesni_cipher_alg = {
@@ -1152,8 +1049,8 @@ static int generic_gcmaes_encrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 	__be32 counter = cpu_to_be32(1);
 
 	memcpy(iv, req->iv, 12);
@@ -1169,8 +1066,8 @@ static int generic_gcmaes_decrypt(struct aead_request *req)
 	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
 	struct generic_gcmaes_ctx *ctx = generic_gcmaes_ctx_get(tfm);
 	void *aes_ctx = &(ctx->aes_key_expanded);
-	u8 ivbuf[16 + (AESNI_ALIGN - 8)] __aligned(8);
-	u8 *iv = PTR_ALIGN(&ivbuf[0], AESNI_ALIGN);
+	u8 ivbuf[16 + (AES_ALIGN - 8)] __aligned(8);
+	u8 *iv = PTR_ALIGN(&ivbuf[0], AES_ALIGN);
 
 	memcpy(iv, req->iv, 12);
 	*((__be32 *)(iv+12)) = counter;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (12 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation Chang S. Bae
@ 2024-03-29  1:53         ` Chang S. Bae
  2024-04-07 23:24         ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
  14 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-03-29  1:53 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae

Key Locker is a CPU feature to reduce key exfiltration opportunities.
It converts the AES key into an encoded form, called 'key handle', to
reduce the exposure of private key material in memory.

This key conversion as well as all subsequent data transformations are
provided by new AES instructions ('AES-KL'). AES-KL is analogous to
AES-NI in that it maintains a similar programming interface.

Support the XTS mode, as the primary use case is dm-crypt. The support
has some details worth mentioning that differentiate it from AES-NI,
which users may need to be aware of:

== Key Handle Restriction ==

The AES-KL instruction set supports selecting key usage restrictions at
key handle creation time. Restrict all key handles created by the kernel
to kernel mode use only.

The AES-KL instructions themselves are executable in userspace. This
restriction ensures that the kernel's handles are used consistently in
kernel mode.

If the key handle is created in userspace but referenced in the kernel,
then encrypt() and decrypt() functions will return -EINVAL.

== AES-NI Dependency for AES Compliance ==

Key Locker is not AES compliant as it lacks 192-bit key support.
However, per the expectations of Linux crypto-cipher implementations,
the software cipher implementation must support all the AES-compliant
key sizes.

The AES-KL cipher implementation satisfies this constraint by logging a
warning and falling back to AES-NI for 192-bit keys. In other words, the
192-bit key-size limitation on what can be converted into a key handle
is only documented, not enforced.

This creates a rather strong dependency on AES-NI. If this driver
supported a module build, the exported AES-NI functions could not be
inlined, and the resulting indirect calls would impact performance.

To simplify, disallow a module build for AES-KL and always select AES-NI.
This restriction can be relaxed if strong use cases arise against it.
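
A rough sketch of the documented fallback, reusing helper names defined
elsewhere in this series (valid_keylocker(), aes_set_key_common(),
__aeskl_setkey()); the actual setkey path may differ in detail:

  static int example_aeskl_setkey(union x86_aes_ctx *ctx, const u8 *in_key,
                                  unsigned int key_len)
  {
          if (!valid_keylocker())
                  return -ENODEV;

          /*
           * A 192-bit key cannot be encoded into a key handle: warn
           * once and fall back to the AES-NI key schedule.
           */
          if (key_len == AES_KEYSIZE_192) {
                  pr_warn_once("AES-KL: 192-bit key, falling back to AES-NI\n");
                  return aes_set_key_common(&ctx->aesni, in_key, key_len);
          }

          __aeskl_setkey(&ctx->aeskl, in_key, key_len);
          return 0;
  }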

== Wrapping Key Restore Failure Handling ==

In the event of a hardware failure, the wrapping key is lost across deep
sleep states. The wrapping key then reverts to zero, which is an
unusable state.

The x86 core provides valid_keylocker() to indicate the failure.
Subsequent setkey() as well as encrypt()/decrypt() calls can check it
and return -ENODEV on failure. In this way, an error code is returned
instead of triggering abrupt exceptions.

== Userspace Exposition ==

The Key Locker implementations so far have measurable performance
penalties. So, keep AES-NI as the default.

However, with a slow storage device, storage bandwidth is the bottleneck
even when disk encryption uses AES-KL. Thus, selecting AES-KL is an
end-user consideration. Users may pick it by the driver name
'xts-aes-aeskl' shown in /proc/crypto.
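
An in-kernel user that wants to opt in explicitly, rather than rely on
algorithm priority, can also request the implementation by that driver
name. A minimal sketch (the caller checks the returned pointer with
IS_ERR() as usual):

  #include <crypto/skcipher.h>

  /* Explicitly request the AES-KL XTS implementation by driver name. */
  static struct crypto_skcipher *example_alloc_aeskl_xts(void)
  {
          return crypto_alloc_skcipher("xts-aes-aeskl", 0, 0);
  }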

== 64-bit Only ==

Support 64-bit only, as the 32-bit kernel is being deprecated.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
Changes from v8:
* Rebase on the upstream changes.
* Combine the XTS enc/dec assembly code in a macro. (Eric Biggers)
* Define setkey() as void instead of returning 'int'. (Eric Biggers)
* Rearrange the assembly code to reduce jumps especially for success
  cases. (Eric Biggers)
* Update the changelog for clarification. (Eric Biggers)
* Exclude module build.

Changes from v7:
* Update the changelog -- remove 'API Limitation'. (Eric Biggers)
* Update the comment for valid_keylocker(). (Eric Biggers)
* Improve the code:
  - Remove the key-length check and simplify the code. (Eric Biggers)
  - Remove aeskl_dec() and __aeskl_dec() as not needed.
  - Simplify the register-function return handling. (Eric Biggers)
  - Rename setkey functions for coherent naming:
    aeskl_setkey() -> __aeskl_setkey(),
    aeskl_setkey_common() -> aeskl_setkey(),
    aeskl_xts_setkey() -> xts_setkey()
  - Revert an unnecessary comment.

Changes from v6:
* Merge all the AES-KL patches. (Eric Biggers)
* Make the driver for the 64-bit mode only. (Eric Biggers)
* Rework the key-size check code:
  - Trim unnecessary checks. (Eric Biggers)
  - Document the reason
  - Make sure both XTS keys with the same size
* Adjust the Kconfig change:
  - Move the location. (Robert Elliott)
  - Trim the description to follow others such as AES-NI.
* Update the changelog:
  - Explain the priority value for the common name under 'User
    Exposition' (renamed from 'Performance'). (Eric Biggers)
  - Trim the introduction
  - Switch to more imperative mood for those explaining the code
    change
  - Add a new section '64-bit Only'
* Adjust the ASM code to return a proper error code. (Eric Biggers)
* Update assembly code macros:
  - Remove unused one.
  - Document the reason for the duplicated ones.

Changes from v5:
* Replace the ret instruction with RET as rebased on the upstream -- commit
  f94909ceb1ed ("x86: Prepare asm files for straight-line-speculation").

Changes from v3:
* Exclude non-AES-KL objects. (Eric Biggers)
* Simplify the assembler dependency check. (Peter Zijlstra)
* Trim the Kconfig help text. (Dan Williams)
* Fix a defined-but-not-used warning.

Changes from RFC v2:
* Move out each mode support in new patches.
* Update the changelog to describe the limitation and the tradeoff
  clearly. (Andy Lutomirski)

Changes from RFC v1:
* Rebased on the refactored code. (Ard Biesheuvel)
* Dropped exporting the single block interface. (Ard Biesheuvel)
* Fixed the fallback and error handling paths. (Ard Biesheuvel)
* Revised the module description. (Dave Hansen and Peter Zijlstra)
* Made the build depend on the binutils version to support new
  instructions. (Borislav Petkov and Peter Zijlstra)
* Updated the changelog accordingly.
---
 arch/x86/Kconfig.assembler         |   5 +
 arch/x86/crypto/Kconfig            |  17 ++
 arch/x86/crypto/Makefile           |   3 +
 arch/x86/crypto/aes-helper_glue.h  |   7 +-
 arch/x86/crypto/aeskl-intel_asm.S  | 412 +++++++++++++++++++++++++++++
 arch/x86/crypto/aeskl-intel_glue.c | 187 +++++++++++++
 arch/x86/crypto/aeskl-intel_glue.h |  35 +++
 arch/x86/crypto/aesni-intel_glue.c |  30 +--
 arch/x86/crypto/aesni-intel_glue.h |  40 +++
 9 files changed, 704 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/crypto/aeskl-intel_asm.S
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.c
 create mode 100644 arch/x86/crypto/aeskl-intel_glue.h
 create mode 100644 arch/x86/crypto/aesni-intel_glue.h

diff --git a/arch/x86/Kconfig.assembler b/arch/x86/Kconfig.assembler
index 8ad41da301e5..0e58f2b61dd3 100644
--- a/arch/x86/Kconfig.assembler
+++ b/arch/x86/Kconfig.assembler
@@ -25,6 +25,11 @@ config AS_GFNI
 	help
 	  Supported by binutils >= 2.30 and LLVM integrated assembler
 
+config AS_HAS_KEYLOCKER
+	def_bool $(as-instr,encodekey256 %eax$(comma)%eax)
+	help
+	  Supported by binutils >= 2.36 and LLVM integrated assembler >= V12
+
 config AS_WRUSS
 	def_bool $(as-instr,wrussq %rax$(comma)(%rbx))
 	help
diff --git a/arch/x86/crypto/Kconfig b/arch/x86/crypto/Kconfig
index c9e59589a1ce..067bb149998b 100644
--- a/arch/x86/crypto/Kconfig
+++ b/arch/x86/crypto/Kconfig
@@ -29,6 +29,23 @@ config CRYPTO_AES_NI_INTEL
 	  Architecture: x86 (32-bit and 64-bit) using:
 	  - AES-NI (AES new instructions)
 
+config CRYPTO_AES_KL
+	bool "Ciphers: AES, modes: XTS (AES-KL)"
+	depends on X86 && 64BIT
+	depends on AS_HAS_KEYLOCKER
+	select CRYPTO_AES_NI_INTEL
+	select X86_KEYLOCKER
+
+	help
+	  Block cipher: AES cipher algorithms
+	  Length-preserving ciphers: AES with XTS
+
+	  Architecture: x86 (64-bit) using:
+	  - AES-KL (AES Key Locker)
+	  - AES-NI for a 192-bit key
+
+	  See Documentation/arch/x86/keylocker.rst for more details.
+
 config CRYPTO_BLOWFISH_X86_64
 	tristate "Ciphers: Blowfish, modes: ECB, CBC"
 	depends on X86 && 64BIT
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 9aa46093c91b..ae2aa7abd151 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -50,6 +50,9 @@ obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o
 aesni-intel-$(CONFIG_64BIT) += aesni-intel_avx-x86_64.o aes_ctrby8_avx-x86_64.o
 
+obj-$(CONFIG_CRYPTO_AES_KL) += aeskl-intel.o
+aeskl-intel-y := aeskl-intel_asm.o aeskl-intel_glue.o
+
 obj-$(CONFIG_CRYPTO_SHA1_SSSE3) += sha1-ssse3.o
 sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
 sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
diff --git a/arch/x86/crypto/aes-helper_glue.h b/arch/x86/crypto/aes-helper_glue.h
index 52ba1fe5cf71..262c1cec0011 100644
--- a/arch/x86/crypto/aes-helper_glue.h
+++ b/arch/x86/crypto/aes-helper_glue.h
@@ -19,16 +19,17 @@
 #include <crypto/internal/aead.h>
 #include <crypto/internal/simd.h>
 
+#include "aeskl-intel_glue.h"
+
 #define AES_ALIGN		16
 #define AES_ALIGN_ATTR		__attribute__((__aligned__(AES_ALIGN)))
 #define AES_ALIGN_EXTRA		((AES_ALIGN - 1) & ~(CRYPTO_MINALIGN - 1))
 #define XTS_AES_CTX_SIZE	(sizeof(struct aes_xts_ctx) + AES_ALIGN_EXTRA)
 
-/*
- * Preserve data types for various AES implementations available in x86
- */
+/* Data types for the two AES implementations available in x86 */
 union x86_aes_ctx {
 	struct crypto_aes_ctx aesni;
+	struct aeskl_ctx aeskl;
 };
 
 struct aes_xts_ctx {
diff --git a/arch/x86/crypto/aeskl-intel_asm.S b/arch/x86/crypto/aeskl-intel_asm.S
new file mode 100644
index 000000000000..81af7f61aab5
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_asm.S
@@ -0,0 +1,412 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Implement AES algorithm using AES Key Locker instructions.
+ *
+ * Most code is based from the AES-NI implementation, aesni-intel_asm.S
+ *
+ */
+
+#include <linux/linkage.h>
+#include <linux/cfi_types.h>
+#include <asm/errno.h>
+#include <asm/inst.h>
+#include <asm/frame.h>
+#include "aes-helper_asm.S"
+
+.text
+
+#define STATE1	%xmm0
+#define STATE2	%xmm1
+#define STATE3	%xmm2
+#define STATE4	%xmm3
+#define STATE5	%xmm4
+#define STATE6	%xmm5
+#define STATE7	%xmm6
+#define STATE8	%xmm7
+#define STATE	STATE1
+
+#define IV	%xmm9
+#define KEY	%xmm10
+#define INC	%xmm13
+
+#define IN	%xmm8
+
+#define HANDLEP	%rdi
+#define OUTP	%rsi
+#define KLEN	%r9d
+#define INP	%rdx
+#define T1	%r10
+#define LEN	%rcx
+#define IVP	%r8
+
+#define UKEYP	OUTP
+#define GF128MUL_MASK %xmm11
+
+/*
+ * void __aeskl_setkey(struct crypto_aes_ctx *handlep, const u8 *ukeyp,
+ *		       unsigned int key_len)
+ */
+SYM_FUNC_START(__aeskl_setkey)
+	FRAME_BEGIN
+	movl %edx, 480(HANDLEP)
+	movdqu (UKEYP), STATE1
+	mov $1, %eax
+	cmp $16, %dl
+	je .Lsetkey_128
+
+	movdqu 0x10(UKEYP), STATE2
+	encodekey256 %eax, %eax
+	movdqu STATE4, 0x30(HANDLEP)
+	jmp .Lsetkey_end
+.Lsetkey_128:
+	encodekey128 %eax, %eax
+
+.Lsetkey_end:
+	movdqu STATE1, (HANDLEP)
+	movdqu STATE2, 0x10(HANDLEP)
+	movdqu STATE3, 0x20(HANDLEP)
+
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_setkey)
+
+/*
+ * int __aeskl_enc(const void *handlep, u8 *outp, const u8 *inp)
+ */
+SYM_FUNC_START(__aeskl_enc)
+	FRAME_BEGIN
+	movdqu (INP), STATE
+	movl 480(HANDLEP), KLEN
+
+	cmp $16, KLEN
+	je .Lenc_128
+	aesenc256kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+.Lenc_128:
+	aesenc128kl (HANDLEP), STATE
+	jz .Lenc_err
+	xor %rax, %rax
+	jmp .Lenc_end
+
+.Lenc_err:
+	mov $(-EINVAL), %rax
+.Lenc_end:
+	movdqu STATE, (OUTP)
+	FRAME_END
+	RET
+SYM_FUNC_END(__aeskl_enc)
+
+/*
+ * XTS implementation
+ */
+
+/*
+ * _aeskl_gf128mul_x_ble: 	internal ABI
+ *	Multiply in GF(2^128) for XTS IVs
+ * input:
+ *	IV:	current IV
+ *	GF128MUL_MASK == mask with 0x87 and 0x01
+ * output:
+ *	IV:	next IV
+ * changed:
+ *	CTR:	== temporary value
+ *
+ * While based on the AES-NI code, this macro is separated here due to
+ * the register constraint. E.g., aesencwide256kl has implicit
+ * operands: XMM0-7.
+ */
+#define _aeskl_gf128mul_x_ble() \
+	pshufd $0x13, IV, KEY; \
+	paddq IV, IV; \
+	psrad $31, KEY; \
+	pand GF128MUL_MASK, KEY; \
+	pxor KEY, IV;
+
+.macro XTS_ENC_DEC operation
+	FRAME_BEGIN
+	movdqa .Lgf128mul_x_ble_mask(%rip), GF128MUL_MASK
+	movups (IVP), IV
+
+	mov 480(HANDLEP), KLEN
+
+.ifc \operation, dec
+	test $15, LEN
+	jz .Lxts_op8_\@
+	sub $16, LEN
+.endif
+
+.Lxts_op8_\@:
+	sub $128, LEN
+	jl .Lxts_op1_pre_\@
+
+	movdqa IV, STATE1
+	movdqu (INP), INC
+	pxor INC, STATE1
+	movdqu IV, (OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE2
+	movdqu 0x10(INP), INC
+	pxor INC, STATE2
+	movdqu IV, 0x10(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE3
+	movdqu 0x20(INP), INC
+	pxor INC, STATE3
+	movdqu IV, 0x20(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE4
+	movdqu 0x30(INP), INC
+	pxor INC, STATE4
+	movdqu IV, 0x30(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE5
+	movdqu 0x40(INP), INC
+	pxor INC, STATE5
+	movdqu IV, 0x40(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE6
+	movdqu 0x50(INP), INC
+	pxor INC, STATE6
+	movdqu IV, 0x50(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE7
+	movdqu 0x60(INP), INC
+	pxor INC, STATE7
+	movdqu IV, 0x60(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+	movdqa IV, STATE8
+	movdqu 0x70(INP), INC
+	pxor INC, STATE8
+	movdqu IV, 0x70(OUTP)
+
+	cmp $16, KLEN
+	je .Lxts_op8_128_\@
+.ifc \operation, dec
+	aesdecwide256kl (%rdi)
+.else
+	aesencwide256kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op8_end_\@
+.Lxts_op8_128_\@:
+.ifc \operation, dec
+	aesdecwide128kl (%rdi)
+.else
+	aesencwide128kl (%rdi)
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op8_end_\@:
+	movdqu 0x00(OUTP), INC
+	pxor INC, STATE1
+	movdqu STATE1, 0x00(OUTP)
+
+	movdqu 0x10(OUTP), INC
+	pxor INC, STATE2
+	movdqu STATE2, 0x10(OUTP)
+
+	movdqu 0x20(OUTP), INC
+	pxor INC, STATE3
+	movdqu STATE3, 0x20(OUTP)
+
+	movdqu 0x30(OUTP), INC
+	pxor INC, STATE4
+	movdqu STATE4, 0x30(OUTP)
+
+	movdqu 0x40(OUTP), INC
+	pxor INC, STATE5
+	movdqu STATE5, 0x40(OUTP)
+
+	movdqu 0x50(OUTP), INC
+	pxor INC, STATE6
+	movdqu STATE6, 0x50(OUTP)
+
+	movdqu 0x60(OUTP), INC
+	pxor INC, STATE7
+	movdqu STATE7, 0x60(OUTP)
+
+	movdqu 0x70(OUTP), INC
+	pxor INC, STATE8
+	movdqu STATE8, 0x70(OUTP)
+
+	_aeskl_gf128mul_x_ble()
+
+	add $128, INP
+	add $128, OUTP
+	test LEN, LEN
+	jnz .Lxts_op8_\@
+
+.Lxts_op_ret_\@:
+	movups IV, (IVP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op1_pre_\@:
+	add $128, LEN
+	jz .Lxts_op_ret_\@
+.ifc \operation, enc
+	sub $16, LEN
+	jl .Lxts_op_cts4_\@
+.endif
+
+.Lxts_op1_\@:
+	movdqu (INP), STATE1
+
+.ifc \operation, dec
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_op1_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_end_\@
+.Lxts_op1_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_end_\@:
+	pxor IV, STATE1
+	_aeskl_gf128mul_x_ble()
+
+	test LEN, LEN
+	jz .Lxts_op1_out_\@
+
+.ifc \operation, enc
+	add $16, INP
+	sub $16, LEN
+	jl .Lxts_op_cts1_\@
+.endif
+
+	movdqu STATE1, (OUTP)
+	add $16, OUTP
+	jmp .Lxts_op1_\@
+
+.Lxts_op1_out_\@:
+	movdqu STATE1, (OUTP)
+	jmp .Lxts_op_ret_\@
+
+.Lxts_op_cts4_\@:
+.ifc \operation, enc
+	movdqu STATE8, STATE1
+	sub $16, OUTP
+.endif
+
+.Lxts_op_cts1_\@:
+.ifc \operation, dec
+	movdqa IV, STATE5
+	_aeskl_gf128mul_x_ble()
+
+	pxor IV, STATE1
+
+	cmp $16, KLEN
+	je .Lxts_dec1_cts_pre_128_\@
+	aesdec256kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+	jmp .Lxts_dec1_cts_pre_end_\@
+.Lxts_dec1_cts_pre_128_\@:
+	aesdec128kl (HANDLEP), STATE1
+	jz .Lxts_op_err_\@
+.Lxts_dec1_cts_pre_end_\@:
+	pxor IV, STATE1
+.endif
+
+	lea .Lcts_permute_table(%rip), T1
+	add LEN, INP		/* rewind input pointer */
+	add $16, LEN		/* # bytes in final block */
+	movups (INP), IN
+
+	mov T1, IVP
+	add $32, IVP
+	add LEN, T1
+	sub LEN, IVP
+	add OUTP, LEN
+
+	movups (T1), STATE2
+	movaps STATE1, STATE3
+	pshufb STATE2, STATE1
+	movups STATE1, (LEN)
+
+	movups (IVP), STATE1
+	pshufb STATE1, IN
+	pblendvb STATE3, IN
+	movaps IN, STATE1
+
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+
+	cmp $16, KLEN
+	je .Lxts_op1_cts_128_\@
+.ifc \operation, dec
+	aesdec256kl (HANDLEP), STATE1
+.else
+	aesenc256kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+	jmp .Lxts_op1_cts_end_\@
+.Lxts_op1_cts_128_\@:
+.ifc \operation, dec
+	aesdec128kl (HANDLEP), STATE1
+.else
+	aesenc128kl (HANDLEP), STATE1
+.endif
+	jz .Lxts_op_err_\@
+
+.Lxts_op1_cts_end_\@:
+.ifc \operation, dec
+	pxor STATE5, STATE1
+.else
+	pxor IV, STATE1
+.endif
+	movups STATE1, (OUTP)
+	xor %rax, %rax
+	FRAME_END
+	RET
+
+.Lxts_op_err_\@:
+	mov $(-EINVAL), %rax
+	FRAME_END
+	RET
+.endm
+
+/*
+ * int __aeskl_xts_encrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int len, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_encrypt)
+	XTS_ENC_DEC enc
+SYM_FUNC_END(__aeskl_xts_encrypt)
+
+/*
+ * int __aeskl_xts_decrypt(const struct aeskl_ctx *handlep, u8 *outp,
+ *			   const u8 *inp, unsigned int len, le128 *ivp)
+ */
+SYM_FUNC_START(__aeskl_xts_decrypt)
+	XTS_ENC_DEC dec
+SYM_FUNC_END(__aeskl_xts_decrypt)
+
diff --git a/arch/x86/crypto/aeskl-intel_glue.c b/arch/x86/crypto/aeskl-intel_glue.c
new file mode 100644
index 000000000000..7672c4836da8
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Support for AES Key Locker instructions. This file contains the glue
+ * code; the actual AES implementation is in aeskl-intel_asm.S.
+ *
+ * Most of the code is based on the AES-NI glue code, aesni-intel_glue.c.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/err.h>
+#include <crypto/algapi.h>
+#include <crypto/aes.h>
+#include <crypto/xts.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/internal/simd.h>
+#include <asm/simd.h>
+#include <asm/cpu_device_id.h>
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+
+#include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
+
+asmlinkage void __aeskl_setkey(struct aeskl_ctx *ctx, const u8 *in_key, unsigned int keylen);
+
+asmlinkage int __aeskl_enc(const void *ctx, u8 *out, const u8 *in);
+
+asmlinkage int __aeskl_xts_encrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+asmlinkage int __aeskl_xts_decrypt(const struct aeskl_ctx *ctx, u8 *out, const u8 *in,
+				   unsigned int len, u8 *iv);
+
+/*
+ * The wrapping key may be lost across ACPI S3/S4 sleep states if the
+ * hardware fails to back it up or restore it. The state of the feature
+ * can be retrieved via valid_keylocker().
+ *
+ * Since the feature can become disabled at any point after such a
+ * failure, check for availability on every use along with kernel_fpu_begin().
+ */
+
+static int aeskl_setkey(union x86_aes_ctx *ctx, const u8 *in_key, unsigned int keylen)
+{
+	int err;
+
+	if (!crypto_simd_usable())
+		return -EBUSY;
+
+	err = aes_check_keylen(keylen);
+	if (err)
+		return err;
+
+	if (unlikely(keylen == AES_KEYSIZE_192)) {
+		pr_warn_once("AES-KL does not support 192-bit keys; falling back to AES-NI.\n");
+		kernel_fpu_begin();
+		aesni_set_key(&ctx->aesni, in_key, keylen);
+		kernel_fpu_end();
+		return 0;
+	}
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	kernel_fpu_begin();
+	__aeskl_setkey(&ctx->aeskl, in_key, keylen);
+	kernel_fpu_end();
+	return 0;
+}
+
+static inline int aeskl_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_enc(ctx, out, in);
+}
+
+static inline int aeskl_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_encrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static inline int aeskl_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	return __aeskl_xts_decrypt(&ctx->aeskl, out, in, len, iv);
+}
+
+static int xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+		      unsigned int keylen)
+{
+	return xts_setkey_common(tfm, key, keylen, aeskl_setkey);
+}
+
+static inline u32 xts_keylen(struct skcipher_request *req)
+{
+	struct aes_xts_ctx *ctx = aes_xts_ctx(crypto_skcipher_reqtfm(req));
+
+	return ctx->crypt_ctx.aeskl.key_length;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_encrypt, aesni_enc);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+	u32 keylen = xts_keylen(req);
+
+	if (likely(keylen != AES_KEYSIZE_192))
+		return xts_crypt_common(req, aeskl_xts_decrypt, aeskl_enc);
+	else
+		return xts_crypt_common(req, aesni_xts_decrypt, aesni_enc);
+}
+
+static struct skcipher_alg aeskl_skciphers[] = {
+	{
+		.base = {
+			.cra_name		= "__xts(aes)",
+			.cra_driver_name	= "__xts-aes-aeskl",
+			.cra_priority		= 200,
+			.cra_flags		= CRYPTO_ALG_INTERNAL,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= XTS_AES_CTX_SIZE,
+			.cra_module		= THIS_MODULE,
+		},
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.walksize	= 2 * AES_BLOCK_SIZE,
+		.setkey		= xts_setkey,
+		.encrypt	= xts_encrypt,
+		.decrypt	= xts_decrypt,
+	}
+};
+
+static struct simd_skcipher_alg *aeskl_simd_skciphers[ARRAY_SIZE(aeskl_skciphers)];
+
+static int __init aeskl_init(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!valid_keylocker())
+		return -ENODEV;
+
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_WIDE))
+		return -ENODEV;
+
+	/*
+	 * AES-KL itself does not rely on AES-NI, but it cannot handle
+	 * 192-bit keys. To keep the cipher AES-compliant, the driver
+	 * falls back to AES-NI for that key size, so AES-NI is required.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_AES))
+		return -ENODEV;
+
+	return simd_register_skciphers_compat(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+					      aeskl_simd_skciphers);
+}
+
+static void __exit aeskl_exit(void)
+{
+	simd_unregister_skciphers(aeskl_skciphers, ARRAY_SIZE(aeskl_skciphers),
+				  aeskl_simd_skciphers);
+}
+
+late_initcall(aeskl_init);
+module_exit(aeskl_exit);
+
+MODULE_DESCRIPTION("Rijndael (AES) Cipher Algorithm, AES Key Locker implementation");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/x86/crypto/aeskl-intel_glue.h b/arch/x86/crypto/aeskl-intel_glue.h
new file mode 100644
index 000000000000..57cfd6c55a4f
--- /dev/null
+++ b/arch/x86/crypto/aeskl-intel_glue.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _AESKL_INTEL_GLUE_H
+#define _AESKL_INTEL_GLUE_H
+
+#include <crypto/aes.h>
+#include <linux/types.h>
+
+#define AESKL_AAD_SIZE		16
+#define AESKL_TAG_SIZE		16
+#define AESKL_CIPHERTEXT_MAX	AES_KEYSIZE_256
+
+/* The Key Locker handle is an encoded form of the AES key. */
+struct aeskl_handle {
+	u8 additional_authdata[AESKL_AAD_SIZE];
+	u8 integrity_tag[AESKL_TAG_SIZE];
+	u8 cipher_text[AESKL_CIPHERTEXT_MAX];
+};
+
+/*
+ * Key Locker does not support the 192-bit key size, so the driver has
+ * to read the key size up front to decide which implementation to use.
+ * The offset of the 'key_length' field here must match the one in
+ * struct crypto_aes_ctx.
+ */
+#define AESKL_CTX_RESERVED (sizeof(struct crypto_aes_ctx) - sizeof(struct aeskl_handle) \
+			    - sizeof(u32))
+
+struct aeskl_ctx {
+	struct aeskl_handle handle;
+	u8 reserved[AESKL_CTX_RESERVED];
+	u32 key_length;
+};
+
+#endif /* _AESKL_INTEL_GLUE_H */
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 4ac7b9a28967..d9c4aa055383 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -37,6 +37,7 @@
 #include <linux/static_call.h>
 
 #include "aes-helper_glue.h"
+#include "aesni-intel_glue.h"
 
 #define RFC4106_HASH_SUBKEY_SIZE 16
 #define AES_BLOCK_MASK (~(AES_BLOCK_SIZE - 1))
@@ -72,9 +73,6 @@ struct gcm_context_data {
 	u8 hash_keys[GCM_BLOCK_LEN * 16];
 };
 
-asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
-			      unsigned int key_len);
-asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void __aesni_dec(const void *ctx, u8 *out, const u8 *in);
 asmlinkage void aesni_ecb_enc(struct crypto_aes_ctx *ctx, u8 *out,
 			      const u8 *in, unsigned int len);
@@ -89,21 +87,9 @@ asmlinkage void aesni_cts_cbc_enc(struct crypto_aes_ctx *ctx, u8 *out,
 asmlinkage void aesni_cts_cbc_dec(struct crypto_aes_ctx *ctx, u8 *out,
 				  const u8 *in, unsigned int len, u8 *iv);
 
-static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
-{
-	__aesni_enc(ctx, out, in);
-	return 0;
-}
-
 #define AVX_GEN2_OPTSIZE 640
 #define AVX_GEN4_OPTSIZE 4096
 
-asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
-asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
-				    const u8 *in, unsigned int len, u8 *iv);
-
 #ifdef CONFIG_X86_64
 
 asmlinkage void aesni_ctr_enc(struct crypto_aes_ctx *ctx, u8 *out,
@@ -271,20 +257,6 @@ static inline int aesni_xts_setkey(union x86_aes_ctx *ctx,
 	return aes_set_key_common(&ctx->aesni, in_key, key_len);
 }
 
-static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
-static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
-				    unsigned int len, u8 *iv)
-{
-	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
-	return 0;
-}
-
 static int aesni_skcipher_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			         unsigned int len)
 {
diff --git a/arch/x86/crypto/aesni-intel_glue.h b/arch/x86/crypto/aesni-intel_glue.h
new file mode 100644
index 000000000000..999f81f5bcde
--- /dev/null
+++ b/arch/x86/crypto/aesni-intel_glue.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * These are AES-NI functions that are used by the AES-KL code as a
+ * fallback when it is given a 192-bit key. Key Locker does not support
+ * 192-bit keys.
+ */
+
+#ifndef _AESNI_INTEL_GLUE_H
+#define _AESNI_INTEL_GLUE_H
+
+asmlinkage void aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
+			      unsigned int key_len);
+asmlinkage void __aesni_enc(const void *ctx, u8 *out, const u8 *in);
+asmlinkage void __aesni_xts_encrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+asmlinkage void __aesni_xts_decrypt(const struct crypto_aes_ctx *ctx, u8 *out,
+				    const u8 *in, unsigned int len, u8 *iv);
+
+static inline int aesni_enc(const void *ctx, u8 *out, const u8 *in)
+{
+	__aesni_enc(ctx, out, in);
+	return 0;
+}
+
+static inline int aesni_xts_encrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_encrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+static inline int aesni_xts_decrypt(const union x86_aes_ctx *ctx, u8 *out, const u8 *in,
+				    unsigned int len, u8 *iv)
+{
+	__aesni_xts_decrypt(&ctx->aesni, out, in, len, iv);
+	return 0;
+}
+
+#endif /* _AESNI_INTEL_GLUE_H */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
  2024-03-29  1:53         ` [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation Chang S. Bae
@ 2024-03-29  6:20           ` Pawan Gupta
  2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Pawan Gupta @ 2024-03-29  6:20 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, ebiggers, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, elliott,
	dan.j.williams, bernie.keany, charishma1.gairuboyina,
	Dave Hansen

On Thu, Mar 28, 2024 at 06:53:43PM -0700, Chang S. Bae wrote:
> The Register File Data Sampling vulnerability may allow malicious
> userspace programs to infer stale kernel register data, potentially
> exposing sensitive key values, including AES keys.
> 
> To address this vulnerability, a microcode update needs to be applied to
> the CPU, which modifies the VERW instruction to flush the affected CPU
> buffers.
> 
> The kernel already has a facility to flush CPU buffers before returning
> to userspace, which is indicated by the X86_FEATURE_CLEAR_CPU_BUF flag.
> 
> Ensure the mitigation is in place before enabling Key Locker. Do not
> enable the feature on CPUs affected by the vulnerability but lacking
> mitigation.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> Change from v8:
> * Add as a new patch.
> 
> Note that the code change follows the mitigation guidance [1]:
>   "Software loading Key Locker keys using LOADIWKEY should execute a VERW
>    to clear registers before transitioning to untrusted code to prevent
>    later software from inferring the loaded key."
> 
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
> ---
>  arch/x86/kernel/keylocker.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> index d4f3aa65ea8a..6e805c4da76d 100644
> --- a/arch/x86/kernel/keylocker.c
> +++ b/arch/x86/kernel/keylocker.c
> @@ -135,12 +135,29 @@ static bool __init have_gds_mitigation(void)
>  	return false;
>  }
>  
> +/*
> + * IA32_ARCH_CAPABILITIES MSR is retrieved during the setting of
> + * X86_BUG_RFDS. Ensure that the mitigation is applied to flush CPU
> + * buffers by checking the flag.
> + */
> +static bool __init have_rfds_mitigation(void)
> +{
> +	if (boot_cpu_has(X86_FEATURE_CLEAR_CPU_BUF))
> +		return true;

X86_FEATURE_CLEAR_CPU_BUF is also set by other VERW based mitigations
like MDS. The feature flag does not guarantee that the microcode
required to mitigate RFDS is loaded.

A more robust check would be:

	if (rfds_mitigation == RFDS_MITIGATION_VERW)
		return true;

And it would be apt to move this function to arch/x86/kernel/cpu/bugs.c
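
For what it's worth, a minimal sketch of that suggestion (illustrative
only, not code from the series; the helper name here is made up):

	/* arch/x86/kernel/cpu/bugs.c */
	bool __init rfds_mitigated(void)
	{
		return rfds_mitigation == RFDS_MITIGATION_VERW;
	}

	/* arch/x86/kernel/keylocker.c */
	if (boot_cpu_has_bug(X86_BUG_RFDS) && !rfds_mitigated())
		return false;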

> +
> +	pr_warn("x86/keylocker: Susceptible to the RFDS vulnerability.\n");
> +	return false;
> +}
> +
>  /* Check if Key Locker is secure enough to be used. */
>  static bool __init secure_keylocker(void)
>  {
>  	if (boot_cpu_has_bug(X86_BUG_GDS) && !have_gds_mitigation())
>  		return false;
>  
> +	if (boot_cpu_has_bug(X86_BUG_RFDS) && !have_rfds_mitigation())
> +		return false;
> +
>  	return true;
>  }

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
  2024-03-29  1:53         ` [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation Chang S. Bae
@ 2024-03-29  6:57           ` Pawan Gupta
  2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Pawan Gupta @ 2024-03-29  6:57 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, ebiggers, luto,
	dave.hansen, tglx, bp, mingo, x86, herbert, ardb, elliott,
	dan.j.williams, bernie.keany, charishma1.gairuboyina,
	Dave Hansen

On Thu, Mar 28, 2024 at 06:53:42PM -0700, Chang S. Bae wrote:
> +/*
> + * The mitigation is implemented at a microcode level. Ensure that the
> + * microcode update is applied and the mitigation is locked.
> + */
> +static bool __init have_gds_mitigation(void)
> +{
> +	u64 mcu_ctrl;
> +
> +	/* GDS_CTRL is set if new microcode is loaded. */
> +	if (!(x86_read_arch_cap_msr() & ARCH_CAP_GDS_CTRL))
> +		goto vulnerable;
> +
> +	/* If GDS_MITG_LOCKED is set, GDS_MITG_DIS is forced to 0. */
> +	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
> +	if (mcu_ctrl & GDS_MITG_LOCKED)
> +		return true;

Similar to RFDS, above checks can be simplified to:

	if (gds_mitigation == GDS_MITIGATION_FULL_LOCKED)
		return true;
> +
> +vulnerable:
> +	pr_warn("x86/keylocker: Susceptible to the GDS vulnerability.\n");
> +	return false;
> +}

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 01/14] Documentation/x86: Document Key Locker
  2024-03-29  1:53         ` [PATCH v9 01/14] Documentation/x86: Document " Chang S. Bae
@ 2024-03-31 15:48           ` Randy Dunlap
  0 siblings, 0 replies; 247+ messages in thread
From: Randy Dunlap @ 2024-03-31 15:48 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	Bagas Sanjaya

Hi,


On 3/28/24 18:53, Chang S. Bae wrote:
> Document the overview of the feature along with relevant consideration
> when provisioning dm-crypt volumes with AES-KL instead of AES-NI.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> ---
> Changes from v8:
> * Change wording of documentation slightly. (Randy Dunlap and Bagas
>   Sanjaya)
> 
> Changes from v6:
> * Rebase on the upstream -- commit ff61f0791ce9 ("docs: move x86
>   documentation into Documentation/arch/"). (Nathan Huckleberry)
> * Remove a duplicated sentence -- 'But there is no AES-KL instruction
>   to process a 192-bit key.'
> * Update the text for clarity and readability:
>   - Clarify the error code and exemplify the backup failure
>   - Use 'wrapping key' instead of less readable 'IWKey'
> 
> Changes from v5:
> * Fix a typo: 'feature feature' -> 'feature'
> 
> Changes from RFC v2:
> * Add as a new patch.
> 
> The preview is available here:
>   https://htmlpreview.github.io/?https://github.com/intel-staging/keylocker/kdoc/arch/x86/keylocker.html
> ---
>  Documentation/arch/x86/index.rst     |  1 +
>  Documentation/arch/x86/keylocker.rst | 96 ++++++++++++++++++++++++++++
>  2 files changed, 97 insertions(+)
>  create mode 100644 Documentation/arch/x86/keylocker.rst
> 
> diff --git a/Documentation/arch/x86/index.rst b/Documentation/arch/x86/index.rst
> index 8ac64d7de4dc..669c239c009f 100644
> --- a/Documentation/arch/x86/index.rst
> +++ b/Documentation/arch/x86/index.rst
> @@ -43,3 +43,4 @@ x86-specific Documentation
>     features
>     elf_auxvec
>     xstate
> +   keylocker
> diff --git a/Documentation/arch/x86/keylocker.rst b/Documentation/arch/x86/keylocker.rst
> new file mode 100644
> index 000000000000..b28addb8eaf4
> --- /dev/null
> +++ b/Documentation/arch/x86/keylocker.rst
> @@ -0,0 +1,96 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +x86 Key Locker
> +==============
> +
> +Introduction
> +============
> +
> +Key Locker is a CPU feature to reduce key exfiltration opportunities
> +while maintaining a programming interface similar to AES-NI. It
> +converts the AES key into an encoded form, called the 'key handle'.
> +The key handle is a wrapped version of the clear-text key where the
> +wrapping key has limited exposure. Once converted, all subsequent data
> +encryption using new AES instructions (AES-KL) uses this key handle,
> +reducing the exposure of private key material in memory.
> +
> +CPU-internal Wrapping Key
> +=========================
> +
> +The CPU-internal wrapping key is an entity in a software-invisible CPU
> +state. On every system boot, a new key is loaded. So the key handle that
> +was encoded by the old wrapping key is no longer usable on system shutdown
> +or reboot.
> +
> +And the key may be lost in the following exceptional situation upon wakeup:
> +
> +Wrapping Key Restore Failure
> +----------------------------
> +
> +The CPU state is volatile with the ACPI S3/4 sleep states. When the system
> +supports those states, the key has to be backed up so that it is restored
> +on wake up. The kernel saves the key in non-volatile media.
> +
> +Upon the event of a wrapping key restore failure upon resume from suspend,
> +all established key handles become invalid. In flight dm-crypt operations

                                               In-flight

> +receive error results from pending operations. In the likely scenario that
> +dm-crypt is hosting the root filesystem the recovery is identical to if a
> +storage controller failed to resume from suspend or reboot. If the volume
> +impacted by a wrapping key restore failure is a data volume then it is
> +possible that I/O errors on that volume do not bring down the rest of the
> +system. However, a reboot is still required because the kernel will have
> +soft-disabled Key Locker. Upon the failure, the crypto library code will
> +return -ENODEV on every AES-KL function call. The Key Locker implementation
> +only loads a new wrapping key at initial boot, not any time after like
> +resume from suspend.
> +
> +Use Case and Non-use Cases
> +==========================
> +
> +Bare metal disk encryption is the only intended use case.
> +
> +Userspace usage is not supported because there is no ABI provided to
> +communicate and coordinate wrapping-key restore failure to userspace. For
> +now, key restore failures are only coordinated with kernel users. But the
> +kernel can not prevent userspace from using the feature's AES instructions
> +('AES-KL') when the feature has been enabled. So, the lack of userspace
> +support is only documented, not actively enforced.
> +
> +Key Locker is not expected to be advertised to guest VMs and the kernel
> +implementation ignores it even if the VMM enumerates the capability. The
> +expectation is that a guest VM wants private wrapping key state, but the
> +architecture does not provide that. An emulation of that capability, by
> +caching per-VM wrapping keys in memory, defeats the purpose of Key Locker.
> +The backup / restore facility is also not performant enough to be suitable
> +for guest VM context switches.
> +
> +AES Instruction Set
> +===================
> +
> +The feature accompanies a new AES instruction set. This instruction set is
> +analogous to AES-NI. A set of AES-NI instructions can be mapped to an
> +AES-KL instruction. For example, AESENC128KL is responsible for ten rounds
> +of transformation, which is equivalent to nine times AESENC and one
> +AESENCLAST in AES-NI.
> +
> +But they have some notable differences:
> +
> +* AES-KL provides a secure data transformation using an encrypted key.
> +
> +* If an invalid key handle is provided, e.g. a corrupted one or a handle
> +  restriction failure, the instruction fails with setting RFLAGS.ZF. The
> +  crypto library implementation includes the flag check to return -EINVAL.
> +  Note that this flag is also set if the wrapping key is changed, e.g.,
> +  because of the backup error.
> +
> +* AES-KL implements support for 128-bit and 256-bit keys, but there is no
> +  AES-KL instruction to process a 192-bit key. The AES-KL cipher
> +  implementation logs a warning message with a 192-bit key and then falls
> +  back to AES-NI. So, this 192-bit key-size limitation is only documented,
> +  not enforced. It means the key will remain in clear-text in memory. This
> +  is to meet Linux crypto-cipher expectation that each implementation must
> +  support all the AES-compliant key sizes.
> +
> +* Some AES-KL hardware implementations may have noticeable performance
> +  overhead when compared with AES-NI instructions.

Reviewed-by: Randy Dunlap <rdunlap@infradead.org>

thanks.
-- 
#Randy

^ permalink raw reply	[flat|nested] 247+ messages in thread

* [PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
  2024-03-29  6:57           ` Pawan Gupta
@ 2024-04-07 23:04             ` Chang S. Bae
  2024-04-19  0:01               ` Pawan Gupta
  2024-04-19 17:47               ` [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present Pawan Gupta
  0 siblings, 2 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-07 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen, Pawan Gupta

Gather Data Sampling is a transient execution side channel issue in some
CPU models. Stale data in registers is not guaranteed to remain secure
when this vulnerability is left unaddressed.

Key Locker temporarily stores the original key in registers during AES
transformations, which poses a risk: on some implementations the key
material can linger as stale register data, leaving the AES key
susceptible to leakage.

To mitigate this vulnerability, a qualified microcode image must be
applied. Add code to ensure that the mitigation is installed and securely
locked, and disable the feature otherwise.

Expand gds_ucode_mitigated() to examine the lock state.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v9:
* Removed MSR reads and utilized the helper function. (Pawan Gupta)

Alternatively, 'gds_mitigation' could be exported and referenced directly;
checking 'gds_mitigation == GDS_MITIGATION_FULL_LOCKED' may even be
more readable. However, I opted to expand gds_ucode_mitigated() for
consistency, as it is an already established helper.

Note that this approach aligns with Intel's guidance, as the bugs.c code
checks the following MSR bits:
  "Intel recommends that system software does not enable Key Locker (by
   setting CR4.KL) unless the GDS mitigation is enabled
   (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked
   (IA32_MCU_OPT_CTRL [GDS_MITG_LOCK](bit 5) is 1)."

For more information, refer to Intel's technical documentation on Gather
Data Sampling:
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
---
 arch/x86/include/asm/processor.h |  7 ++++++-
 arch/x86/kernel/cpu/bugs.c       |  5 ++++-
 arch/x86/kernel/keylocker.c      | 12 ++++++++++++
 arch/x86/kvm/x86.c               |  2 +-
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..74eaa3a2b85b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -721,7 +721,12 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
-extern bool gds_ucode_mitigated(void);
+enum mitigation_info {
+	MITG_FULL,
+	MITG_LOCKED,
+};
+
+extern bool gds_ucode_mitigated(enum mitigation_info mitg);
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index e7ba936d798b..80f6e70619cb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -752,8 +752,11 @@ static const char * const gds_strings[] = {
 	[GDS_MITIGATION_HYPERVISOR]	= "Unknown: Dependent on hypervisor status",
 };
 
-bool gds_ucode_mitigated(void)
+bool gds_ucode_mitigated(enum mitigation_info mitg)
 {
+	if (mitg == MITG_LOCKED)
+		return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
+
 	return (gds_mitigation == GDS_MITIGATION_FULL ||
 		gds_mitigation == GDS_MITIGATION_FULL_LOCKED);
 }
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1e81d0704eea..23cf4a235f11 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -113,6 +113,15 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -126,6 +135,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47d9f03b7778..4ab50e95fdb5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1675,7 +1675,7 @@ static u64 kvm_get_arch_capabilities(void)
 		 */
 	}
 
-	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
+	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated(MITG_FULL))
 		data |= ARCH_CAP_GDS_NO;
 
 	return data;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9a 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation
  2024-03-29  6:20           ` Pawan Gupta
@ 2024-04-07 23:04             ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-07 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen, Pawan Gupta

The Register File Data Sampling vulnerability may allow malicious
userspace programs to infer stale kernel register data, potentially
exposing sensitive key values, including AES keys.

To address this vulnerability, a microcode update needs to be applied to
the CPU, which modifies the VERW instruction to flush the affected CPU
buffers.

Reference the 'rfds_mitigation' variable to check the mitigation status.
Do not enable Key Locker on CPUs affected by the vulnerability but
lacking mitigation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Changes from v9:
* Remove the helper function and simplify the code by directly reading
  the status variable. (Pawan Gupta)

Note that this code change aligns with mitigation guidance, which
recommends:
  "Software loading Key Locker keys using LOADIWKEY should execute a VERW
   to clear registers before transitioning to untrusted code to prevent
   later software from inferring the loaded key."

For more information, refer to Intel's guidance on Register File Data
Sampling:
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
---
 arch/x86/include/asm/processor.h | 8 ++++++++
 arch/x86/kernel/cpu/bugs.c       | 8 +-------
 arch/x86/kernel/keylocker.c      | 3 +++
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 74eaa3a2b85b..b823163f4786 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -728,6 +728,14 @@ enum mitigation_info {
 
 extern bool gds_ucode_mitigated(enum mitigation_info mitg);
 
+enum rfds_mitigations {
+	RFDS_MITIGATION_OFF,
+	RFDS_MITIGATION_VERW,
+	RFDS_MITIGATION_UCODE_NEEDED,
+};
+
+extern enum rfds_mitigations rfds_mitigation;
+
 /*
  * Make previous memory operations globally visible before
  * a WRMSR.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 80f6e70619cb..a2ba1a0ef872 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -483,14 +483,8 @@ early_param("mmio_stale_data", mmio_stale_data_parse_cmdline);
 #undef pr_fmt
 #define pr_fmt(fmt)	"Register File Data Sampling: " fmt
 
-enum rfds_mitigations {
-	RFDS_MITIGATION_OFF,
-	RFDS_MITIGATION_VERW,
-	RFDS_MITIGATION_UCODE_NEEDED,
-};
-
 /* Default mitigation for Register File Data Sampling */
-static enum rfds_mitigations rfds_mitigation __ro_after_init =
+enum rfds_mitigations rfds_mitigation __ro_after_init =
 	IS_ENABLED(CONFIG_MITIGATION_RFDS) ? RFDS_MITIGATION_VERW : RFDS_MITIGATION_OFF;
 
 static const char * const rfds_strings[] = {
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 23cf4a235f11..09876693414c 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -119,6 +119,9 @@ static bool __init secure_keylocker(void)
 	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
 		return false;
 
+	if (boot_cpu_has_bug(X86_BUG_RFDS) && rfds_mitigation != RFDS_MITIGATION_VERW)
+		return false;
+
 	return true;
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* [PATCH v9a 07/14] x86/cpu/keylocker: Load a wrapping key at boot time
  2024-03-29  1:53         ` [PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time Chang S. Bae
@ 2024-04-07 23:04           ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-07 23:04 UTC (permalink / raw)
  To: linux-kernel, linux-crypto
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Dave Hansen

The wrapping key is used to encode a clear-text key into a key handle.
It is the linchpin for protecting user keys, so its value has to be
randomized before it is loaded into the software-invisible CPU state.

The wrapping key needs to be established before the first user. Given
that the only proposed Linux use case for Key Locker is dm-crypt, the
feature could be lazily enabled before the first dm-crypt user arrives.

But there is no precedent for late enabling of CPU features, and it
adds maintenance burden without demonstrable benefit beyond
minimizing the visibility of Key Locker to userspace.

Therefore, generate random bytes and load them at boot time, which
clobbers XMM registers. Perform this process under arch_initcall(),
ensuring that it occurs after FPU initialization. Finally, flush the
random bytes from memory after loading.

Given that the Linux Key Locker support is only intended for bare
metal dm-crypt use, and that switching wrapping key per virtual machine
is impractical, explicitly skip this setup in the X86_FEATURE_HYPERVISOR
case.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Elliott, Robert (Servers)" <elliott@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
---
Changes from v9:
* Include 'tlbflush.h' back, which was once removed by mistake.
---
 arch/x86/kernel/Makefile    |  1 +
 arch/x86/kernel/keylocker.c | 78 +++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)
 create mode 100644 arch/x86/kernel/keylocker.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 74077694da7d..d105e5785b90 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -137,6 +137,7 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 obj-$(CONFIG_X86_UMIP)			+= umip.o
+obj-$(CONFIG_X86_KEYLOCKER)		+= keylocker.o
 
 obj-$(CONFIG_UNWINDER_ORC)		+= unwind_orc.o
 obj-$(CONFIG_UNWINDER_FRAME_POINTER)	+= unwind_frame.o
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
new file mode 100644
index 000000000000..8569b92971da
--- /dev/null
+++ b/arch/x86/kernel/keylocker.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Set up the Key Locker feature and support wrapping key management.
+ */
+
+#include <linux/random.h>
+#include <linux/string.h>
+
+#include <asm/fpu/api.h>
+#include <asm/keylocker.h>
+#include <asm/processor.h>
+#include <asm/tlbflush.h>
+
+static struct iwkey wrapping_key __initdata;
+
+static void __init generate_keylocker_data(void)
+{
+	get_random_bytes(&wrapping_key.integrity_key, sizeof(wrapping_key.integrity_key));
+	get_random_bytes(&wrapping_key.encryption_key, sizeof(wrapping_key.encryption_key));
+}
+
+static void __init destroy_keylocker_data(void)
+{
+	memzero_explicit(&wrapping_key, sizeof(wrapping_key));
+}
+
+/*
+ * For loading the wrapping key into each CPU, the feature bit is set
+ * in the control register and FPU context management is performed.
+ */
+static void __init load_keylocker(struct work_struct *unused)
+{
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	kernel_fpu_begin();
+	load_xmm_iwkey(&wrapping_key);
+	kernel_fpu_end();
+}
+
+static int __init init_keylocker(void)
+{
+	u32 eax, ebx, ecx, edx;
+
+	if (!cpu_feature_enabled(X86_FEATURE_KEYLOCKER))
+		goto disable;
+
+	if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {
+		pr_debug("x86/keylocker: Not compatible with a hypervisor.\n");
+		goto clear_cap;
+	}
+
+	cr4_set_bits(X86_CR4_KEYLOCKER);
+
+	/* AESKLE depends on CR4.KEYLOCKER */
+	cpuid_count(KEYLOCKER_CPUID, 0, &eax, &ebx, &ecx, &edx);
+	if (!(ebx & KEYLOCKER_CPUID_EBX_AESKLE) ||
+	    !(eax & KEYLOCKER_CPUID_EAX_SUPERVISOR)) {
+		pr_debug("x86/keylocker: Not fully supported.\n");
+		goto clear_cap;
+	}
+
+	generate_keylocker_data();
+	schedule_on_each_cpu(load_keylocker);
+	destroy_keylocker_data();
+
+	pr_info_once("x86/keylocker: Enabled.\n");
+	return 0;
+
+clear_cap:
+	setup_clear_cpu_cap(X86_FEATURE_KEYLOCKER);
+	pr_info_once("x86/keylocker: Disabled.\n");
+disable:
+	cr4_clear_bits(X86_CR4_KEYLOCKER);
+	return -ENODEV;
+}
+
+arch_initcall(init_keylocker);
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 00/14] x86: Support Key Locker
  2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
                           ` (13 preceding siblings ...)
  2024-03-29  1:53         ` [PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm Chang S. Bae
@ 2024-04-07 23:24         ` Chang S. Bae
  2024-04-08  1:48           ` Eric Biggers
  14 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-04-07 23:24 UTC (permalink / raw)
  To: linux-kernel, linux-crypto, dm-devel
  Cc: ebiggers, luto, dave.hansen, tglx, bp, mingo, x86, herbert, ardb,
	elliott, dan.j.williams, bernie.keany, charishma1.gairuboyina

On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> 
> Then, the following is a summary of changes per patch since v8 [6]:
> 
> PATCH7-8:
> * Invoke the setup code via arch_initcall() due to upstream changes
>    delaying the FPU setup.
> 
> PATCH9-11:
> * Add new patches for security and hotplug support clarification

I've recently made updates to a few patches, primarily related to the
mitigation parts. While the series is still under review, Eric's VAES
patches have been merged into the crypto tree and are currently being
sorted out. Once things settle down, I will make a few adjustments on the
crypto side. Then, another revision will be necessary thereafter.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 00/14] x86: Support Key Locker
  2024-04-07 23:24         ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
@ 2024-04-08  1:48           ` Eric Biggers
  2024-04-15 22:16             ` Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Eric Biggers @ 2024-04-08  1:48 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina

Hi,

On Sun, Apr 07, 2024 at 04:24:18PM -0700, Chang S. Bae wrote:
> On 3/28/2024 6:53 PM, Chang S. Bae wrote:
> > 
> > Then, the following is a summary of changes per patch since v8 [6]:
> > 
> > PATCH7-8:
> > * Invoke the setup code via arch_initcall() due to upstream changes
> >    delaying the FPU setup.
> > 
> > PATCH9-11:
> > * Add new patches for security and hotplug support clarification
> 
> I've recently made updates to a few patches, primarily related to the
> mitigation parts. While the series is still under review, Eric's VAES
> patches have been merged into the crypto tree and are currently being
> sorted out. Once things settle down, I will make a few adjustments on the
> crypto side. Then, another revision will be necessary thereafter.
> 
> Thanks,
> Chang

Thanks for the updated patchset!

Do you have a plan for how this will be merged?  Which trees will the patches go
through?  I think that the actual AES-XTS implementation could still use a bit
more polishing; see my comments below.  However, patches 1-12 don't need to wait
for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
based implementation of AES-XTS:

First, surely it's the case that in practice, all CPUs that support Key Locker
also support AVX?  If so, then there's no need for the Key Locker assembly to
use legacy SSE instructions.  It should instead target AVX and use VEX-coded
instructions.  This would save some instructions and improve performance.

Since the Key Locker assembly only supports 64-bit mode, it should also feel
free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
This would improve performance.

Since the Key Locker assembly advances a large number of XTS tweaks at a time
(8), I'm also wondering if it would be faster to multiply by x^8 directly
instead of multiplying by x sequentially eight times.  This can be done using
the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.
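
To make that concrete, here is a rough C sketch (illustrative only, not
from either patchset) of advancing the tweak by x^8 in one step, with the
XTS polynomial x^128 + x^7 + x^2 + x + 1; t[0] is the low 64-bit word of
the little-endian tweak and t[1] the high word.  A pclmulqdq version would
replace the bit loop below with a single carryless multiply by 0x87:

	#include <stdint.h>

	static void xts_advance_tweak_x8(uint64_t t[2])
	{
		uint8_t carry = t[1] >> 56;	/* coefficients x^120..x^127 shift out */
		uint64_t red = 0;
		int i;

		t[1] = (t[1] << 8) | (t[0] >> 56);
		t[0] <<= 8;

		/* reduce: x^(128+i) == x^i * (x^7 + x^2 + x + 1) == x^i * 0x87 */
		for (i = 0; i < 8; i++)
			if (carry & (1 << i))
				red ^= 0x87ULL << i;

		t[0] ^= red;
	}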

I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
code.  I see that you're proposing to share the glue code with the existing
AES-XTS implementations.  Unfortunately I don't think this ends up working very
well, due to the facts that the Key Locker code can return errors and uses a
different key type.  I think that for now, I'd prefer that you simply copied the
XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
to use the new version of the glue code, which is faster.)

For falling back to AES-NI, I think the cleanest solution is to call the
top-level setkey, encrypt, and decrypt functions (the ones that are set in the
xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
current patchset does.
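
As a rough illustration of that (not code from the series, and
xts_encrypt_aesni() is an assumed name for whatever top-level handler the
AES-NI driver would need to expose), the 192-bit case would hand off the
whole request instead of calling the low-level __aesni_* helpers:

	static int xts_encrypt(struct skcipher_request *req)
	{
		if (unlikely(xts_keylen(req) == AES_KEYSIZE_192))
			return xts_encrypt_aesni(req);	/* assumed AES-NI entry point */

		return xts_crypt_common(req, aeskl_xts_encrypt, aeskl_enc);
	}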

If you could keep the Key Locker assembly roughly stylistically consistent with
the new aes-xts-avx-x86_64.S, that would be great too.

Do you happen to know if there's any way to test the Key Locker AES-XTS code
without having access to a bare metal machine with a CPU that supports Key
Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
Locker.  So I don't currently have an easy way to test this patchset.

Finally, a high level question.  Key Locker has been reported to be
substantially slower than AES-NI.  At the same time, VAES has recently doubled
performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
Given that, how useful is this patchset?  I'm a bit concerned that this might be
something that sounds good in theory but won't be used in practice.  Are
performance improvements for Key Locker on the horizon?  (Well, there are the
improvements I suggested above, which should help, but it sounds like main issue
is the Key Locker instructions themselves which are just fundamentally slower.)

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 00/14] x86: Support Key Locker
  2024-04-08  1:48           ` Eric Biggers
@ 2024-04-15 22:16             ` Chang S. Bae
  2024-04-15 22:54               ` Eric Biggers
  0 siblings, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-04-15 22:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina

Hi Eric,

Sorry for my delay. I just found your message in my cluttered junk 
folder today after returning from my week off.

On 4/7/2024 6:48 PM, Eric Biggers wrote:
> 
> Do you have a plan for how this will be merged?  Which trees will the patches go
> through?  I think that the actual AES-XTS implementation could still use a bit
> more polishing; see my comments below.  However, patches 1-12 don't need to wait
> for that.  Perhaps the x86 maintainers would like to take patches 1-12 for
> v6.10?  Then the AES-XTS support can go through the crypto tree afterwards.

Yeah, this series spans both x86 code and the crypto driver. I believe
the decision should be made by the maintainers. But I suspect they'll want
to see well-established use-case code before moving forward.

> As you noticed, this cycle I've been optimizing AES-XTS for x86_64 by adding new
> VAES and AES-NI + AVX implementations.  I have some ideas for the Key Locker
> based implementation of AES-XTS:

Thanks for the effort! I agree that the code could be beneficial for 
daily disk encryption needs.

> First, surely it's the case that in practice, all CPUs that support Key Locker
> also support AVX?  If so, then there's no need for the Key Locker assembly to
> use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> instructions.  This would save some instructions and improve performance.

Unfortunately, the Key Locker instructions using the AVX states were 
never implemented.

> Since the Key Locker assembly only supports 64-bit mode, it should also feel
> free to use registers xmm8-xmm15 for purposes such as caching the XTS tweaks.
> This would improve performance.
> 
> Since the Key Locker assembly advances a large number of XTS tweaks at a time
> (8), I'm also wondering if it would be faster to multiply by x^8 directly
> instead of multiplying by x sequentially eight times.  This can be done using
> the pclmulqdq instruction; see aes-xts-avx-x86_64.S which implements this
> optimization.  Probably all CPUs that support Key Locker also support PCLMULQDQ.

I'll revisit the assembly code to incorporate your suggestions.

> I'm also trying to think of the best way to organize the Key Locker AES-XTS glue
> code.  I see that you're proposing to share the glue code with the existing
> AES-XTS implementations.  Unfortunately I don't think this ends up working very
> well, due to the facts that the Key Locker code can return errors and uses a
> different key type.  I think that for now, I'd prefer that you simply copied the
> XTS glue code into aeskl-intel_glue.c and modified it as needed.  (But make sure
> to use the new version of the glue code, which is faster.)

I agree; the proposed glue code looks messy due to the different error 
returns and key types. Ard made a point earlier [1] about establishing 
shared common code since the two are logically quite similar. But I 
suppose it is more practical to pursue separate glue code at this point.

> For falling back to AES-NI, I think the cleanest solution is to call the
> top-level setkey, encrypt, and decrypt functions (the ones that are set in the
> xts-aes-aesni skcipher_alg), instead of calling lower-level functions as your
> current patchset does.

Yes, falling back is indeed one of the ugly parts of this series. Let me 
retry this as you suggested.

> If you could keep the Key Locker assembly roughly stylistically consistent with
> the new aes-xts-avx-x86_64.S, that would be great too.

Okay.

> Do you happen to know if there's any way to test the Key Locker AES-XTS code
> without having access to a bare metal machine with a CPU that supports Key
> Locker?  I tried a Sapphire Rapids based VM in Google Compute Engine, but it
> doesn't enumerate Key Locker.  I also don't see anything in QEMU related to Key
> Locker.  So I don't currently have an easy way to test this patchset.

No, there isn't currently an emulation option available to the public 
that I'm aware of. This feature has been available on client systems 
since the Tiger Lake generation.

> Finally, a high level question.  Key Locker has been reported to be
> substantially slower than AES-NI.  At the same time, VAES has recently doubled
> performance over AES-NI.  I'd guess this leaves Key Locker even further behind.
> Given that, how useful is this patchset?  I'm a bit concerned that this might be
> something that sounds good in theory but won't be used in practice.  Are
> performance improvements for Key Locker on the horizon?  (Well, there are the
> improvements I suggested above, which should help, but it sounds like main issue
> is the Key Locker instructions themselves which are just fundamentally slower.)

On our latest implementations, we've observed that the Key Locker 
performance with cryptsetup is roughly the same as what we posted earlier 
[2]. Yes, that sounds like a fair assessment to me, especially given your 
VAES code.

Thanks,
Chang

[1] 
https://lore.kernel.org/lkml/CAMj1kXGa4f21eH0mdxd1pQsZMUjUr1Btq+Dgw-gC=O-yYft7xw@mail.gmail.com/
[2] 
https://lore.kernel.org/lkml/20230410225936.8940-1-chang.seok.bae@intel.com/


^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 00/14] x86: Support Key Locker
  2024-04-15 22:16             ` Chang S. Bae
@ 2024-04-15 22:54               ` Eric Biggers
  2024-04-15 22:58                 ` Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Eric Biggers @ 2024-04-15 22:54 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, dm-devel, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina

On Mon, Apr 15, 2024 at 03:16:18PM -0700, Chang S. Bae wrote:
> > First, surely it's the case that in practice, all CPUs that support Key Locker
> > also support AVX?  If so, then there's no need for the Key Locker assembly to
> > use legacy SSE instructions.  It should instead target AVX and use VEX-coded
> > instructions.  This would save some instructions and improve performance.
> 
> Unfortunately, the Key Locker instructions using the AVX states were never
> implemented.

Sure, you could still use VEX-coded 128-bit instructions for everything other
than the actual AES (for example, the XTS tweak computation) though, right?
They're a bit more convenient to work with since they are non-destructive.
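
For illustration only (AT&T syntax, register names as in the posted
assembly, and assuming AVX is available): computing STATE1 = IV ^ INC
needs a copy plus a destructive SSE op, while the VEX form is a single
non-destructive, three-operand instruction:

	movdqa	IV, STATE1		/* SSE: copy, then... */
	pxor	INC, STATE1		/* ...destructive XOR */

	vpxor	INC, IV, STATE1		/* AVX: STATE1 = IV ^ INC */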

- Eric

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9 00/14] x86: Support Key Locker
  2024-04-15 22:54               ` Eric Biggers
@ 2024-04-15 22:58                 ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-15 22:58 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-kernel, linux-crypto, dm-devel, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina

On 4/15/2024 3:54 PM, Eric Biggers wrote:
> 
> Sure, you could still use VEX-coded 128-bit instructions for everything other
> than the actual AES (for example, the XTS tweak computation) though, right?
> They're a bit more convenient to work with since they are non-destructive.

Right.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 247+ messages in thread

* Re: [PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
  2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
@ 2024-04-19  0:01               ` Pawan Gupta
  2024-04-22  7:49                 ` Chang S. Bae
  2024-04-19 17:47               ` [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present Pawan Gupta
  1 sibling, 1 reply; 247+ messages in thread
From: Pawan Gupta @ 2024-04-19  0:01 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, linux-crypto, ebiggers, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, Dave Hansen

On Sun, Apr 07, 2024 at 04:04:32PM -0700, Chang S. Bae wrote:
> Gather Data Sampling is a transient execution side channel issue in some
> CPU models. Stale data in registers is not guaranteed to remain secure
> when this vulnerability is left unaddressed.
> 
> Key Locker temporarily stores the original key in registers during AES
> transformations, which poses a risk: on some implementations the key
> material can linger as stale register data, leaving the AES key
> susceptible to leakage.
> 
> To mitigate this vulnerability, a qualified microcode image must be
> applied. Add code to ensure that the mitigation is installed and securely
> locked, and disable the feature otherwise.
> 
> Expand gds_ucode_mitigated() to examine the lock state.
> 
> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
> Changes from v9:
> * Removed MSR reads and utilized the helper function. (Pawan Gupta)
> 
> Alternatively, 'gds_mitigation' could be exported and referenced directly;
> checking 'gds_mitigation == GDS_MITIGATION_FULL_LOCKED' may even be
> more readable. However, I opted to expand gds_ucode_mitigated() for
> consistency, as it is an already established helper.
> 
> Note that this approach aligns with Intel's guidance, as the bugs.c code
> checks the following MSR bits:
>   "Intel recommends that system software does not enable Key Locker (by
>    setting CR4.KL) unless the GDS mitigation is enabled
>    (IA32_MCU_OPT_CTRL[GDS_MITG_DIS] (bit 4) is 0) and locked
>    (IA32_MCU_OPT_CTRL [GDS_MITG_LOCK](bit 5) is 1)."
> 
> For more information, refer to Intel's technical documentation on Gather
> Data Sampling:
>   https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html
> ---
>  arch/x86/include/asm/processor.h |  7 ++++++-
>  arch/x86/kernel/cpu/bugs.c       |  5 ++++-
>  arch/x86/kernel/keylocker.c      | 12 ++++++++++++
>  arch/x86/kvm/x86.c               |  2 +-
>  4 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 811548f131f4..74eaa3a2b85b 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -721,7 +721,12 @@ enum mds_mitigations {
>  	MDS_MITIGATION_VMWERV,
>  };
>  
> -extern bool gds_ucode_mitigated(void);
> +enum mitigation_info {
> +	MITG_FULL,
> +	MITG_LOCKED,
> +};
> +
> +extern bool gds_ucode_mitigated(enum mitigation_info mitg);
>  
>  /*
>   * Make previous memory operations globally visible before
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index e7ba936d798b..80f6e70619cb 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -752,8 +752,11 @@ static const char * const gds_strings[] = {
>  	[GDS_MITIGATION_HYPERVISOR]	= "Unknown: Dependent on hypervisor status",
>  };
>  
> -bool gds_ucode_mitigated(void)
> +bool gds_ucode_mitigated(enum mitigation_info mitg)
>  {
> +	if (mitg == MITG_LOCKED)
> +		return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
> +
>  	return (gds_mitigation == GDS_MITIGATION_FULL ||
>  		gds_mitigation == GDS_MITIGATION_FULL_LOCKED);
>  }
> diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
> index 1e81d0704eea..23cf4a235f11 100644
> --- a/arch/x86/kernel/keylocker.c
> +++ b/arch/x86/kernel/keylocker.c
> @@ -113,6 +113,15 @@ void restore_keylocker(void)
>  	valid_wrapping_key = false;
>  }
>  
> +/* Check if Key Locker is secure enough to be used. */
> +static bool __init secure_keylocker(void)
> +{
> +	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
> +		return false;
> +
> +	return true;
> +}
> +
>  static int __init init_keylocker(void)
>  {
>  	u32 eax, ebx, ecx, edx;
> @@ -126,6 +135,9 @@ static int __init init_keylocker(void)
>  		goto clear_cap;
>  	}
>  
> +	if (!secure_keylocker())
> +		goto clear_cap;
> +
>  	cr4_set_bits(X86_CR4_KEYLOCKER);
>  
>  	/* AESKLE depends on CR4.KEYLOCKER */
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 47d9f03b7778..4ab50e95fdb5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1675,7 +1675,7 @@ static u64 kvm_get_arch_capabilities(void)
>  		 */
>  	}
>  
> -	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
> +	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated(MITG_FULL))
>  		data |= ARCH_CAP_GDS_NO;
>  
>  	return data;
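
For reference, the two call sites in the hunks above reduce to the
following (condensed from the patch, for comparison only):

  /* KVM: only needs the microcode mitigation to be enabled */
  if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated(MITG_FULL))
  	data |= ARCH_CAP_GDS_NO;

  /* Key Locker setup: additionally requires the mitigation to be locked */
  if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_ucode_mitigated(MITG_LOCKED))
  	return false;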

Repurposing gds_ucode_mitigated() to check for the locked state adds a
bit of churn. We can introduce gds_mitigation_locked() instead.

Does the below look okay?

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 811548f131f4..8ba96e8a8754 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -722,6 +722,7 @@ enum mds_mitigations {
 };
 
 extern bool gds_ucode_mitigated(void);
+extern bool gds_mitigation_locked(void);
 
 /*
  * Make previous memory operations globally visible before
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ca295b0c1eee..a7ec26988ddb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -753,6 +753,11 @@ bool gds_ucode_mitigated(void)
 }
 EXPORT_SYMBOL_GPL(gds_ucode_mitigated);
 
+bool gds_mitigation_locked(void)
+{
+	return gds_mitigation == GDS_MITIGATION_FULL_LOCKED;
+}
+
 void update_gds_msr(void)
 {
 	u64 mcu_ctrl_after;
diff --git a/arch/x86/kernel/keylocker.c b/arch/x86/kernel/keylocker.c
index 1b57e11d93ad..c40e72f482b1 100644
--- a/arch/x86/kernel/keylocker.c
+++ b/arch/x86/kernel/keylocker.c
@@ -112,6 +112,15 @@ void restore_keylocker(void)
 	valid_wrapping_key = false;
 }
 
+/* Check if Key Locker is secure enough to be used. */
+static bool __init secure_keylocker(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_GDS) && !gds_mitigation_locked())
+		return false;
+
+	return true;
+}
+
 static int __init init_keylocker(void)
 {
 	u32 eax, ebx, ecx, edx;
@@ -125,6 +134,9 @@ static int __init init_keylocker(void)
 		goto clear_cap;
 	}
 
+	if (!secure_keylocker())
+		goto clear_cap;
+
 	cr4_set_bits(X86_CR4_KEYLOCKER);
 
 	/* AESKLE depends on CR4.KEYLOCKER */


* [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
  2024-04-19  0:01               ` Pawan Gupta
@ 2024-04-19 17:47               ` Pawan Gupta
  2024-04-19 18:03                 ` Daniel Sneddon
  2024-04-22  7:35                 ` Chang S. Bae
  1 sibling, 2 replies; 247+ messages in thread
From: Pawan Gupta @ 2024-04-19 17:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Josh Poimboeuf, daniel.sneddon,
	antonio.gomez.iglesias

In order to safely enable the Intel Keylocker feature, the Gather Data
Sampling (GDS) mitigation should be enabled and locked. Hardware provides
a way to lock the mitigation, such that it cannot be disabled until the
CPU is reset. Currently, the GDS mitigation is enabled without the lock.

Below is the recommendation from Intel:

  "Intel recommends that system software does not enable Key Locker (by
  setting CR4.KL) unless the GDS mitigation is enabled (IA32_MCU_OPT_CTRL
  [GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
  [GDS_MITG_LOCK](bit 5) is 1). This will prevent an adversary that takes
  control of the system from turning off the mitigation in order to infer
  the keys behind Key Locker handles." [1]

When the GDS mitigation is enabled and the Keylocker feature is present,
also lock the mitigation.

[1] Gather Data Sampling (ID# 785676)
    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/gather-data-sampling.html

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
This should ideally go before the patch that enables Keylocker. It is
only compile tested.

 arch/x86/kernel/cpu/bugs.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ca295b0c1eee..2777a58110e0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -755,8 +755,8 @@ EXPORT_SYMBOL_GPL(gds_ucode_mitigated);
 
 void update_gds_msr(void)
 {
-	u64 mcu_ctrl_after;
-	u64 mcu_ctrl;
+	u64 mcu_ctrl, mcu_ctrl_after;
+	u64 gds_lock = 0;
 
 	switch (gds_mitigation) {
 	case GDS_MITIGATION_OFF:
@@ -769,6 +769,8 @@ void update_gds_msr(void)
 		 * the same state. Make sure the mitigation is enabled on all
 		 * CPUs.
 		 */
+		gds_lock = GDS_MITG_LOCKED;
+		fallthrough;
 	case GDS_MITIGATION_FULL:
 		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
 		mcu_ctrl &= ~GDS_MITG_DIS;
@@ -779,6 +781,7 @@ void update_gds_msr(void)
 		return;
 	}
 
+	mcu_ctrl |= gds_lock;
 	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
 
 	/*
@@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
 	}
 
+	/* Keylocker can only be enabled when GDS mitigation is locked */
+	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
+	    gds_mitigation == GDS_MITIGATION_FULL)
+		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
+
 	update_gds_msr();
 out:
 	pr_info("%s\n", gds_strings[gds_mitigation]);

---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240418-gds-lock-26ecbce88470

Best regards,
-- 
Thanks,
Pawan



* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-19 17:47               ` [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present Pawan Gupta
@ 2024-04-19 18:03                 ` Daniel Sneddon
  2024-04-19 20:19                   ` Pawan Gupta
  2024-04-22  7:35                 ` Chang S. Bae
  1 sibling, 1 reply; 247+ messages in thread
From: Daniel Sneddon @ 2024-04-19 18:03 UTC (permalink / raw)
  To: Pawan Gupta, linux-kernel
  Cc: x86, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	chang.seok.bae, Josh Poimboeuf, antonio.gomez.iglesias

On 4/19/24 10:47, Pawan Gupta wrote:
> +	u64 gds_lock = 0;
>  
>  	switch (gds_mitigation) {
>  	case GDS_MITIGATION_OFF:
> @@ -769,6 +769,8 @@ void update_gds_msr(void)
>  		 * the same state. Make sure the mitigation is enabled on all
>  		 * CPUs.
>  		 */
> +		gds_lock = GDS_MITG_LOCKED;
Can't we just drop the new gds_lock var and set mcu_ctrl |= GDS_MITG_LOCKED here?
> +		fallthrough;
>  	case GDS_MITIGATION_FULL:
>  		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
>  		mcu_ctrl &= ~GDS_MITG_DIS;
> @@ -779,6 +781,7 @@ void update_gds_msr(void)
>  		return;
>  	}
>  
> +	mcu_ctrl |= gds_lock;
>  	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);




* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-19 18:03                 ` Daniel Sneddon
@ 2024-04-19 20:19                   ` Pawan Gupta
  2024-04-19 20:33                     ` Daniel Sneddon
  0 siblings, 1 reply; 247+ messages in thread
From: Pawan Gupta @ 2024-04-19 20:19 UTC (permalink / raw)
  To: Daniel Sneddon
  Cc: linux-kernel, x86, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, chang.seok.bae, Josh Poimboeuf,
	antonio.gomez.iglesias

On Fri, Apr 19, 2024 at 11:03:28AM -0700, Daniel Sneddon wrote:
> On 4/19/24 10:47, Pawan Gupta wrote:
> > +	u64 gds_lock = 0;
> >  
> >  	switch (gds_mitigation) {
> >  	case GDS_MITIGATION_OFF:
> > @@ -769,6 +769,8 @@ void update_gds_msr(void)
> >  		 * the same state. Make sure the mitigation is enabled on all
> >  		 * CPUs.
> >  		 */
> > +		gds_lock = GDS_MITG_LOCKED;
> Can't we just drop the new gds_lock var and set mcu_ctrl |= GDS_MITG_LOCKED here?

Unfortunately no, because ...

> > +		fallthrough;
> >  	case GDS_MITIGATION_FULL:
> >  		rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);

... mcu_ctrl is read here, it will overwrite any previous value.

> >  		mcu_ctrl &= ~GDS_MITG_DIS;
> > @@ -779,6 +781,7 @@ void update_gds_msr(void)
> >  		return;
> >  	}
> >  
> > +	mcu_ctrl |= gds_lock;
> >  	wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
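
For readers following the exchange, the constraint reduces to this
condensed sketch of update_gds_msr() from the patch above (other cases
and the post-write verification omitted):

  u64 mcu_ctrl, gds_lock = 0;

  switch (gds_mitigation) {
  case GDS_MITIGATION_FULL_LOCKED:
  	/* OR-ing the bit into mcu_ctrl here would be lost below */
  	gds_lock = GDS_MITG_LOCKED;
  	fallthrough;
  case GDS_MITIGATION_FULL:
  	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);	/* overwrites mcu_ctrl */
  	mcu_ctrl &= ~GDS_MITG_DIS;
  	break;
  default:
  	return;
  }

  mcu_ctrl |= gds_lock;	/* the lock bit can only be applied after the read */
  wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);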


* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-19 20:19                   ` Pawan Gupta
@ 2024-04-19 20:33                     ` Daniel Sneddon
  0 siblings, 0 replies; 247+ messages in thread
From: Daniel Sneddon @ 2024-04-19 20:33 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: linux-kernel, x86, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, chang.seok.bae, Josh Poimboeuf,
	antonio.gomez.iglesias

On 4/19/24 13:19, Pawan Gupta wrote:
> ... mcu_ctrl is read here, it will overwrite any previous value.

Ah, yep. Bummer.


* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-19 17:47               ` [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present Pawan Gupta
  2024-04-19 18:03                 ` Daniel Sneddon
@ 2024-04-22  7:35                 ` Chang S. Bae
  2024-04-22 21:32                   ` Pawan Gupta
  1 sibling, 1 reply; 247+ messages in thread
From: Chang S. Bae @ 2024-04-22  7:35 UTC (permalink / raw)
  To: Pawan Gupta, linux-kernel
  Cc: x86, dan.j.williams, bernie.keany, charishma1.gairuboyina,
	Josh Poimboeuf, daniel.sneddon, antonio.gomez.iglesias

On 4/19/2024 10:47 AM, Pawan Gupta wrote:
>   
>   	/*
> @@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
>   		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
>   	}
>   
> +	/* Keylocker can only be enabled when GDS mitigation is locked */
> +	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> +	    gds_mitigation == GDS_MITIGATION_FULL)
> +		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> +

I'm having trouble understanding this change:

gds_select_mitigation()
{
	...
	if (gds_mitigation == GDS_MITIGATION_FORCE)
		gds_mitigation = GDS_MITIGATION_FULL;

	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
	if (mcu_ctrl & GDS_MITG_LOCKED) {
		...
		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
	}

	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
	    gds_mitigation == GDS_MITIGATION_FULL)
		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;

As I understand it, gds_mitigation is set to GDS_MITIGATION_FULL only if 
the gds force option is enabled but IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] is 
not set.

Then, if the CPU has Key Locker, this code sets gds_mitigation to 
GDS_MITIGATION_FULL_LOCKED, which seems contradictory. I'm not sure why 
this change is necessary.

I'm also not convinced that the Key Locker series needs to modify this 
function. The Key Locker setup code should simply check the current 
mitigation status and enable the feature only if proper mitigation is in 
place. Am I missing something here?

Thanks,
Chang





* Re: [PATCH v9a 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation
  2024-04-19  0:01               ` Pawan Gupta
@ 2024-04-22  7:49                 ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-22  7:49 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: linux-kernel, linux-crypto, ebiggers, luto, dave.hansen, tglx,
	bp, mingo, x86, herbert, ardb, elliott, dan.j.williams,
	bernie.keany, charishma1.gairuboyina, Dave Hansen

On 4/18/2024 5:01 PM, Pawan Gupta wrote:
> 
> Repurposing gds_ucode_mitigated() to check for the locked state adds a
> bit of churn. We can introduce gds_mitigation_locked() instead.

I considered this option, but I was less convinced about adding a new
function for every new but slightly different check.

Thanks,
Chang


* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-22  7:35                 ` Chang S. Bae
@ 2024-04-22 21:32                   ` Pawan Gupta
  2024-04-22 22:13                     ` Chang S. Bae
  0 siblings, 1 reply; 247+ messages in thread
From: Pawan Gupta @ 2024-04-22 21:32 UTC (permalink / raw)
  To: Chang S. Bae
  Cc: linux-kernel, x86, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, Josh Poimboeuf, daniel.sneddon,
	antonio.gomez.iglesias

On Mon, Apr 22, 2024 at 12:35:45AM -0700, Chang S. Bae wrote:
> On 4/19/2024 10:47 AM, Pawan Gupta wrote:
> >   	/*
> > @@ -840,6 +843,11 @@ static void __init gds_select_mitigation(void)
> >   		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> >   	}
> > +	/* Keylocker can only be enabled when GDS mitigation is locked */
> > +	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> > +	    gds_mitigation == GDS_MITIGATION_FULL)
> > +		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> > +
> 
> I'm having trouble understanding this change:
> 
> gds_select_mitigation()
> {
> 	...
> 	if (gds_mitigation == GDS_MITIGATION_FORCE)
> 		gds_mitigation = GDS_MITIGATION_FULL;
> 
> 	rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
> 	if (mcu_ctrl & GDS_MITG_LOCKED) {
> 		...
> 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> 	}
> 
> 	if (boot_cpu_has(X86_FEATURE_KEYLOCKER) &&
> 	    gds_mitigation == GDS_MITIGATION_FULL)
> 		gds_mitigation = GDS_MITIGATION_FULL_LOCKED;
> 
> As I understand it, gds_mitigation is set to GDS_MITIGATION_FULL only if the
> gds force option is enabled but IA32_MCU_OPT_CTRL[GDS_MITG_LOCK] is not set.

Not true; GDS_MITIGATION_FULL is the default. The cmdline option
gather_data_sampling=force deploys a software fallback mitigation when
the microcode mitigation is not present. But when the microcode
mitigation is present, the mitigation is set to GDS_MITIGATION_FULL.

> Then, if the CPU has Key Locker, this code sets gds_mitigation to
> GDS_MITIGATION_FULL_LOCKED, which seems contradictory. I'm not sure why this
> change is necessary.
>
> I'm also not convinced that the Key Locker series needs to modify this
> function. The Key Locker setup code should simply check the current
> mitigation status and enable the feature only if proper mitigation is in
> place. Am I missing something here?

To enable the Key Locker feature, "proper mitigation" means the microcode
mitigation is enabled and the GDS_MITG_LOCK bit is set in
MSR_IA32_MCU_OPT_CTRL. Do you agree?

If not via this patch, how is GDS_MITG_LOCK going to be set?

Below is from Intel's documentation:

  "Intel recommends that system software does not enable Key Locker (by
  setting CR4.KL) unless the GDS mitigation is enabled (IA32_MCU_OPT_CTRL
  [GDS_MITG_DIS] (bit 4) is 0) and locked (IA32_MCU_OPT_CTRL
  [GDS_MITG_LOCK](bit 5) is 1). This will prevent an adversary that takes
  control of the system from turning off the mitigation in order to infer
  the keys behind Key Locker handles.

  To support GDS mitigation locking for Key Locker, microcode updates
  for Tiger Lake systems enable the following model-specific behavior
  for GDS_MITG_LOCK. On these systems, a write to IA32_MCU_OPT_CTRL MSR
  with GDS_MITG_DIS (bit 4) value 0 and GDS_MITG_LOCK (bit 5) value 1
  will lock both bits at these values until reset."
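
In terms of the helpers already used in bugs.c, the locking sequence
described above would look roughly like this (a sketch of the documented
MSR write, not a proposed change):

  u64 mcu_ctrl;

  rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);
  mcu_ctrl &= ~GDS_MITG_DIS;	/* bit 4 = 0: keep the mitigation enabled */
  mcu_ctrl |= GDS_MITG_LOCKED;	/* bit 5 = 1: lock both bits until reset */
  wrmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_ctrl);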


* Re: [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present
  2024-04-22 21:32                   ` Pawan Gupta
@ 2024-04-22 22:13                     ` Chang S. Bae
  0 siblings, 0 replies; 247+ messages in thread
From: Chang S. Bae @ 2024-04-22 22:13 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: linux-kernel, x86, dan.j.williams, bernie.keany,
	charishma1.gairuboyina, Josh Poimboeuf, daniel.sneddon,
	antonio.gomez.iglesias

On 4/22/2024 2:32 PM, Pawan Gupta wrote:
> 
> To enable the Key Locker feature, "proper mitigation" means the microcode
> mitigation is enabled and the GDS_MITG_LOCK bit is set in
> MSR_IA32_MCU_OPT_CTRL. Do you agree?
> 
> If not via this patch, how is GDS_MITG_LOCK going to be set?

The lock bit seems to be set by microcode when SGX is available.
However, it does seem odd if the lock bit is not also set when Key
Locker is present. Introducing kernel code to override this situation
might be seen as a workaround rather than a proper solution, potentially
leading to more confusion.

I'd rather investigate the behavior of the microcode further, verify its 
consistency, and gain a clearer understanding of the requirement for 
this lock bit.

Thanks,
Chang


end of thread

Thread overview: 247+ messages
2022-01-12 21:12 [PATCH v5 00/12] x86: Support Key Locker Chang S. Bae
2022-01-12 21:12 ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 01/12] Documentation/x86: Document " Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2023-06-05 10:52   ` Bagas Sanjaya
2023-06-05 10:52     ` [dm-devel] " Bagas Sanjaya
2022-01-12 21:12 ` [PATCH v5 02/12] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 03/12] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 05/12] x86/msr-index: Add MSRs for Key Locker internal wrapping key Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 06/12] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-08-23 15:49   ` Evan Green
2022-08-23 15:49     ` [dm-devel] " Evan Green
2022-08-24 22:20     ` Chang S. Bae
2022-08-24 22:20       ` [dm-devel] " Chang S. Bae
2022-08-24 22:52       ` Evan Green
2022-08-24 22:52         ` [dm-devel] " Evan Green
2022-08-25  1:06         ` Chang S. Bae
2022-08-25  1:06           ` [dm-devel] " Chang S. Bae
2022-08-25 15:31           ` Evan Green
2022-08-25 15:31             ` [dm-devel] " Evan Green
2022-08-31 23:08             ` Chang S. Bae
2022-08-31 23:08               ` [dm-devel] " Chang S. Bae
2022-09-06 16:22               ` Evan Green
2022-09-06 16:22                 ` [dm-devel] " Evan Green
2022-09-06 16:46                 ` Chang S. Bae
2022-09-06 16:46                   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4 Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-29 17:31   ` [PATCH v5-fix " Chang S. Bae
2022-01-29 17:31     ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 09/12] x86/cpu: Add a configuration and command line option for Key Locker Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 10/12] crypto: x86/aes - Prepare for a new AES implementation Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-12 21:12 ` [PATCH v5 12/12] crypto: x86/aes-kl - Support XTS mode Chang S. Bae
2022-01-12 21:12   ` [dm-devel] " Chang S. Bae
2022-01-13 22:16 ` [PATCH v5 00/12] x86: Support Key Locker Dave Hansen
2022-01-13 22:16   ` [dm-devel] " Dave Hansen
2022-01-13 22:34   ` Bae, Chang Seok
2022-01-13 22:34     ` [dm-devel] " Bae, Chang Seok
2023-04-10 22:59 ` [PATCH v6 " Chang S. Bae
2023-04-10 22:59   ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 01/12] Documentation/x86: Document " Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 02/12] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 03/12] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 05/12] x86/msr-index: Add MSRs for Key Locker internal wrapping key Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 06/12] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 07/12] x86/cpu/keylocker: Load an internal wrapping key at boot-time Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-05-05 23:05     ` Eric Biggers
2023-05-05 23:05       ` [dm-devel] " Eric Biggers
2023-05-08 18:18       ` Chang S. Bae
2023-05-08 18:18         ` [dm-devel] " Chang S. Bae
2023-05-08 21:56         ` Dave Hansen
2023-05-08 21:56           ` [dm-devel] " Dave Hansen
2023-05-09  0:31           ` Chang S. Bae
2023-05-09  0:31             ` [dm-devel] " Chang S. Bae
2023-05-09  0:51             ` Dave Hansen
2023-05-09  0:51               ` [dm-devel] " Dave Hansen
2023-05-08 19:18     ` Elliott, Robert (Servers)
2023-05-08 19:18       ` [dm-devel] " Elliott, Robert (Servers)
2023-05-08 20:15       ` Chang S. Bae
2023-05-08 20:15         ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 08/12] x86/PM/keylocker: Restore internal wrapping key on resume from ACPI S3/4 Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-05-05 23:09     ` Eric Biggers
2023-05-05 23:09       ` [dm-devel] " Eric Biggers
2023-05-08 18:18       ` Chang S. Bae
2023-05-08 18:18         ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 09/12] x86/cpu: Add a configuration and command line option for Key Locker Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 10/12] crypto: x86/aes - Prepare for a new AES implementation Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-05-05 23:27     ` Eric Biggers
2023-05-05 23:27       ` [dm-devel] " Eric Biggers
2023-05-09  0:55       ` Chang S. Bae
2023-05-09  0:55         ` [dm-devel] " Chang S. Bae
2023-05-11 19:05         ` Chang S. Bae
2023-05-11 19:05           ` [dm-devel] " Chang S. Bae
2023-05-11 21:39           ` Eric Biggers
2023-05-11 21:39             ` [dm-devel] " Eric Biggers
2023-05-11 23:19             ` Chang S. Bae
2023-05-11 23:19               ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 11/12] crypto: x86/aes-kl - Support AES algorithm using Key Locker instructions Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-05-06  0:01     ` Eric Biggers
2023-05-06  0:01       ` [dm-devel] " Eric Biggers
2023-05-08 18:18       ` Chang S. Bae
2023-05-08 18:18         ` [dm-devel] " Chang S. Bae
2023-05-24 17:18         ` Chang S. Bae
2023-05-24 17:18           ` [dm-devel] " Chang S. Bae
2023-05-12 17:52       ` Milan Broz
2023-05-12 17:52         ` [dm-devel] " Milan Broz
2023-05-08 19:21     ` Elliott, Robert (Servers)
2023-05-08 19:21       ` [dm-devel] " Elliott, Robert (Servers)
2023-05-08 19:24       ` Elliott, Robert (Servers)
2023-05-08 19:24         ` [dm-devel] " Elliott, Robert (Servers)
2023-05-08 20:00         ` Chang S. Bae
2023-05-08 20:00           ` [dm-devel] " Chang S. Bae
2023-04-10 22:59   ` [PATCH v6 12/12] crypto: x86/aes-kl - Support XTS mode Chang S. Bae
2023-04-10 22:59     ` [dm-devel] " Chang S. Bae
2023-05-24 16:57   ` [PATCH v7 00/12] x86: Support Key Locker Chang S. Bae
2023-05-24 16:57     ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 01/12] Documentation/x86: Document " Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 02/12] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 03/12] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 06/12] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4 Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 09/12] x86/cpu: Add a configuration and command line option for Key Locker Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-26  6:54       ` Eric Biggers
2023-05-26  6:54         ` [dm-devel] " Eric Biggers
2023-05-30 20:50         ` Chang S. Bae
2023-05-30 20:50           ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [PATCH v7 11/12] crypto: x86/aes - Prepare for a new AES implementation Chang S. Bae
2023-05-24 16:57       ` [dm-devel] " Chang S. Bae
2023-05-24 16:57     ` [dm-devel] [PATCH v7 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm Chang S. Bae
2023-05-24 16:57       ` Chang S. Bae
2023-05-26  7:23       ` Eric Biggers
2023-05-26  7:23         ` [dm-devel] " Eric Biggers
2023-05-30 20:49         ` Chang S. Bae
2023-05-30 20:49           ` [dm-devel] " Chang S. Bae
2023-06-03 15:22     ` [PATCH v8 00/12] x86: Support Key Locker Chang S. Bae
2023-06-03 15:22       ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 01/12] Documentation/x86: Document " Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-05 10:54         ` Bagas Sanjaya
2023-06-05 10:54           ` [dm-devel] " Bagas Sanjaya
2023-06-06  2:17         ` Randy Dunlap
2023-06-06  2:17           ` [dm-devel] " Randy Dunlap
2023-06-06  4:18           ` Chang S. Bae
2023-06-06  4:18             ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 02/12] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 03/12] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 04/12] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 05/12] x86/msr-index: Add MSRs for Key Locker wrapping key Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 06/12] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 07/12] x86/cpu/keylocker: Load a wrapping key at boot-time Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 08/12] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4 Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 09/12] x86/cpu: Add a configuration and command line option for Key Locker Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 16:37         ` Borislav Petkov
2023-06-03 16:37           ` [dm-devel] " Borislav Petkov
2023-06-04 22:13           ` Chang S. Bae
2023-06-04 22:13             ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 10/12] crypto: x86/aesni - Use the proper data type in struct aesni_xts_ctx Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-04 15:34         ` Eric Biggers
2023-06-04 15:34           ` [dm-devel] " Eric Biggers
2023-06-04 22:02           ` Chang S. Bae
2023-06-04 22:02             ` [dm-devel] " Chang S. Bae
2023-06-05  2:46             ` Eric Biggers
2023-06-05  2:46               ` [dm-devel] " Eric Biggers
2023-06-05  4:41               ` Chang S. Bae
2023-06-05  4:41                 ` Chang S. Bae
2023-06-21 12:06                 ` [PATCH] crypto: x86/aesni: Align the address before aes_set_key_common() Chang S. Bae
2023-07-14  8:51                   ` Herbert Xu
2023-06-03 15:22       ` [PATCH v8 11/12] crypto: x86/aes - Prepare for a new AES-XTS implementation Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-03 15:22       ` [PATCH v8 12/12] crypto: x86/aes-kl - Implement the AES-XTS algorithm Chang S. Bae
2023-06-03 15:22         ` [dm-devel] " Chang S. Bae
2023-06-07  5:35         ` Eric Biggers
2023-06-07  5:35           ` [dm-devel] " Eric Biggers
2023-06-07 22:06           ` Chang S. Bae
2023-06-07 22:06             ` [dm-devel] " Chang S. Bae
2024-03-11 21:32           ` [PATCH] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
2024-03-12  2:15             ` Eric Biggers
2024-03-12  7:46             ` Ard Biesheuvel
2024-03-12 15:03               ` Chang S. Bae
2024-03-12 15:18                 ` Ard Biesheuvel
2024-03-12 15:37                   ` Chang S. Bae
2024-03-22 23:04             ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Chang S. Bae
2024-03-22 23:04               ` [PATCH v2 1/2] crypto: x86/aesni - Rearrange AES key size check Chang S. Bae
2024-03-22 23:04               ` [PATCH v2 2/2] crypto: x86/aesni - Update aesni_set_key() to return void Chang S. Bae
2024-03-28 10:57               ` [PATCH v2 0/2] crypto: x86/aesni - Simplify AES key expansion code Herbert Xu
2024-03-29  1:53       ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 01/14] Documentation/x86: Document " Chang S. Bae
2024-03-31 15:48           ` Randy Dunlap
2024-03-29  1:53         ` [PATCH v9 02/14] x86/cpufeature: Enumerate Key Locker feature Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 03/14] x86/insn: Add Key Locker instructions to the opcode map Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 04/14] x86/asm: Add a wrapper function for the LOADIWKEY instruction Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 05/14] x86/msr-index: Add MSRs for Key Locker wrapping key Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 06/14] x86/keylocker: Define Key Locker CPUID leaf Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 07/14] x86/cpu/keylocker: Load a wrapping key at boot time Chang S. Bae
2024-04-07 23:04           ` [PATCH v9a " Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 08/14] x86/PM/keylocker: Restore the wrapping key on the resume from ACPI S3/4 Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 09/14] x86/hotplug/keylocker: Ensure wrapping key backup capability Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 10/14] x86/cpu/keylocker: Check Gather Data Sampling mitigation Chang S. Bae
2024-03-29  6:57           ` Pawan Gupta
2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
2024-04-19  0:01               ` Pawan Gupta
2024-04-22  7:49                 ` Chang S. Bae
2024-04-19 17:47               ` [PATCH 15/14] x86/gds: Lock GDS mitigation when keylocker feature is present Pawan Gupta
2024-04-19 18:03                 ` Daniel Sneddon
2024-04-19 20:19                   ` Pawan Gupta
2024-04-19 20:33                     ` Daniel Sneddon
2024-04-22  7:35                 ` Chang S. Bae
2024-04-22 21:32                   ` Pawan Gupta
2024-04-22 22:13                     ` Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 11/14] x86/cpu/keylocker: Check Register File Data Sampling mitigation Chang S. Bae
2024-03-29  6:20           ` Pawan Gupta
2024-04-07 23:04             ` [PATCH v9a " Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 12/14] x86/Kconfig: Add a configuration for Key Locker Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 13/14] crypto: x86/aes - Prepare for new AES-XTS implementation Chang S. Bae
2024-03-29  1:53         ` [PATCH v9 14/14] crypto: x86/aes-kl - Implement the AES-XTS algorithm Chang S. Bae
2024-04-07 23:24         ` [PATCH v9 00/14] x86: Support Key Locker Chang S. Bae
2024-04-08  1:48           ` Eric Biggers
2024-04-15 22:16             ` Chang S. Bae
2024-04-15 22:54               ` Eric Biggers
2024-04-15 22:58                 ` Chang S. Bae
