* [PATCH v2 00/38] crypto: Provide aes-round.h and host accel
@ 2023-06-09  2:23 Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be) Richard Henderson
                   ` (37 more replies)
  0 siblings, 38 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Inspired by Ard Biesheuvel's RFC patches for accelerating AES
under emulation, provide a set of primitives that map between
the guest and host instruction fragments.

There is a small guest correctness test case.

I think the end result is quite a bit cleaner, since the logic
is now centralized, rather than spread across 4 different guests.

Further work could clean up crypto/aes.c itself to use these
instead of the tables directly.  I'm sure that's just the ultimate
fallback when an appropriate system library is not available, and
so not terribly important, but it could still significantly reduce
the amount of code we carry.

I would imagine structuring a polynomial multiplication header
in a similar way.  There are 4 or 5 versions of those spread across
the different guests.

Changes for v2:
  * Change aesenc_SB_SR -> aesenc_SB_SR_AK
  * Change aesdec_ISB_ISR -> aesdec_ISB_ISR_AK

  Both of these because, if we have to supply a zero round key
  to x86 and ppc hosts, we can do that at the guest level just
  as easily as at the host level.  Which allows x86 and ppc
  guests to pass their real round key.

  * Add aesdec_ISB_ISR_AK_IMC

  Provide a variation matching the Power8 primitive.  Easy enough
  to implement with two x86 instructions.

  * Add ppc host support.

  Nasty issues with <altivec.h>, fighting with builtins vs bswap,
  so everything is in inline asm.


r~


Richard Henderson (38):
  tcg/ppc: Define _CALL_AIX for clang on ppc64(be)
  util: Add cpuinfo-ppc.c
  tests/multiarch: Add test-aes
  target/arm: Move aesmc and aesimc tables to crypto/aes.c
  crypto/aes: Add constants for ShiftRows, InvShiftRows
  crypto: Add aesenc_SB_SR_AK
  target/i386: Use aesenc_SB_SR_AK
  target/arm: Demultiplex AESE and AESMC
  target/arm: Use aesenc_SB_SR_AK
  target/ppc: Use aesenc_SB_SR_AK
  target/riscv: Use aesenc_SB_SR_AK
  crypto: Add aesdec_ISB_ISR_AK
  target/i386: Use aesdec_ISB_ISR_AK
  target/arm: Use aesdec_ISB_ISR_AK
  target/ppc: Use aesdec_ISB_ISR_AK
  target/riscv: Use aesdec_ISB_ISR_AK
  crypto: Add aesenc_MC
  target/arm: Use aesenc_MC
  crypto: Add aesdec_IMC
  target/i386: Use aesdec_IMC
  target/arm: Use aesdec_IMC
  target/riscv: Use aesdec_IMC
  crypto: Add aesenc_SB_SR_MC_AK
  target/i386: Use aesenc_SB_SR_MC_AK
  target/ppc: Use aesenc_SB_SR_MC_AK
  target/riscv: Use aesenc_SB_SR_MC_AK
  crypto: Add aesdec_ISB_ISR_IMC_AK
  target/i386: Use aesdec_ISB_ISR_IMC_AK
  target/riscv: Use aesdec_ISB_ISR_IMC_AK
  crypto: Add aesdec_ISB_ISR_AK_IMC
  target/ppc: Use aesdec_ISB_ISR_AK_IMC
  crypto: Remove AES_shifts, AES_ishifts
  crypto: Implement aesdec_IMC with AES_imc_rot
  crypto: Remove AES_imc
  crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
  host/include/i386: Implement aes-round.h
  host/include/aarch64: Implement aes-round.h
  host/include/ppc: Implement aes-round.h

 meson.build                             |   9 +
 host/include/aarch64/host/aes-round.h   | 205 ++++++
 host/include/aarch64/host/cpuinfo.h     |   1 +
 host/include/generic/host/aes-round.h   |  33 +
 host/include/i386/host/aes-round.h      | 152 +++++
 host/include/i386/host/cpuinfo.h        |   1 +
 host/include/ppc/host/aes-round.h       | 181 ++++++
 host/include/ppc/host/cpuinfo.h         |  30 +
 host/include/ppc64/host/aes-round.h     |   1 +
 host/include/ppc64/host/cpuinfo.h       |   1 +
 host/include/x86_64/host/aes-round.h    |   1 +
 include/crypto/aes-round.h              | 164 +++++
 include/crypto/aes.h                    |  30 -
 target/arm/helper.h                     |   2 +
 target/i386/ops_sse.h                   |  60 +-
 tcg/ppc/tcg-target.h                    |  16 +-
 target/arm/tcg/sve.decode               |   4 +-
 crypto/aes.c                            | 796 ++++++++++++++++--------
 target/arm/tcg/crypto_helper.c          | 249 +++-----
 target/arm/tcg/translate-a64.c          |  13 +-
 target/arm/tcg/translate-neon.c         |   4 +-
 target/arm/tcg/translate-sve.c          |   8 +-
 target/ppc/int_helper.c                 |  50 +-
 target/riscv/crypto_helper.c            | 138 ++--
 tests/tcg/aarch64/test-aes.c            |  58 ++
 tests/tcg/i386/test-aes.c               |  68 ++
 tests/tcg/ppc64/test-aes.c              | 116 ++++
 tests/tcg/riscv64/test-aes.c            |  76 +++
 util/cpuinfo-aarch64.c                  |   2 +
 util/cpuinfo-i386.c                     |   3 +
 util/cpuinfo-ppc.c                      |  65 ++
 tcg/ppc/tcg-target.c.inc                |  67 +-
 tests/tcg/multiarch/test-aes-main.c.inc | 183 ++++++
 tests/tcg/aarch64/Makefile.target       |   4 +
 tests/tcg/i386/Makefile.target          |   4 +
 tests/tcg/ppc64/Makefile.target         |   1 +
 tests/tcg/riscv64/Makefile.target       |   4 +
 util/meson.build                        |   2 +
 38 files changed, 2074 insertions(+), 728 deletions(-)
 create mode 100644 host/include/aarch64/host/aes-round.h
 create mode 100644 host/include/generic/host/aes-round.h
 create mode 100644 host/include/i386/host/aes-round.h
 create mode 100644 host/include/ppc/host/aes-round.h
 create mode 100644 host/include/ppc/host/cpuinfo.h
 create mode 100644 host/include/ppc64/host/aes-round.h
 create mode 100644 host/include/ppc64/host/cpuinfo.h
 create mode 100644 host/include/x86_64/host/aes-round.h
 create mode 100644 include/crypto/aes-round.h
 create mode 100644 tests/tcg/aarch64/test-aes.c
 create mode 100644 tests/tcg/i386/test-aes.c
 create mode 100644 tests/tcg/ppc64/test-aes.c
 create mode 100644 tests/tcg/riscv64/test-aes.c
 create mode 100644 util/cpuinfo-ppc.c
 create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc

-- 
2.34.1




* [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be)
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:25   ` Daniel Henrique Barboza
  2023-06-09  2:23 ` [PATCH v2 02/38] util: Add cpuinfo-ppc.c Richard Henderson
                   ` (36 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Restructure the ifdef ladder, separating 64-bit from 32-bit,
and ensure _CALL_AIX is set for ELF v1.  Fixes the build for
ppc64 big-endian host with clang.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/ppc/tcg-target.c.inc | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 507fe6cda8..5c8378f8f6 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -29,15 +29,24 @@
 /*
  * Standardize on the _CALL_FOO symbols used by GCC:
  * Apple XCode does not define _CALL_DARWIN.
- * Clang defines _CALL_ELF (64-bit) but not _CALL_SYSV (32-bit).
+ * Clang defines _CALL_ELF (64-bit) but not _CALL_SYSV or _CALL_AIX.
  */
-#if !defined(_CALL_SYSV) && \
-    !defined(_CALL_DARWIN) && \
-    !defined(_CALL_AIX) && \
-    !defined(_CALL_ELF)
-# if defined(__APPLE__)
+#if TCG_TARGET_REG_BITS == 64
+# ifdef _CALL_AIX
+    /* ok */
+# elif defined(_CALL_ELF) && _CALL_ELF == 1
+#  define _CALL_AIX
+# elif defined(_CALL_ELF) && _CALL_ELF == 2
+    /* ok */
+# else
+#  error "Unknown ABI"
+# endif
+#else
+# if defined(_CALL_SYSV) || defined(_CALL_DARWIN)
+    /* ok */
+# elif defined(__APPLE__)
 #  define _CALL_DARWIN
-# elif defined(__ELF__) && TCG_TARGET_REG_BITS == 32
+# elif defined(__ELF__)
 #  define _CALL_SYSV
 # else
 #  error "Unknown ABI"
-- 
2.34.1




* [PATCH v2 02/38] util: Add cpuinfo-ppc.c
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be) Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:27   ` Daniel Henrique Barboza
  2023-06-19 10:37   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 03/38] tests/multiarch: Add test-aes Richard Henderson
                   ` (35 subsequent siblings)
  37 siblings, 2 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Move the code from tcg/.  Fix a bug: the kernel constant is
actually spelled PPC_FEATURE2_ARCH_3_1, not PPC_FEATURE2_ARCH_3_10.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/ppc/host/cpuinfo.h   | 29 ++++++++++++++++
 host/include/ppc64/host/cpuinfo.h |  1 +
 tcg/ppc/tcg-target.h              | 16 ++++-----
 util/cpuinfo-ppc.c                | 57 +++++++++++++++++++++++++++++++
 tcg/ppc/tcg-target.c.inc          | 44 +-----------------------
 util/meson.build                  |  2 ++
 6 files changed, 98 insertions(+), 51 deletions(-)
 create mode 100644 host/include/ppc/host/cpuinfo.h
 create mode 100644 host/include/ppc64/host/cpuinfo.h
 create mode 100644 util/cpuinfo-ppc.c

diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
new file mode 100644
index 0000000000..7ec252ef52
--- /dev/null
+++ b/host/include/ppc/host/cpuinfo.h
@@ -0,0 +1,29 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for ppc.
+ */
+
+#ifndef HOST_CPUINFO_H
+#define HOST_CPUINFO_H
+
+/* Digested version of AT_HWCAP/AT_HWCAP2. */
+
+#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
+#define CPUINFO_V2_06           (1u << 1)
+#define CPUINFO_V2_07           (1u << 2)
+#define CPUINFO_V3_00           (1u << 3)
+#define CPUINFO_V3_10           (1u << 4)
+#define CPUINFO_ISEL            (1u << 5)
+#define CPUINFO_ALTIVEC         (1u << 6)
+#define CPUINFO_VSX             (1u << 7)
+
+/* Initialized with a constructor. */
+extern unsigned cpuinfo;
+
+/*
+ * We cannot rely on constructor ordering, so other constructors must
+ * use the function interface rather than the variable above.
+ */
+unsigned cpuinfo_init(void);
+
+#endif /* HOST_CPUINFO_H */
diff --git a/host/include/ppc64/host/cpuinfo.h b/host/include/ppc64/host/cpuinfo.h
new file mode 100644
index 0000000000..2f036a0627
--- /dev/null
+++ b/host/include/ppc64/host/cpuinfo.h
@@ -0,0 +1 @@
+#include "host/include/ppc/host/cpuinfo.h"
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index c7552b6391..b632a5a647 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -25,6 +25,8 @@
 #ifndef PPC_TCG_TARGET_H
 #define PPC_TCG_TARGET_H
 
+#include "host/cpuinfo.h"
+
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 #define TCG_TARGET_NB_REGS 64
@@ -61,14 +63,12 @@ typedef enum {
     tcg_isa_3_10,
 } TCGPowerISA;
 
-extern TCGPowerISA have_isa;
-extern bool have_altivec;
-extern bool have_vsx;
-
-#define have_isa_2_06  (have_isa >= tcg_isa_2_06)
-#define have_isa_2_07  (have_isa >= tcg_isa_2_07)
-#define have_isa_3_00  (have_isa >= tcg_isa_3_00)
-#define have_isa_3_10  (have_isa >= tcg_isa_3_10)
+#define have_isa_2_06  (cpuinfo & CPUINFO_V2_06)
+#define have_isa_2_07  (cpuinfo & CPUINFO_V2_07)
+#define have_isa_3_00  (cpuinfo & CPUINFO_V3_00)
+#define have_isa_3_10  (cpuinfo & CPUINFO_V3_10)
+#define have_altivec   (cpuinfo & CPUINFO_ALTIVEC)
+#define have_vsx       (cpuinfo & CPUINFO_VSX)
 
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_ext8u_i32        0 /* andi */
diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c
new file mode 100644
index 0000000000..ee761de33a
--- /dev/null
+++ b/util/cpuinfo-ppc.c
@@ -0,0 +1,57 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for ppc.
+ */
+
+#include "qemu/osdep.h"
+#include "host/cpuinfo.h"
+
+#ifdef CONFIG_GETAUXVAL
+# include <sys/auxv.h>
+#else
+# include <asm/cputable.h>
+# include "elf.h"
+#endif
+
+unsigned cpuinfo;
+
+/* Called both as constructor and (possibly) via other constructors. */
+unsigned __attribute__((constructor)) cpuinfo_init(void)
+{
+    unsigned info = cpuinfo;
+    unsigned long hwcap, hwcap2;
+
+    if (info) {
+        return info;
+    }
+
+    hwcap = qemu_getauxval(AT_HWCAP);
+    hwcap2 = qemu_getauxval(AT_HWCAP2);
+    info = CPUINFO_ALWAYS;
+
+    if (hwcap & PPC_FEATURE_ARCH_2_06) {
+        info |= CPUINFO_V2_06;
+    }
+    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
+        info |= CPUINFO_V2_07;
+    }
+    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
+        info |= CPUINFO_V3_00;
+    }
+    if (hwcap2 & PPC_FEATURE2_ARCH_3_1) {
+        info |= CPUINFO_V3_10;
+    }
+    if (hwcap2 & PPC_FEATURE2_HAS_ISEL) {
+        info |= CPUINFO_ISEL;
+    }
+    if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
+        info |= CPUINFO_ALTIVEC;
+        /* We only care about the portion of VSX that overlaps Altivec. */
+        if (hwcap & PPC_FEATURE_HAS_VSX) {
+            info |= CPUINFO_VSX;
+        }
+    }
+
+    cpuinfo = info;
+    return info;
+}
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 5c8378f8f6..c866f2c997 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -101,10 +101,7 @@
 #define ALL_GENERAL_REGS  0xffffffffu
 #define ALL_VECTOR_REGS   0xffffffff00000000ull
 
-TCGPowerISA have_isa;
-static bool have_isel;
-bool have_altivec;
-bool have_vsx;
+#define have_isel  (cpuinfo & CPUINFO_ISEL)
 
 #ifndef CONFIG_SOFTMMU
 #define TCG_GUEST_BASE_REG 30
@@ -3879,45 +3876,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
 static void tcg_target_init(TCGContext *s)
 {
-    unsigned long hwcap = qemu_getauxval(AT_HWCAP);
-    unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
-
-    have_isa = tcg_isa_base;
-    if (hwcap & PPC_FEATURE_ARCH_2_06) {
-        have_isa = tcg_isa_2_06;
-    }
-#ifdef PPC_FEATURE2_ARCH_2_07
-    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
-        have_isa = tcg_isa_2_07;
-    }
-#endif
-#ifdef PPC_FEATURE2_ARCH_3_00
-    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
-        have_isa = tcg_isa_3_00;
-    }
-#endif
-#ifdef PPC_FEATURE2_ARCH_3_10
-    if (hwcap2 & PPC_FEATURE2_ARCH_3_10) {
-        have_isa = tcg_isa_3_10;
-    }
-#endif
-
-#ifdef PPC_FEATURE2_HAS_ISEL
-    /* Prefer explicit instruction from the kernel. */
-    have_isel = (hwcap2 & PPC_FEATURE2_HAS_ISEL) != 0;
-#else
-    /* Fall back to knowing Power7 (2.06) has ISEL. */
-    have_isel = have_isa_2_06;
-#endif
-
-    if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
-        have_altivec = true;
-        /* We only care about the portion of VSX that overlaps Altivec. */
-        if (hwcap & PPC_FEATURE_HAS_VSX) {
-            have_vsx = true;
-        }
-    }
-
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
     if (have_altivec) {
diff --git a/util/meson.build b/util/meson.build
index 3a93071d27..a375160286 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -113,4 +113,6 @@ if cpu == 'aarch64'
   util_ss.add(files('cpuinfo-aarch64.c'))
 elif cpu in ['x86', 'x86_64']
   util_ss.add(files('cpuinfo-i386.c'))
+elif cpu in ['ppc', 'ppc64']
+  util_ss.add(files('cpuinfo-ppc.c'))
 endif
-- 
2.34.1




* [PATCH v2 03/38] tests/multiarch: Add test-aes
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be) Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 02/38] util: Add cpuinfo-ppc.c Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 14:46   ` Alex Bennée
  2023-06-09  2:23 ` [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
                   ` (34 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Use a shared driver and backends for i386, aarch64, ppc64, riscv64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tests/tcg/aarch64/test-aes.c            |  58 ++++++++
 tests/tcg/i386/test-aes.c               |  68 +++++++++
 tests/tcg/ppc64/test-aes.c              | 116 +++++++++++++++
 tests/tcg/riscv64/test-aes.c            |  76 ++++++++++
 tests/tcg/multiarch/test-aes-main.c.inc | 183 ++++++++++++++++++++++++
 tests/tcg/aarch64/Makefile.target       |   4 +
 tests/tcg/i386/Makefile.target          |   4 +
 tests/tcg/ppc64/Makefile.target         |   1 +
 tests/tcg/riscv64/Makefile.target       |   4 +
 9 files changed, 514 insertions(+)
 create mode 100644 tests/tcg/aarch64/test-aes.c
 create mode 100644 tests/tcg/i386/test-aes.c
 create mode 100644 tests/tcg/ppc64/test-aes.c
 create mode 100644 tests/tcg/riscv64/test-aes.c
 create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc

diff --git a/tests/tcg/aarch64/test-aes.c b/tests/tcg/aarch64/test-aes.c
new file mode 100644
index 0000000000..2cd324f09b
--- /dev/null
+++ b/tests/tcg/aarch64/test-aes.c
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    /* aese also adds round key, so supply zero. */
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "movi v1.16b, #0\n\t"
+        "aese v0.16b, v1.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "v1", "memory");
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "aesmc v0.16b, v0.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "memory");
+    return true;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    /* aesd also adds round key, so supply zero. */
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "movi v1.16b, #0\n\t"
+        "aesd v0.16b, v1.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "v1", "memory");
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    asm("ld1 { v0.16b }, [%1]\n\t"
+        "aesimc v0.16b, v0.16b\n\t"
+        "st1 { v0.16b }, [%0]"
+        : : "r"(o), "r"(i) : "v0", "memory");
+    return true;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
diff --git a/tests/tcg/i386/test-aes.c b/tests/tcg/i386/test-aes.c
new file mode 100644
index 0000000000..199395e6cc
--- /dev/null
+++ b/tests/tcg/i386/test-aes.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+#include <immintrin.h>
+
+static bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    /* aesenclast also adds round key, so supply zero. */
+    vi = _mm_aesenclast_si128(vi, _mm_setzero_si128());
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
+
+    vi = _mm_aesenc_si128(vi, vk);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    /* aesdeclast also adds round key, so supply zero. */
+    vi = _mm_aesdeclast_si128(vi, _mm_setzero_si128());
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+
+    vi = _mm_aesimc_si128(vi);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
+
+static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
+    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
+
+    vi = _mm_aesdec_si128(vi, vk);
+
+    _mm_storeu_si128((__m128i_u *)o, vi);
+    return true;
+}
diff --git a/tests/tcg/ppc64/test-aes.c b/tests/tcg/ppc64/test-aes.c
new file mode 100644
index 0000000000..1d2be488e9
--- /dev/null
+++ b/tests/tcg/ppc64/test-aes.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+#undef BIG_ENDIAN
+#define BIG_ENDIAN  (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+
+static unsigned char bswap_le[16] __attribute__((aligned(16))) = {
+    8,9,10,11,12,13,14,15,
+    0,1,2,3,4,5,6,7
+};
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    /* vcipherlast also adds round key, so supply zero. */
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "vspltisb 1,0\n\t"
+            "vcipherlast 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 34,0,%2\n\t"
+            "vspltisb 1,0\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vcipherlast 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "vcipher 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "lxvd2x 34,0,%3\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vperm 1,1,1,2\n\t"
+            "vcipher 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
+              : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    /* vncipherlast also adds round key, so supply zero. */
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "vspltisb 1,0\n\t"
+            "vncipherlast 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 34,0,%2\n\t"
+            "vspltisb 1,0\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vncipherlast 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    if (BIG_ENDIAN) {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "vncipher 0,0,1\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
+    } else {
+        asm("lxvd2x 32,0,%1\n\t"
+            "lxvd2x 33,0,%2\n\t"
+            "lxvd2x 34,0,%3\n\t"
+            "vperm 0,0,0,2\n\t"
+            "vperm 1,1,1,2\n\t"
+            "vncipher 0,0,1\n\t"
+            "vperm 0,0,0,2\n\t"
+            "stxvd2x 32,0,%0"
+            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
+              : "memory", "v0", "v1", "v2");
+    }
+    return true;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
diff --git a/tests/tcg/riscv64/test-aes.c b/tests/tcg/riscv64/test-aes.c
new file mode 100644
index 0000000000..3d7ef0e33a
--- /dev/null
+++ b/tests/tcg/riscv64/test-aes.c
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "../multiarch/test-aes-main.c.inc"
+
+bool test_SB_SR(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64es %0,%2,%3\n\t"
+        "aes64es %1,%3,%2"
+        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
+    return true;
+}
+
+bool test_MC(uint8_t *o, const uint8_t *i)
+{
+    return false;
+}
+
+bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+    const uint64_t *k8 = (const uint64_t *)k;
+
+    asm("aes64esm %0,%2,%3\n\t"
+        "aes64esm %1,%3,%2\n\t"
+        "xor %0,%0,%4\n\t"
+        "xor %1,%1,%5"
+        : "=&r"(o8[0]), "=&r"(o8[1])
+        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
+    return true;
+}
+
+bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64ds %0,%2,%3\n\t"
+        "aes64ds %1,%3,%2"
+        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
+    return true;
+}
+
+bool test_IMC(uint8_t *o, const uint8_t *i)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+
+    asm("aes64im %0,%0\n\t"
+        "aes64im %1,%1"
+        : "=r"(o8[0]), "=r"(o8[1]) : "0"(i8[0]), "1"(i8[1]));
+    return true;
+}
+
+bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    return false;
+}
+
+bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
+{
+    uint64_t *o8 = (uint64_t *)o;
+    const uint64_t *i8 = (const uint64_t *)i;
+    const uint64_t *k8 = (const uint64_t *)k;
+
+    asm("aes64dsm %0,%2,%3\n\t"
+        "aes64dsm %1,%3,%2\n\t"
+        "xor %0,%0,%4\n\t"
+        "xor %1,%1,%5"
+        : "=&r"(o8[0]), "=&r"(o8[1])
+        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
+    return true;
+}
diff --git a/tests/tcg/multiarch/test-aes-main.c.inc b/tests/tcg/multiarch/test-aes-main.c.inc
new file mode 100644
index 0000000000..0039f8ba55
--- /dev/null
+++ b/tests/tcg/multiarch/test-aes-main.c.inc
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+
+static bool test_SB_SR(uint8_t *o, const uint8_t *i);
+static bool test_MC(uint8_t *o, const uint8_t *i);
+static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
+
+static bool test_ISB_ISR(uint8_t *o, const uint8_t *i);
+static bool test_IMC(uint8_t *o, const uint8_t *i);
+static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k);
+static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
+
+/*
+ * From https://doi.org/10.6028/NIST.FIPS.197-upd1,
+ * Appendix B -- Cipher Example
+ *
+ * Note that the formatting of the 4x4 matrices in the document is
+ * column-major, whereas C is row-major.  Therefore to get the bytes
+ * in the same order as the text, the matrices are transposed.
+ *
+ * Note that we are not going to test SubBytes or ShiftRows separately,
+ * so the "After SubBytes" column is omitted, using only the combined
+ * result "After ShiftRows" column.
+ */
+
+/* Ease the inline assembly by aligning everything. */
+typedef struct {
+    uint8_t b[16] __attribute__((aligned(16)));
+} State;
+
+typedef struct {
+    State start, after_sr, after_mc, round_key;
+} Round;
+
+static const Round rounds[] = {
+    /* Round 1 */
+    { { { 0x19, 0x3d, 0xe3, 0xbe,       /* start */
+          0xa0, 0xf4, 0xe2, 0x2b,
+          0x9a, 0xc6, 0x8d, 0x2a,
+          0xe9, 0xf8, 0x48, 0x08, } },
+
+      { { 0xd4, 0xbf, 0x5d, 0x30,       /* after shiftrows */
+          0xe0, 0xb4, 0x52, 0xae,
+          0xb8, 0x41, 0x11, 0xf1,
+          0x1e, 0x27, 0x98, 0xe5, } },
+
+      { { 0x04, 0x66, 0x81, 0xe5,       /* after mixcolumns */
+          0xe0, 0xcb, 0x19, 0x9a,
+          0x48, 0xf8, 0xd3, 0x7a,
+          0x28, 0x06, 0x26, 0x4c, } },
+
+      { { 0xa0, 0xfa, 0xfe, 0x17,       /* round key */
+          0x88, 0x54, 0x2c, 0xb1,
+          0x23, 0xa3, 0x39, 0x39,
+          0x2a, 0x6c, 0x76, 0x05, } } },
+
+    /* Round 2 */
+    { { { 0xa4, 0x9c, 0x7f, 0xf2,       /* start */
+          0x68, 0x9f, 0x35, 0x2b,
+          0x6b, 0x5b, 0xea, 0x43,
+          0x02, 0x6a, 0x50, 0x49, } },
+
+      { { 0x49, 0xdb, 0x87, 0x3b,       /* after shiftrows */
+          0x45, 0x39, 0x53, 0x89,
+          0x7f, 0x02, 0xd2, 0xf1,
+          0x77, 0xde, 0x96, 0x1a, } },
+
+      { { 0x58, 0x4d, 0xca, 0xf1,       /* after mixcolumns */
+          0x1b, 0x4b, 0x5a, 0xac,
+          0xdb, 0xe7, 0xca, 0xa8,
+          0x1b, 0x6b, 0xb0, 0xe5, } },
+
+      { { 0xf2, 0xc2, 0x95, 0xf2,       /* round key */
+          0x7a, 0x96, 0xb9, 0x43,
+          0x59, 0x35, 0x80, 0x7a,
+          0x73, 0x59, 0xf6, 0x7f, } } },
+
+    /* Round 3 */
+    { { { 0xaa, 0x8f, 0x5f, 0x03,       /* start */
+          0x61, 0xdd, 0xe3, 0xef,
+          0x82, 0xd2, 0x4a, 0xd2,
+          0x68, 0x32, 0x46, 0x9a, } },
+
+      { { 0xac, 0xc1, 0xd6, 0xb8,       /* after shiftrows */
+          0xef, 0xb5, 0x5a, 0x7b,
+          0x13, 0x23, 0xcf, 0xdf,
+          0x45, 0x73, 0x11, 0xb5, } },
+
+      { { 0x75, 0xec, 0x09, 0x93,       /* after mixcolumns */
+          0x20, 0x0b, 0x63, 0x33,
+          0x53, 0xc0, 0xcf, 0x7c,
+          0xbb, 0x25, 0xd0, 0xdc, } },
+
+      { { 0x3d, 0x80, 0x47, 0x7d,       /* round key */
+          0x47, 0x16, 0xfe, 0x3e,
+          0x1e, 0x23, 0x7e, 0x44,
+          0x6d, 0x7a, 0x88, 0x3b, } } },
+};
+
+static void verify_log(const char *prefix, const State *s)
+{
+    printf("%s:", prefix);
+    for (int i = 0; i < sizeof(State); ++i) {
+        printf(" %02x", s->b[i]);
+    }
+    printf("\n");
+}
+
+static void verify(const State *ref, const State *tst, const char *which)
+{
+    if (!memcmp(ref, tst, sizeof(State))) {
+        return;
+    }
+
+    printf("Mismatch on %s\n", which);
+    verify_log("ref", ref);
+    verify_log("tst", tst);
+    exit(EXIT_FAILURE);
+}
+
+int main()
+{
+    int i, n = sizeof(rounds) / sizeof(Round);
+    State t;
+
+    for (i = 0; i < n; ++i) {
+        if (test_SB_SR(t.b, rounds[i].start.b)) {
+            verify(&rounds[i].after_sr, &t, "SB+SR");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_MC(t.b, rounds[i].after_sr.b)) {
+            verify(&rounds[i].after_mc, &t, "MC");
+        }
+    }
+
+    /* The kernel of Cipher(). */
+    for (i = 0; i < n - 1; ++i) {
+        if (test_SB_SR_MC_AK(t.b, rounds[i].start.b, rounds[i].round_key.b)) {
+            verify(&rounds[i + 1].start, &t, "SB+SR+MC+AK");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_ISB_ISR(t.b, rounds[i].after_sr.b)) {
+            verify(&rounds[i].start, &t, "ISB+ISR");
+        }
+    }
+
+    for (i = 0; i < n; ++i) {
+        if (test_IMC(t.b, rounds[i].after_mc.b)) {
+            verify(&rounds[i].after_sr, &t, "IMC");
+        }
+    }
+
+    /* The kernel of InvCipher(). */
+    for (i = n - 1; i > 0; --i) {
+        if (test_ISB_ISR_AK_IMC(t.b, rounds[i].after_sr.b,
+                                rounds[i - 1].round_key.b)) {
+            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+AK+IMC");
+        }
+    }
+
+    /*
+     * The kernel of EqInvCipher().
+     * We must compute a different round key: apply InvMixColumns to
+     * the standard round key, per KeyExpansion vs KeyExpansionEIC.
+     */
+    for (i = 1; i < n; ++i) {
+        if (test_IMC(t.b, rounds[i - 1].round_key.b) &&
+            test_ISB_ISR_IMC_AK(t.b, rounds[i].after_sr.b, t.b)) {
+            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+IMC+AK");
+        }
+    }
+
+    return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 3430fd3cd8..d217474d0d 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -74,6 +74,10 @@ endif
 AARCH64_TESTS += sve-ioctls
 sve-ioctls: CFLAGS+=-march=armv8.1-a+sve
 
+AARCH64_TESTS += test-aes
+test-aes: CFLAGS += -O -march=armv8-a+aes
+test-aes: test-aes-main.c.inc
+
 # Vector SHA1
 sha1-vector: CFLAGS=-O3
 sha1-vector: sha1.c
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 821822ed0c..3ba61e3880 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -28,6 +28,10 @@ run-test-i386-bmi2: QEMU_OPTS += -cpu max
 test-i386-adcox: CFLAGS=-O2
 run-test-i386-adcox: QEMU_OPTS += -cpu max
 
+test-aes: CFLAGS += -O -msse2 -maes
+test-aes: test-aes-main.c.inc
+run-test-aes: QEMU_OPTS += -cpu max
+
 #
 # hello-i386 is a barebones app
 #
diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
index b084963b9a..5721c159f2 100644
--- a/tests/tcg/ppc64/Makefile.target
+++ b/tests/tcg/ppc64/Makefile.target
@@ -36,5 +36,6 @@ run-vector: QEMU_OPTS += -cpu POWER10
 
 PPC64_TESTS += signal_save_restore_xer
 PPC64_TESTS += xxspltw
+PPC64_TESTS += test-aes
 
 TESTS += $(PPC64_TESTS)
diff --git a/tests/tcg/riscv64/Makefile.target b/tests/tcg/riscv64/Makefile.target
index 9973ba3b5f..4002d14b9e 100644
--- a/tests/tcg/riscv64/Makefile.target
+++ b/tests/tcg/riscv64/Makefile.target
@@ -9,3 +9,7 @@ TESTS += noexec
 TESTS += test-noc
 test-noc: LDFLAGS = -nostdlib -static
 run-test-noc: QEMU_OPTS += -cpu rv64,c=false
+
+TESTS += test-aes
+test-aes: CFLAGS += -O -march=rv64gzk
+run-test-aes: QEMU_OPTS += -cpu rv64,zk=on
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (2 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 03/38] tests/multiarch: Add test-aes Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 16:49   ` Daniel P. Berrangé
  2023-06-09  2:23 ` [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
                   ` (33 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

We do not currently have a table in crypto/ for just MixColumns.
Move both tables for consistency.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h           |   6 ++
 crypto/aes.c                   | 140 ++++++++++++++++++++++++++++++++
 target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
 3 files changed, 151 insertions(+), 138 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 822d64588c..24b073d569 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -34,6 +34,12 @@ extern const uint8_t AES_isbox[256];
 extern const uint8_t AES_shifts[16];
 extern const uint8_t AES_ishifts[16];
 
+/* AES MixColumns, for use with rot32. */
+extern const uint32_t AES_mc_rot[256];
+
+/* AES InvMixColumns, for use with rot32. */
+extern const uint32_t AES_imc_rot[256];
+
 /* AES InvMixColumns */
 /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
 /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
diff --git a/crypto/aes.c b/crypto/aes.c
index af72ff7779..67bb74b8e3 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -116,6 +116,146 @@ const uint8_t AES_ishifts[16] = {
     0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
 };
 
+/*
+ * MixColumns lookup table, for use with rot32.
+ */
+const uint32_t AES_mc_rot[256] = {
+    0x00000000, 0x03010102, 0x06020204, 0x05030306,
+    0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
+    0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
+    0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
+    0x30101020, 0x33111122, 0x36121224, 0x35131326,
+    0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
+    0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
+    0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
+    0x60202040, 0x63212142, 0x66222244, 0x65232346,
+    0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
+    0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
+    0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
+    0x50303060, 0x53313162, 0x56323264, 0x55333366,
+    0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
+    0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
+    0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
+    0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
+    0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
+    0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
+    0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
+    0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
+    0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
+    0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
+    0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
+    0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
+    0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
+    0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
+    0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
+    0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
+    0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
+    0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
+    0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
+    0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
+    0x97848413, 0x94858511, 0x91868617, 0x92878715,
+    0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
+    0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
+    0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
+    0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
+    0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
+    0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
+    0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
+    0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
+    0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
+    0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
+    0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
+    0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
+    0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
+    0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
+    0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
+    0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
+    0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
+    0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
+    0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
+    0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
+    0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
+    0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
+    0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
+    0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
+    0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
+    0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
+    0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
+    0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
+    0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
+    0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
+};
+
+/*
+ * Inverse MixColumns lookup table, for use with rot32.
+ */
+const uint32_t AES_imc_rot[256] = {
+    0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
+    0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
+    0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
+    0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
+    0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
+    0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
+    0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
+    0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
+    0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
+    0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
+    0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
+    0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
+    0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
+    0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
+    0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
+    0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
+    0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
+    0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
+    0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
+    0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
+    0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
+    0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
+    0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
+    0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
+    0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
+    0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
+    0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
+    0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
+    0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
+    0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
+    0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
+    0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
+    0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
+    0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
+    0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
+    0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
+    0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
+    0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
+    0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
+    0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
+    0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
+    0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
+    0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
+    0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
+    0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
+    0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
+    0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
+    0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
+    0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
+    0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
+    0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
+    0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
+    0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
+    0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
+    0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
+    0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
+    0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
+    0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
+    0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
+    0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
+    0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
+    0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
+    0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
+    0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
+};
+
 /* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
 /* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
 /* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index d28690321f..06254939d2 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -80,149 +80,16 @@ void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 
 static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
 {
-    static uint32_t const mc[][256] = { {
-        /* MixColumns lookup table */
-        0x00000000, 0x03010102, 0x06020204, 0x05030306,
-        0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
-        0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
-        0x140c0c18, 0x170d0d1a, 0x120e0e1c, 0x110f0f1e,
-        0x30101020, 0x33111122, 0x36121224, 0x35131326,
-        0x3c141428, 0x3f15152a, 0x3a16162c, 0x3917172e,
-        0x28181830, 0x2b191932, 0x2e1a1a34, 0x2d1b1b36,
-        0x241c1c38, 0x271d1d3a, 0x221e1e3c, 0x211f1f3e,
-        0x60202040, 0x63212142, 0x66222244, 0x65232346,
-        0x6c242448, 0x6f25254a, 0x6a26264c, 0x6927274e,
-        0x78282850, 0x7b292952, 0x7e2a2a54, 0x7d2b2b56,
-        0x742c2c58, 0x772d2d5a, 0x722e2e5c, 0x712f2f5e,
-        0x50303060, 0x53313162, 0x56323264, 0x55333366,
-        0x5c343468, 0x5f35356a, 0x5a36366c, 0x5937376e,
-        0x48383870, 0x4b393972, 0x4e3a3a74, 0x4d3b3b76,
-        0x443c3c78, 0x473d3d7a, 0x423e3e7c, 0x413f3f7e,
-        0xc0404080, 0xc3414182, 0xc6424284, 0xc5434386,
-        0xcc444488, 0xcf45458a, 0xca46468c, 0xc947478e,
-        0xd8484890, 0xdb494992, 0xde4a4a94, 0xdd4b4b96,
-        0xd44c4c98, 0xd74d4d9a, 0xd24e4e9c, 0xd14f4f9e,
-        0xf05050a0, 0xf35151a2, 0xf65252a4, 0xf55353a6,
-        0xfc5454a8, 0xff5555aa, 0xfa5656ac, 0xf95757ae,
-        0xe85858b0, 0xeb5959b2, 0xee5a5ab4, 0xed5b5bb6,
-        0xe45c5cb8, 0xe75d5dba, 0xe25e5ebc, 0xe15f5fbe,
-        0xa06060c0, 0xa36161c2, 0xa66262c4, 0xa56363c6,
-        0xac6464c8, 0xaf6565ca, 0xaa6666cc, 0xa96767ce,
-        0xb86868d0, 0xbb6969d2, 0xbe6a6ad4, 0xbd6b6bd6,
-        0xb46c6cd8, 0xb76d6dda, 0xb26e6edc, 0xb16f6fde,
-        0x907070e0, 0x937171e2, 0x967272e4, 0x957373e6,
-        0x9c7474e8, 0x9f7575ea, 0x9a7676ec, 0x997777ee,
-        0x887878f0, 0x8b7979f2, 0x8e7a7af4, 0x8d7b7bf6,
-        0x847c7cf8, 0x877d7dfa, 0x827e7efc, 0x817f7ffe,
-        0x9b80801b, 0x98818119, 0x9d82821f, 0x9e83831d,
-        0x97848413, 0x94858511, 0x91868617, 0x92878715,
-        0x8388880b, 0x80898909, 0x858a8a0f, 0x868b8b0d,
-        0x8f8c8c03, 0x8c8d8d01, 0x898e8e07, 0x8a8f8f05,
-        0xab90903b, 0xa8919139, 0xad92923f, 0xae93933d,
-        0xa7949433, 0xa4959531, 0xa1969637, 0xa2979735,
-        0xb398982b, 0xb0999929, 0xb59a9a2f, 0xb69b9b2d,
-        0xbf9c9c23, 0xbc9d9d21, 0xb99e9e27, 0xba9f9f25,
-        0xfba0a05b, 0xf8a1a159, 0xfda2a25f, 0xfea3a35d,
-        0xf7a4a453, 0xf4a5a551, 0xf1a6a657, 0xf2a7a755,
-        0xe3a8a84b, 0xe0a9a949, 0xe5aaaa4f, 0xe6abab4d,
-        0xefacac43, 0xecadad41, 0xe9aeae47, 0xeaafaf45,
-        0xcbb0b07b, 0xc8b1b179, 0xcdb2b27f, 0xceb3b37d,
-        0xc7b4b473, 0xc4b5b571, 0xc1b6b677, 0xc2b7b775,
-        0xd3b8b86b, 0xd0b9b969, 0xd5baba6f, 0xd6bbbb6d,
-        0xdfbcbc63, 0xdcbdbd61, 0xd9bebe67, 0xdabfbf65,
-        0x5bc0c09b, 0x58c1c199, 0x5dc2c29f, 0x5ec3c39d,
-        0x57c4c493, 0x54c5c591, 0x51c6c697, 0x52c7c795,
-        0x43c8c88b, 0x40c9c989, 0x45caca8f, 0x46cbcb8d,
-        0x4fcccc83, 0x4ccdcd81, 0x49cece87, 0x4acfcf85,
-        0x6bd0d0bb, 0x68d1d1b9, 0x6dd2d2bf, 0x6ed3d3bd,
-        0x67d4d4b3, 0x64d5d5b1, 0x61d6d6b7, 0x62d7d7b5,
-        0x73d8d8ab, 0x70d9d9a9, 0x75dadaaf, 0x76dbdbad,
-        0x7fdcdca3, 0x7cdddda1, 0x79dedea7, 0x7adfdfa5,
-        0x3be0e0db, 0x38e1e1d9, 0x3de2e2df, 0x3ee3e3dd,
-        0x37e4e4d3, 0x34e5e5d1, 0x31e6e6d7, 0x32e7e7d5,
-        0x23e8e8cb, 0x20e9e9c9, 0x25eaeacf, 0x26ebebcd,
-        0x2fececc3, 0x2cededc1, 0x29eeeec7, 0x2aefefc5,
-        0x0bf0f0fb, 0x08f1f1f9, 0x0df2f2ff, 0x0ef3f3fd,
-        0x07f4f4f3, 0x04f5f5f1, 0x01f6f6f7, 0x02f7f7f5,
-        0x13f8f8eb, 0x10f9f9e9, 0x15fafaef, 0x16fbfbed,
-        0x1ffcfce3, 0x1cfdfde1, 0x19fefee7, 0x1affffe5,
-    }, {
-        /* Inverse MixColumns lookup table */
-        0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
-        0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
-        0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
-        0x745c6c48, 0x7f516546, 0x62467e54, 0x694b775a,
-        0xb0d090e0, 0xbbdd99ee, 0xa6ca82fc, 0xadc78bf2,
-        0x9ce4b4d8, 0x97e9bdd6, 0x8afea6c4, 0x81f3afca,
-        0xe8b8d890, 0xe3b5d19e, 0xfea2ca8c, 0xf5afc382,
-        0xc48cfca8, 0xcf81f5a6, 0xd296eeb4, 0xd99be7ba,
-        0x7bbb3bdb, 0x70b632d5, 0x6da129c7, 0x66ac20c9,
-        0x578f1fe3, 0x5c8216ed, 0x41950dff, 0x4a9804f1,
-        0x23d373ab, 0x28de7aa5, 0x35c961b7, 0x3ec468b9,
-        0x0fe75793, 0x04ea5e9d, 0x19fd458f, 0x12f04c81,
-        0xcb6bab3b, 0xc066a235, 0xdd71b927, 0xd67cb029,
-        0xe75f8f03, 0xec52860d, 0xf1459d1f, 0xfa489411,
-        0x9303e34b, 0x980eea45, 0x8519f157, 0x8e14f859,
-        0xbf37c773, 0xb43ace7d, 0xa92dd56f, 0xa220dc61,
-        0xf66d76ad, 0xfd607fa3, 0xe07764b1, 0xeb7a6dbf,
-        0xda595295, 0xd1545b9b, 0xcc434089, 0xc74e4987,
-        0xae053edd, 0xa50837d3, 0xb81f2cc1, 0xb31225cf,
-        0x82311ae5, 0x893c13eb, 0x942b08f9, 0x9f2601f7,
-        0x46bde64d, 0x4db0ef43, 0x50a7f451, 0x5baafd5f,
-        0x6a89c275, 0x6184cb7b, 0x7c93d069, 0x779ed967,
-        0x1ed5ae3d, 0x15d8a733, 0x08cfbc21, 0x03c2b52f,
-        0x32e18a05, 0x39ec830b, 0x24fb9819, 0x2ff69117,
-        0x8dd64d76, 0x86db4478, 0x9bcc5f6a, 0x90c15664,
-        0xa1e2694e, 0xaaef6040, 0xb7f87b52, 0xbcf5725c,
-        0xd5be0506, 0xdeb30c08, 0xc3a4171a, 0xc8a91e14,
-        0xf98a213e, 0xf2872830, 0xef903322, 0xe49d3a2c,
-        0x3d06dd96, 0x360bd498, 0x2b1ccf8a, 0x2011c684,
-        0x1132f9ae, 0x1a3ff0a0, 0x0728ebb2, 0x0c25e2bc,
-        0x656e95e6, 0x6e639ce8, 0x737487fa, 0x78798ef4,
-        0x495ab1de, 0x4257b8d0, 0x5f40a3c2, 0x544daacc,
-        0xf7daec41, 0xfcd7e54f, 0xe1c0fe5d, 0xeacdf753,
-        0xdbeec879, 0xd0e3c177, 0xcdf4da65, 0xc6f9d36b,
-        0xafb2a431, 0xa4bfad3f, 0xb9a8b62d, 0xb2a5bf23,
-        0x83868009, 0x888b8907, 0x959c9215, 0x9e919b1b,
-        0x470a7ca1, 0x4c0775af, 0x51106ebd, 0x5a1d67b3,
-        0x6b3e5899, 0x60335197, 0x7d244a85, 0x7629438b,
-        0x1f6234d1, 0x146f3ddf, 0x097826cd, 0x02752fc3,
-        0x335610e9, 0x385b19e7, 0x254c02f5, 0x2e410bfb,
-        0x8c61d79a, 0x876cde94, 0x9a7bc586, 0x9176cc88,
-        0xa055f3a2, 0xab58faac, 0xb64fe1be, 0xbd42e8b0,
-        0xd4099fea, 0xdf0496e4, 0xc2138df6, 0xc91e84f8,
-        0xf83dbbd2, 0xf330b2dc, 0xee27a9ce, 0xe52aa0c0,
-        0x3cb1477a, 0x37bc4e74, 0x2aab5566, 0x21a65c68,
-        0x10856342, 0x1b886a4c, 0x069f715e, 0x0d927850,
-        0x64d90f0a, 0x6fd40604, 0x72c31d16, 0x79ce1418,
-        0x48ed2b32, 0x43e0223c, 0x5ef7392e, 0x55fa3020,
-        0x01b79aec, 0x0aba93e2, 0x17ad88f0, 0x1ca081fe,
-        0x2d83bed4, 0x268eb7da, 0x3b99acc8, 0x3094a5c6,
-        0x59dfd29c, 0x52d2db92, 0x4fc5c080, 0x44c8c98e,
-        0x75ebf6a4, 0x7ee6ffaa, 0x63f1e4b8, 0x68fcedb6,
-        0xb1670a0c, 0xba6a0302, 0xa77d1810, 0xac70111e,
-        0x9d532e34, 0x965e273a, 0x8b493c28, 0x80443526,
-        0xe90f427c, 0xe2024b72, 0xff155060, 0xf418596e,
-        0xc53b6644, 0xce366f4a, 0xd3217458, 0xd82c7d56,
-        0x7a0ca137, 0x7101a839, 0x6c16b32b, 0x671bba25,
-        0x5638850f, 0x5d358c01, 0x40229713, 0x4b2f9e1d,
-        0x2264e947, 0x2969e049, 0x347efb5b, 0x3f73f255,
-        0x0e50cd7f, 0x055dc471, 0x184adf63, 0x1347d66d,
-        0xcadc31d7, 0xc1d138d9, 0xdcc623cb, 0xd7cb2ac5,
-        0xe6e815ef, 0xede51ce1, 0xf0f207f3, 0xfbff0efd,
-        0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
-        0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
-    } };
-
     union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
+    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
     int i;
 
     for (i = 0; i < 16; i += 4) {
         CR_ST_WORD(st, i >> 2) =
-            mc[decrypt][CR_ST_BYTE(st, i)] ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 1)], 8) ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 2)], 16) ^
-            rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
+            mc[CR_ST_BYTE(st, i)] ^
+            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
+            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
+            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
     }
 
     rd[0] = st.l[0];
-- 
2.34.1




* [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (3 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 15:41   ` Daniel P. Berrangé
  2023-06-29 10:21   ` Ard Biesheuvel
  2023-06-09  2:23 ` [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK Richard Henderson
                   ` (32 subsequent siblings)
  37 siblings, 2 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

These symbols will avoid the indirection through memory
when fully unrolling some new primitives.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 crypto/aes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/crypto/aes.c b/crypto/aes.c
index 67bb74b8e3..cdf937883d 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -108,12 +108,58 @@ const uint8_t AES_isbox[256] = {
     0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D,
 };
 
+/* AES ShiftRows, for complete unrolling. */
+enum {
+    AES_SH_0 = 0x0,
+    AES_SH_1 = 0x5,
+    AES_SH_2 = 0xa,
+    AES_SH_3 = 0xf,
+    AES_SH_4 = 0x4,
+    AES_SH_5 = 0x9,
+    AES_SH_6 = 0xe,
+    AES_SH_7 = 0x3,
+    AES_SH_8 = 0x8,
+    AES_SH_9 = 0xd,
+    AES_SH_A = 0x2,
+    AES_SH_B = 0x7,
+    AES_SH_C = 0xc,
+    AES_SH_D = 0x1,
+    AES_SH_E = 0x6,
+    AES_SH_F = 0xb,
+};
+
 const uint8_t AES_shifts[16] = {
-    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
+    AES_SH_0, AES_SH_1, AES_SH_2, AES_SH_3,
+    AES_SH_4, AES_SH_5, AES_SH_6, AES_SH_7,
+    AES_SH_8, AES_SH_9, AES_SH_A, AES_SH_B,
+    AES_SH_C, AES_SH_D, AES_SH_E, AES_SH_F,
+};
+
+/* AES InvShiftRows, for complete unrolling. */
+enum {
+    AES_ISH_0 = 0x0,
+    AES_ISH_1 = 0xd,
+    AES_ISH_2 = 0xa,
+    AES_ISH_3 = 0x7,
+    AES_ISH_4 = 0x4,
+    AES_ISH_5 = 0x1,
+    AES_ISH_6 = 0xe,
+    AES_ISH_7 = 0xb,
+    AES_ISH_8 = 0x8,
+    AES_ISH_9 = 0x5,
+    AES_ISH_A = 0x2,
+    AES_ISH_B = 0xf,
+    AES_ISH_C = 0xc,
+    AES_ISH_D = 0x9,
+    AES_ISH_E = 0x6,
+    AES_ISH_F = 0x3,
 };
 
 const uint8_t AES_ishifts[16] = {
-    0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
+    AES_ISH_0, AES_ISH_1, AES_ISH_2, AES_ISH_3,
+    AES_ISH_4, AES_ISH_5, AES_ISH_6, AES_ISH_7,
+    AES_ISH_8, AES_ISH_9, AES_ISH_A, AES_ISH_B,
+    AES_ISH_C, AES_ISH_D, AES_ISH_E, AES_ISH_F,
 };
 
 /*
-- 
2.34.1




* [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (4 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 16:56   ` Daniel P. Berrangé
  2023-06-09  2:23 ` [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK Richard Henderson
                   ` (31 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Start adding infrastructure for accelerating guest AES.
Begin with a SubBytes + ShiftRows + AddRoundKey primitive.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h | 16 ++++++++++
 include/crypto/aes-round.h            | 44 +++++++++++++++++++++++++++
 crypto/aes.c                          | 44 +++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)
 create mode 100644 host/include/generic/host/aes-round.h
 create mode 100644 include/crypto/aes-round.h

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
new file mode 100644
index 0000000000..19c8505e2b
--- /dev/null
+++ b/host/include/generic/host/aes-round.h
@@ -0,0 +1,16 @@
+/*
+ * No host specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef GENERIC_HOST_AES_ROUND_H
+#define GENERIC_HOST_AES_ROUND_H
+
+#define HAVE_AES_ACCEL  false
+#define ATTR_AES_ACCEL
+
+void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
+                           const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
+#endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
new file mode 100644
index 0000000000..15ea1f42bc
--- /dev/null
+++ b/include/crypto/aes-round.h
@@ -0,0 +1,44 @@
+/*
+ * AES round fragments, generic version
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef CRYPTO_AES_ROUND_H
+#define CRYPTO_AES_ROUND_H
+
+/* Hosts with acceleration will usually need a 16-byte vector type. */
+typedef uint8_t AESStateVec __attribute__((vector_size(16)));
+
+typedef union {
+    uint8_t b[16];
+    uint32_t w[4];
+    uint64_t d[2];
+    AESStateVec v;
+} AESState;
+
+#include "host/aes-round.h"
+
+/*
+ * Perform SubBytes + ShiftRows + AddRoundKey.
+ */
+
+void aesenc_SB_SR_AK_gen(AESState *ret, const AESState *st,
+                         const AESState *rk);
+void aesenc_SB_SR_AK_genrev(AESState *ret, const AESState *st,
+                            const AESState *rk);
+
+static inline void aesenc_SB_SR_AK(AESState *r, const AESState *st,
+                                   const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_SB_SR_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_SB_SR_AK_gen(r, st, rk);
+    } else {
+        aesenc_SB_SR_AK_genrev(r, st, rk);
+    }
+}
+
+#endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index cdf937883d..896f6f44f1 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -29,6 +29,7 @@
  */
 #include "qemu/osdep.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 
 typedef uint32_t u32;
 typedef uint8_t u8;
@@ -1249,6 +1250,49 @@ static const u32 rcon[] = {
         0x1B000000, 0x36000000, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
 };
 
+/* Perform SubBytes + ShiftRows + AddRoundKey. */
+static inline void
+aesenc_SB_SR_AK_swap(AESState *ret, const AESState *st,
+                     const AESState *rk, bool swap)
+{
+    const int swap_b = swap ? 15 : 0;
+    AESState t;
+
+    t.b[swap_b ^ 0x0] = AES_sbox[st->b[swap_b ^ AES_SH_0]];
+    t.b[swap_b ^ 0x1] = AES_sbox[st->b[swap_b ^ AES_SH_1]];
+    t.b[swap_b ^ 0x2] = AES_sbox[st->b[swap_b ^ AES_SH_2]];
+    t.b[swap_b ^ 0x3] = AES_sbox[st->b[swap_b ^ AES_SH_3]];
+    t.b[swap_b ^ 0x4] = AES_sbox[st->b[swap_b ^ AES_SH_4]];
+    t.b[swap_b ^ 0x5] = AES_sbox[st->b[swap_b ^ AES_SH_5]];
+    t.b[swap_b ^ 0x6] = AES_sbox[st->b[swap_b ^ AES_SH_6]];
+    t.b[swap_b ^ 0x7] = AES_sbox[st->b[swap_b ^ AES_SH_7]];
+    t.b[swap_b ^ 0x8] = AES_sbox[st->b[swap_b ^ AES_SH_8]];
+    t.b[swap_b ^ 0x9] = AES_sbox[st->b[swap_b ^ AES_SH_9]];
+    t.b[swap_b ^ 0xa] = AES_sbox[st->b[swap_b ^ AES_SH_A]];
+    t.b[swap_b ^ 0xb] = AES_sbox[st->b[swap_b ^ AES_SH_B]];
+    t.b[swap_b ^ 0xc] = AES_sbox[st->b[swap_b ^ AES_SH_C]];
+    t.b[swap_b ^ 0xd] = AES_sbox[st->b[swap_b ^ AES_SH_D]];
+    t.b[swap_b ^ 0xe] = AES_sbox[st->b[swap_b ^ AES_SH_E]];
+    t.b[swap_b ^ 0xf] = AES_sbox[st->b[swap_b ^ AES_SH_F]];
+
+    /*
+     * Perform the AddRoundKey with generic vectors.
+     * This may be expanded to either host integer or host vector code.
+     * The key and output endianness match, so no bswap required.
+     */
+    ret->v = t.v ^ rk->v;
+}
+
+void aesenc_SB_SR_AK_gen(AESState *r, const AESState *s, const AESState *k)
+{
+    aesenc_SB_SR_AK_swap(r, s, k, false);
+}
+
+void aesenc_SB_SR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
+{
+    aesenc_SB_SR_AK_swap(r, s, k, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (5 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 10:43   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 08/38] target/arm: Demultiplex AESE and AESMC Richard Henderson
                   ` (30 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESENCLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index fb63af7afa..63fdecbe03 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -19,6 +19,7 @@
  */
 
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 
 #if SHIFT == 0
 #define Reg MMXReg
@@ -2202,12 +2203,12 @@ void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0; i < 8 << SHIFT; i++) {
-        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
+        aesenc_SB_SR_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1




* [PATCH v2 08/38] target/arm: Demultiplex AESE and AESMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (6 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 09/38] target/arm: Use aesenc_SB_SR_AK Richard Henderson
                   ` (29 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

Split these helpers so that we are not passing 'decrypt'
within the simd descriptor.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.h             |  2 ++
 target/arm/tcg/sve.decode       |  4 ++--
 target/arm/tcg/crypto_helper.c  | 37 +++++++++++++++++++++++----------
 target/arm/tcg/translate-a64.c  | 13 ++++--------
 target/arm/tcg/translate-neon.c |  4 ++--
 target/arm/tcg/translate-sve.c  |  8 ++++---
 6 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3335c2b10b..95e32a697a 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -552,7 +552,9 @@ DEF_HELPER_FLAGS_2(neon_qzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
 DEF_HELPER_FLAGS_2(neon_qzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
 
 DEF_HELPER_FLAGS_4(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(crypto_aesd, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_aesimc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(crypto_sha1su0, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(crypto_sha1c, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/tcg/sve.decode b/target/arm/tcg/sve.decode
index 14b3a69c36..04b6fcc0cf 100644
--- a/target/arm/tcg/sve.decode
+++ b/target/arm/tcg/sve.decode
@@ -1629,8 +1629,8 @@ STNT1_zprz      1110010 .. 10 ..... 001 ... ..... ..... \
 ### SVE2 Crypto Extensions
 
 # SVE2 crypto unary operations
-# AESMC and AESIMC
-AESMC           01000101 00 10000011100 decrypt:1 00000 rd:5
+AESMC           01000101 00 10000011100 0 00000 rd:5
+AESIMC          01000101 00 10000011100 1 00000 rd:5
 
 # SVE2 crypto destructive binary operations
 AESE            01000101 00 10001 0 11100 0 ..... .....  @rdn_rm_e0
diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 06254939d2..75882d9ea3 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -45,11 +45,9 @@ static void clear_tail_16(void *vd, uint32_t desc)
     clear_tail(vd, opr_sz, max_sz);
 }
 
-static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
-                           uint64_t *rm, bool decrypt)
+static void do_crypto_aese(uint64_t *rd, uint64_t *rn, uint64_t *rm,
+                           const uint8_t *sbox, const uint8_t *shift)
 {
-    static uint8_t const * const sbox[2] = { AES_sbox, AES_isbox };
-    static uint8_t const * const shift[2] = { AES_shifts, AES_ishifts };
     union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
     union CRYPTO_STATE st = { .l = { rn[0], rn[1] } };
     int i;
@@ -60,7 +58,7 @@ static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
 
     /* combine ShiftRows operation and sbox substitution */
     for (i = 0; i < 16; i++) {
-        CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
+        CR_ST_BYTE(st, i) = sbox[CR_ST_BYTE(rk, shift[i])];
     }
 
     rd[0] = st.l[0];
@@ -70,18 +68,26 @@ static void do_crypto_aese(uint64_t *rd, uint64_t *rn,
 void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
-    bool decrypt = simd_data(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, decrypt);
+        do_crypto_aese(vd + i, vn + i, vm + i, AES_sbox, AES_shifts);
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
 
-static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
+void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+
+    for (i = 0; i < opr_sz; i += 16) {
+        do_crypto_aese(vd + i, vn + i, vm + i, AES_isbox, AES_ishifts);
+    }
+    clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, const uint32_t *mc)
 {
     union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
-    const uint32_t *mc = decrypt ? AES_imc_rot : AES_mc_rot;
     int i;
 
     for (i = 0; i < 16; i += 4) {
@@ -99,10 +105,19 @@ static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, bool decrypt)
 void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
-    bool decrypt = simd_data(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, decrypt);
+        do_crypto_aesmc(vd + i, vm + i, AES_mc_rot);
+    }
+    clear_tail(vd, opr_sz, simd_maxsz(desc));
+}
+
+void HELPER(crypto_aesimc)(void *vd, void *vm, uint32_t desc)
+{
+    intptr_t i, opr_sz = simd_oprsz(desc);
+
+    for (i = 0; i < opr_sz; i += 16) {
+        do_crypto_aesmc(vd + i, vm + i, AES_imc_rot);
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index aa93f37e21..8b7337ad01 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -13566,7 +13566,6 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
     int opcode = extract32(insn, 12, 5);
     int rn = extract32(insn, 5, 5);
     int rd = extract32(insn, 0, 5);
-    int decrypt;
     gen_helper_gvec_2 *genfn2 = NULL;
     gen_helper_gvec_3 *genfn3 = NULL;
 
@@ -13577,20 +13576,16 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
 
     switch (opcode) {
     case 0x4: /* AESE */
-        decrypt = 0;
         genfn3 = gen_helper_crypto_aese;
         break;
     case 0x6: /* AESMC */
-        decrypt = 0;
         genfn2 = gen_helper_crypto_aesmc;
         break;
     case 0x5: /* AESD */
-        decrypt = 1;
-        genfn3 = gen_helper_crypto_aese;
+        genfn3 = gen_helper_crypto_aesd;
         break;
     case 0x7: /* AESIMC */
-        decrypt = 1;
-        genfn2 = gen_helper_crypto_aesmc;
+        genfn2 = gen_helper_crypto_aesimc;
         break;
     default:
         unallocated_encoding(s);
@@ -13601,9 +13596,9 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
         return;
     }
     if (genfn2) {
-        gen_gvec_op2_ool(s, true, rd, rn, decrypt, genfn2);
+        gen_gvec_op2_ool(s, true, rd, rn, 0, genfn2);
     } else {
-        gen_gvec_op3_ool(s, true, rd, rd, rn, decrypt, genfn3);
+        gen_gvec_op3_ool(s, true, rd, rd, rn, 0, genfn3);
     }
 }
 
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 03913de047..8de4ceb203 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -3451,9 +3451,9 @@ static bool trans_VMVN(DisasContext *s, arg_2misc *a)
     }
 
 WRAP_2M_3_OOL_FN(gen_AESE, gen_helper_crypto_aese, 0)
-WRAP_2M_3_OOL_FN(gen_AESD, gen_helper_crypto_aese, 1)
+WRAP_2M_3_OOL_FN(gen_AESD, gen_helper_crypto_aesd, 0)
 WRAP_2M_2_OOL_FN(gen_AESMC, gen_helper_crypto_aesmc, 0)
-WRAP_2M_2_OOL_FN(gen_AESIMC, gen_helper_crypto_aesmc, 1)
+WRAP_2M_2_OOL_FN(gen_AESIMC, gen_helper_crypto_aesimc, 0)
 WRAP_2M_2_OOL_FN(gen_SHA1H, gen_helper_crypto_sha1h, 0)
 WRAP_2M_2_OOL_FN(gen_SHA1SU1, gen_helper_crypto_sha1su1, 0)
 WRAP_2M_2_OOL_FN(gen_SHA256SU0, gen_helper_crypto_sha256su0, 0)
diff --git a/target/arm/tcg/translate-sve.c b/target/arm/tcg/translate-sve.c
index ff050626e6..b98f469cb1 100644
--- a/target/arm/tcg/translate-sve.c
+++ b/target/arm/tcg/translate-sve.c
@@ -7151,12 +7151,14 @@ TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
            a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0)
 
 TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
-                        gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
+                        gen_helper_crypto_aesmc, a->rd, a->rd, 0)
+TRANS_FEAT_NONSTREAMING(AESIMC, aa64_sve2_aes, gen_gvec_ool_zz,
+                        gen_helper_crypto_aesimc, a->rd, a->rd, 0)
 
 TRANS_FEAT_NONSTREAMING(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-                        gen_helper_crypto_aese, a, false)
+                        gen_helper_crypto_aese, a, 0)
 TRANS_FEAT_NONSTREAMING(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-                        gen_helper_crypto_aese, a, true)
+                        gen_helper_crypto_aesd, a, 0)
 
 TRANS_FEAT_NONSTREAMING(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
                         gen_helper_crypto_sm4e, a, 0)
-- 
2.34.1




* [PATCH v2 09/38] target/arm: Use aesenc_SB_SR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (7 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 08/38] target/arm: Demultiplex AESE and AESMC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 10/38] target/ppc: " Richard Henderson
                   ` (28 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESE instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 75882d9ea3..00f3b21507 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -15,6 +15,7 @@
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 #include "vec_internal.h"
 
@@ -45,6 +46,8 @@ static void clear_tail_16(void *vd, uint32_t desc)
     clear_tail(vd, opr_sz, max_sz);
 }
 
+static const AESState aes_zero = { };
+
 static void do_crypto_aese(uint64_t *rd, uint64_t *rn, uint64_t *rm,
                            const uint8_t *sbox, const uint8_t *shift)
 {
@@ -70,7 +73,26 @@ void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, AES_sbox, AES_shifts);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vn + i);
+        AESState *rk = (AESState *)(vm + i);
+        AESState t;
+
+        /*
+         * Our uint64_t are in the wrong order for big-endian.
+         * The Arm AddRoundKey comes first, while the API AddRoundKey
+         * comes last: perform the xor here, and provide zero to API.
+         */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1] ^ rk->d[1];
+            t.d[1] = st->d[0] ^ rk->d[0];
+            aesenc_SB_SR_AK(&t, &t, &aes_zero, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            t.v = st->v ^ rk->v;
+            aesenc_SB_SR_AK(ad, &t, &aes_zero, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH v2 10/38] target/ppc: Use aesenc_SB_SR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (8 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 09/38] target/arm: Use aesenc_SB_SR_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:26   ` Daniel Henrique Barboza
  2023-06-19 10:47   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 11/38] target/riscv: " Richard Henderson
                   ` (27 subsequent siblings)
  37 siblings, 2 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the VCIPHERLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index d97a7f1f28..34257e9d76 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -25,6 +25,7 @@
 #include "qemu/log.h"
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "fpu/softfloat.h"
 #include "qapi/error.h"
 #include "qemu/guest-random.h"
@@ -2947,13 +2948,7 @@ void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
-
-    VECTOR_FOR_INORDER_I(i, u8) {
-        result.VsrB(i) = b->VsrB(i) ^ (AES_sbox[a->VsrB(AES_shifts[i])]);
-    }
-    *r = result;
+    aesenc_SB_SR_AK((AESState *)r, (AESState *)a, (AESState *)b, true);
 }
 
 void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




* [PATCH v2 11/38] target/riscv: Use aesenc_SB_SR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (9 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 10/38] target/ppc: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 12/38] crypto: Add aesdec_ISB_ISR_AK Richard Henderson
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AES64ES instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 2ef30281b1..b072fed3e2 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -22,6 +22,7 @@
 #include "exec/exec-all.h"
 #include "exec/helper-proto.h"
 #include "crypto/aes.h"
+#include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 
 #define AES_XTIME(a) \
@@ -136,6 +137,8 @@ target_ulong HELPER(aes32dsi)(target_ulong rs1, target_ulong rs2,
     AES_INVMIXBYTE(COL, 1, 2, 3, 0) << 8 | \
     AES_INVMIXBYTE(COL, 0, 1, 2, 3) << 0)
 
+static const AESState aes_zero = { };
+
 static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
                                            bool enc, bool mix)
 {
@@ -200,7 +203,12 @@ target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, true, false);
+    AESState t;
+
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesenc_SB_SR_AK(&t, &t, &aes_zero, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH v2 12/38] crypto: Add aesdec_ISB_ISR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (10 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 11/38] target/riscv: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK Richard Henderson
                   ` (25 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows + AddRoundKey.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  4 +++
 include/crypto/aes-round.h            | 21 +++++++++++++
 crypto/aes.c                          | 43 +++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 19c8505e2b..e8f6bb0b99 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -13,4 +13,8 @@ void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
                            const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_ISB_ISR_AK_accel(AESState *, const AESState *,
+                             const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
+
 #endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 15ea1f42bc..56376cc83b 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -41,4 +41,25 @@ static inline void aesenc_SB_SR_AK(AESState *r, const AESState *st,
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows + AddRoundKey.
+ */
+
+void aesdec_ISB_ISR_AK_gen(AESState *ret, const AESState *st,
+                           const AESState *rk);
+void aesdec_ISB_ISR_AK_genrev(AESState *ret, const AESState *st,
+                              const AESState *rk);
+
+static inline void aesdec_ISB_ISR_AK(AESState *r, const AESState *st,
+                                     const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_AK_gen(r, st, rk);
+    } else {
+        aesdec_ISB_ISR_AK_genrev(r, st, rk);
+    }
+}
+
 #endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index 896f6f44f1..767930223c 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1293,6 +1293,49 @@ void aesenc_SB_SR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
     aesenc_SB_SR_AK_swap(r, s, k, true);
 }
 
+/* Perform InvSubBytes + InvShiftRows + AddRoundKey. */
+static inline void
+aesdec_ISB_ISR_AK_swap(AESState *ret, const AESState *st,
+                       const AESState *rk, bool swap)
+{
+    const int swap_b = swap ? 15 : 0;
+    AESState t;
+
+    t.b[swap_b ^ 0x0] = AES_isbox[st->b[swap_b ^ AES_ISH_0]];
+    t.b[swap_b ^ 0x1] = AES_isbox[st->b[swap_b ^ AES_ISH_1]];
+    t.b[swap_b ^ 0x2] = AES_isbox[st->b[swap_b ^ AES_ISH_2]];
+    t.b[swap_b ^ 0x3] = AES_isbox[st->b[swap_b ^ AES_ISH_3]];
+    t.b[swap_b ^ 0x4] = AES_isbox[st->b[swap_b ^ AES_ISH_4]];
+    t.b[swap_b ^ 0x5] = AES_isbox[st->b[swap_b ^ AES_ISH_5]];
+    t.b[swap_b ^ 0x6] = AES_isbox[st->b[swap_b ^ AES_ISH_6]];
+    t.b[swap_b ^ 0x7] = AES_isbox[st->b[swap_b ^ AES_ISH_7]];
+    t.b[swap_b ^ 0x8] = AES_isbox[st->b[swap_b ^ AES_ISH_8]];
+    t.b[swap_b ^ 0x9] = AES_isbox[st->b[swap_b ^ AES_ISH_9]];
+    t.b[swap_b ^ 0xa] = AES_isbox[st->b[swap_b ^ AES_ISH_A]];
+    t.b[swap_b ^ 0xb] = AES_isbox[st->b[swap_b ^ AES_ISH_B]];
+    t.b[swap_b ^ 0xc] = AES_isbox[st->b[swap_b ^ AES_ISH_C]];
+    t.b[swap_b ^ 0xd] = AES_isbox[st->b[swap_b ^ AES_ISH_D]];
+    t.b[swap_b ^ 0xe] = AES_isbox[st->b[swap_b ^ AES_ISH_E]];
+    t.b[swap_b ^ 0xf] = AES_isbox[st->b[swap_b ^ AES_ISH_F]];
+
+    /*
+     * Perform the AddRoundKey with generic vectors.
+     * This may be expanded to either host integer or host vector code.
+     * The key and output endianness match, so no bswap required.
+     */
+    ret->v = t.v ^ rk->v;
+}
+
+void aesdec_ISB_ISR_AK_gen(AESState *r, const AESState *s, const AESState *k)
+{
+    aesdec_ISB_ISR_AK_swap(r, s, k, false);
+}
+
+void aesdec_ISB_ISR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
+{
+    aesdec_ISB_ISR_AK_swap(r, s, k, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (11 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 12/38] crypto: Add aesdec_ISB_ISR_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 10:51   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 14/38] target/arm: " Richard Henderson
                   ` (24 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESDECLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 63fdecbe03..0a37bde595 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2177,12 +2177,12 @@ void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0; i < 8 << SHIFT; i++) {
-        d->B(i) = rk.B(i) ^ (AES_isbox[st.B(AES_ishifts[i & 15] + (i & ~15))]);
+        aesdec_ISB_ISR_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1




* [PATCH v2 14/38] target/arm: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (12 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 15/38] target/ppc: " Richard Henderson
                   ` (23 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESD instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 37 +++++++++++++++-------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 00f3b21507..d2cb74e7fc 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -48,26 +48,6 @@ static void clear_tail_16(void *vd, uint32_t desc)
 
 static const AESState aes_zero = { };
 
-static void do_crypto_aese(uint64_t *rd, uint64_t *rn, uint64_t *rm,
-                           const uint8_t *sbox, const uint8_t *shift)
-{
-    union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
-    union CRYPTO_STATE st = { .l = { rn[0], rn[1] } };
-    int i;
-
-    /* xor state vector with round key */
-    rk.l[0] ^= st.l[0];
-    rk.l[1] ^= st.l[1];
-
-    /* combine ShiftRows operation and sbox substitution */
-    for (i = 0; i < 16; i++) {
-        CR_ST_BYTE(st, i) = sbox[CR_ST_BYTE(rk, shift[i])];
-    }
-
-    rd[0] = st.l[0];
-    rd[1] = st.l[1];
-}
-
 void HELPER(crypto_aese)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
@@ -102,7 +82,22 @@ void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aese(vd + i, vn + i, vm + i, AES_isbox, AES_ishifts);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vn + i);
+        AESState *rk = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1] ^ rk->d[1];
+            t.d[1] = st->d[0] ^ rk->d[0];
+            aesdec_ISB_ISR_AK(&t, &t, &aes_zero, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            t.v = st->v ^ rk->v;
+            aesdec_ISB_ISR_AK(ad, &t, &aes_zero, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1




* [PATCH v2 15/38] target/ppc: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (13 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 14/38] target/arm: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:27   ` Daniel Henrique Barboza
  2023-06-19 10:51   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 16/38] target/riscv: " Richard Henderson
                   ` (22 subsequent siblings)
  37 siblings, 2 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the VNCIPHERLAST instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 34257e9d76..15f07fca2b 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2973,13 +2973,7 @@ void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
-
-    VECTOR_FOR_INORDER_I(i, u8) {
-        result.VsrB(i) = b->VsrB(i) ^ (AES_isbox[a->VsrB(AES_ishifts[i])]);
-    }
-    *r = result;
+    aesdec_ISB_ISR_AK((AESState *)r, (AESState *)a, (AESState *)b, true);
 }
 
 void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)
-- 
2.34.1




* [PATCH v2 16/38] target/riscv: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (14 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 15/38] target/ppc: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 17/38] crypto: Add aesenc_MC Richard Henderson
                   ` (21 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AES64DS instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index b072fed3e2..e61f7fe1e5 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -213,7 +213,12 @@ target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, false, false);
+    AESState t;
+
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesdec_ISB_ISR_AK(&t, &t, &aes_zero, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1




* [PATCH v2 17/38] crypto: Add aesenc_MC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (15 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 16/38] target/riscv: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 18/38] target/arm: Use aesenc_MC Richard Henderson
                   ` (20 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for MixColumns.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  2 +
 include/crypto/aes-round.h            | 18 ++++++++
 crypto/aes.c                          | 59 +++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index e8f6bb0b99..b00e9b50b1 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -9,6 +9,8 @@
 #define HAVE_AES_ACCEL  false
 #define ATTR_AES_ACCEL
 
+void aesenc_MC_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
 void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
                            const AESState *, bool)
     QEMU_ERROR("unsupported accel");
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 56376cc83b..9f263ca726 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -20,6 +20,24 @@ typedef union {
 
 #include "host/aes-round.h"
 
+/*
+ * Perform MixColumns.
+ */
+
+void aesenc_MC_gen(AESState *ret, const AESState *st);
+void aesenc_MC_genrev(AESState *ret, const AESState *st);
+
+static inline void aesenc_MC(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_MC_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_MC_gen(r, st);
+    } else {
+        aesenc_MC_genrev(r, st);
+    }
+}
+
 /*
  * Perform SubBytes + ShiftRows.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index 767930223c..89de8e8db4 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -28,6 +28,8 @@
  * EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
+#include "qemu/bitops.h"
 #include "crypto/aes.h"
 #include "crypto/aes-round.h"
 
@@ -1293,6 +1295,63 @@ void aesenc_SB_SR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
     aesenc_SB_SR_AK_swap(r, s, k, true);
 }
 
+/* Perform MixColumns. */
+static inline void
+aesenc_MC_swap(AESState *r, const AESState *st, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t t;
+
+    /* Note that AES_mc_rot is encoded for little-endian. */
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x0]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x1]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x2]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x3]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 0] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x4]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x5]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x6]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x7]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 1] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0x8]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0x9]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xA]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xB]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 2] = t;
+
+    t = (      AES_mc_rot[st->b[swap_b ^ 0xC]] ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xD]], 8) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xE]], 16) ^
+         rol32(AES_mc_rot[st->b[swap_b ^ 0xF]], 24));
+    if (be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 3] = t;
+}
+
+void aesenc_MC_gen(AESState *r, const AESState *st)
+{
+    aesenc_MC_swap(r, st, false);
+}
+
+void aesenc_MC_genrev(AESState *r, const AESState *st)
+{
+    aesenc_MC_swap(r, st, true);
+}
+
 /* Perform InvSubBytes + InvShiftRows. */
 static inline void
 aesdec_ISB_ISR_AK_swap(AESState *ret, const AESState *st,
-- 
2.34.1




* [PATCH v2 18/38] target/arm: Use aesenc_MC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (16 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 17/38] crypto: Add aesenc_MC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 19/38] crypto: Add aesdec_IMC Richard Henderson
                   ` (19 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESMC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index d2cb74e7fc..1952aaac58 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -124,7 +124,20 @@ void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, AES_mc_rot);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1];
+            t.d[1] = st->d[0];
+            aesenc_MC(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            aesenc_MC(ad, st, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 19/38] crypto: Add aesdec_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (17 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 18/38] target/arm: Use aesenc_MC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 20/38] target/i386: Use aesdec_IMC Richard Henderson
                   ` (18 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for InvMixColumns.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  2 +
 include/crypto/aes-round.h            | 18 +++++++++
 crypto/aes.c                          | 57 +++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index b00e9b50b1..34068afe40 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -15,6 +15,8 @@ void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
                            const AESState *, bool)
     QEMU_ERROR("unsupported accel");
 
+void aesdec_IMC_accel(AESState *, const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
 void aesdec_ISB_ISR_AK_accel(AESState *, const AESState *,
                              const AESState *, bool)
     QEMU_ERROR("unsupported accel");
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 9f263ca726..b80d4de664 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -59,6 +59,24 @@ static inline void aesenc_SB_SR_AK(AESState *r, const AESState *st,
     }
 }
 
+/*
+ * Perform InvMixColumns.
+ */
+
+void aesdec_IMC_gen(AESState *ret, const AESState *st);
+void aesdec_IMC_genrev(AESState *ret, const AESState *st);
+
+static inline void aesdec_IMC(AESState *r, const AESState *st, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_IMC_accel(r, st, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_IMC_gen(r, st);
+    } else {
+        aesdec_IMC_genrev(r, st);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index 89de8e8db4..bfd41e3fb9 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1395,6 +1395,63 @@ void aesdec_ISB_ISR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
     aesdec_ISB_ISR_AK_swap(r, s, k, true);
 }
 
+/* Perform InvMixColumns. */
+static inline void
+aesdec_IMC_swap(AESState *r, const AESState *st, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t t;
+
+    /* Note that AES_imc is encoded for big-endian. */
+    t = (AES_imc[st->b[swap_b ^ 0x0]][0] ^
+         AES_imc[st->b[swap_b ^ 0x1]][1] ^
+         AES_imc[st->b[swap_b ^ 0x2]][2] ^
+         AES_imc[st->b[swap_b ^ 0x3]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 0] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0x4]][0] ^
+         AES_imc[st->b[swap_b ^ 0x5]][1] ^
+         AES_imc[st->b[swap_b ^ 0x6]][2] ^
+         AES_imc[st->b[swap_b ^ 0x7]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 1] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0x8]][0] ^
+         AES_imc[st->b[swap_b ^ 0x9]][1] ^
+         AES_imc[st->b[swap_b ^ 0xA]][2] ^
+         AES_imc[st->b[swap_b ^ 0xB]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 2] = t;
+
+    t = (AES_imc[st->b[swap_b ^ 0xC]][0] ^
+         AES_imc[st->b[swap_b ^ 0xD]][1] ^
+         AES_imc[st->b[swap_b ^ 0xE]][2] ^
+         AES_imc[st->b[swap_b ^ 0xF]][3]);
+    if (!be) {
+        t = bswap32(t);
+    }
+    r->w[swap_w ^ 3] = t;
+}
+
+void aesdec_IMC_gen(AESState *r, const AESState *st)
+{
+    aesdec_IMC_swap(r, st, false);
+}
+
+void aesdec_IMC_genrev(AESState *r, const AESState *st)
+{
+    aesdec_IMC_swap(r, st, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 20/38] target/i386: Use aesdec_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (18 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 19/38] crypto: Add aesdec_IMC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 21/38] target/arm: " Richard Henderson
                   ` (17 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESIMC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 0a37bde595..893913ebf8 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2215,15 +2215,10 @@ void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 #if SHIFT == 1
 void glue(helper_aesimc, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-    int i;
-    Reg tmp = *s;
+    AESState *ad = (AESState *)&d->ZMM_X(0);
+    AESState *st = (AESState *)&s->ZMM_X(0);
 
-    for (i = 0 ; i < 4 ; i++) {
-        d->L(i) = bswap32(AES_imc[tmp.B(4 * i + 0)][0] ^
-                          AES_imc[tmp.B(4 * i + 1)][1] ^
-                          AES_imc[tmp.B(4 * i + 2)][2] ^
-                          AES_imc[tmp.B(4 * i + 3)][3]);
-    }
+    aesdec_IMC(ad, st, false);
 }
 
 void glue(helper_aeskeygenassist, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 21/38] target/arm: Use aesdec_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (19 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 20/38] target/i386: Use aesdec_IMC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 22/38] target/riscv: " Richard Henderson
                   ` (16 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESIMC instruction.  With this change, all of the
AES helpers in this file use crypto/aes-round.h, so the crypto/aes.h
include is no longer needed.


Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/crypto_helper.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/target/arm/tcg/crypto_helper.c b/target/arm/tcg/crypto_helper.c
index 1952aaac58..fdd70abbfd 100644
--- a/target/arm/tcg/crypto_helper.c
+++ b/target/arm/tcg/crypto_helper.c
@@ -14,7 +14,6 @@
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "tcg/tcg-gvec-desc.h"
-#include "crypto/aes.h"
 #include "crypto/aes-round.h"
 #include "crypto/sm4.h"
 #include "vec_internal.h"
@@ -102,23 +101,6 @@ void HELPER(crypto_aesd)(void *vd, void *vn, void *vm, uint32_t desc)
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
 
-static void do_crypto_aesmc(uint64_t *rd, uint64_t *rm, const uint32_t *mc)
-{
-    union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
-    int i;
-
-    for (i = 0; i < 16; i += 4) {
-        CR_ST_WORD(st, i >> 2) =
-            mc[CR_ST_BYTE(st, i)] ^
-            rol32(mc[CR_ST_BYTE(st, i + 1)], 8) ^
-            rol32(mc[CR_ST_BYTE(st, i + 2)], 16) ^
-            rol32(mc[CR_ST_BYTE(st, i + 3)], 24);
-    }
-
-    rd[0] = st.l[0];
-    rd[1] = st.l[1];
-}
-
 void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t desc)
 {
     intptr_t i, opr_sz = simd_oprsz(desc);
@@ -147,7 +129,20 @@ void HELPER(crypto_aesimc)(void *vd, void *vm, uint32_t desc)
     intptr_t i, opr_sz = simd_oprsz(desc);
 
     for (i = 0; i < opr_sz; i += 16) {
-        do_crypto_aesmc(vd + i, vm + i, AES_imc_rot);
+        AESState *ad = (AESState *)(vd + i);
+        AESState *st = (AESState *)(vm + i);
+        AESState t;
+
+        /* Our uint64_t are in the wrong order for big-endian. */
+        if (HOST_BIG_ENDIAN) {
+            t.d[0] = st->d[1];
+            t.d[1] = st->d[0];
+            aesdec_IMC(&t, &t, false);
+            ad->d[0] = t.d[1];
+            ad->d[1] = t.d[0];
+        } else {
+            aesdec_IMC(ad, st, false);
+        }
     }
     clear_tail(vd, opr_sz, simd_maxsz(desc));
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 22/38] target/riscv: Use aesdec_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (20 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 21/38] target/arm: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 23/38] crypto: Add aesenc_SB_SR_MC_AK Richard Henderson
                   ` (15 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AES64IM instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index e61f7fe1e5..505166ce5a 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -272,17 +272,12 @@ target_ulong HELPER(aes64ks1i)(target_ulong rs1, target_ulong rnum)
 
 target_ulong HELPER(aes64im)(target_ulong rs1)
 {
-    uint64_t RS1 = rs1;
-    uint32_t col_0 = RS1 & 0xFFFFFFFF;
-    uint32_t col_1 = RS1 >> 32;
-    target_ulong result;
+    AESState t;
 
-    col_0 = AES_INVMIXCOLUMN(col_0);
-    col_1 = AES_INVMIXCOLUMN(col_1);
-
-    result = ((uint64_t)col_1 << 32) | col_0;
-
-    return result;
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = 0;
+    aesdec_IMC(&t, &t, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(sm4ed)(target_ulong rs1, target_ulong rs2,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 23/38] crypto: Add aesenc_SB_SR_MC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (21 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 22/38] target/riscv: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 24/38] target/i386: Use aesenc_SB_SR_MC_AK Richard Henderson
                   ` (14 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for SubBytes + ShiftRows + MixColumns + AddRoundKey.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 ++
 include/crypto/aes-round.h            | 21 ++++++++++
 crypto/aes.c                          | 56 +++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 34068afe40..ee64db32fa 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -14,6 +14,9 @@ void aesenc_MC_accel(AESState *, const AESState *, bool)
 void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
                            const AESState *, bool)
     QEMU_ERROR("unsupported accel");
+void aesenc_SB_SR_MC_AK_accel(AESState *, const AESState *,
+                              const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
 
 void aesdec_IMC_accel(AESState *, const AESState *, bool)
     QEMU_ERROR("unsupported accel");
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index b80d4de664..9e10c3ee9e 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -77,6 +77,27 @@ static inline void aesdec_IMC(AESState *r, const AESState *st, bool be)
     }
 }
 
+/*
+ * Perform SubBytes + ShiftRows + MixColumns + AddRoundKey.
+ */
+
+void aesenc_SB_SR_MC_AK_gen(AESState *ret, const AESState *st,
+                            const AESState *rk);
+void aesenc_SB_SR_MC_AK_genrev(AESState *ret, const AESState *st,
+                               const AESState *rk);
+
+static inline void aesenc_SB_SR_MC_AK(AESState *r, const AESState *st,
+                                      const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesenc_SB_SR_MC_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesenc_SB_SR_MC_AK_gen(r, st, rk);
+    } else {
+        aesenc_SB_SR_MC_AK_genrev(r, st, rk);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index bfd41e3fb9..0c281472aa 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1352,6 +1352,62 @@ void aesenc_MC_genrev(AESState *r, const AESState *st)
     aesenc_MC_swap(r, st, true);
 }
 
+/* Perform SubBytes + ShiftRows + MixColumns + AddRoundKey. */
+static inline void
+aesenc_SB_SR_MC_AK_swap(AESState *r, const AESState *st,
+                        const AESState *rk, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t w0, w1, w2, w3;
+
+    w0 = (AES_Te0[st->b[swap_b ^ AES_SH_0]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_1]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_2]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_3]]);
+
+    w1 = (AES_Te0[st->b[swap_b ^ AES_SH_4]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_5]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_6]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_7]]);
+
+    w2 = (AES_Te0[st->b[swap_b ^ AES_SH_8]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_9]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_A]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_B]]);
+
+    w3 = (AES_Te0[st->b[swap_b ^ AES_SH_C]] ^
+          AES_Te1[st->b[swap_b ^ AES_SH_D]] ^
+          AES_Te2[st->b[swap_b ^ AES_SH_E]] ^
+          AES_Te3[st->b[swap_b ^ AES_SH_F]]);
+
+    /* Note that AES_TeX is encoded for big-endian. */
+    if (!be) {
+        w0 = bswap32(w0);
+        w1 = bswap32(w1);
+        w2 = bswap32(w2);
+        w3 = bswap32(w3);
+    }
+
+    r->w[swap_w ^ 0] = rk->w[swap_w ^ 0] ^ w0;
+    r->w[swap_w ^ 1] = rk->w[swap_w ^ 1] ^ w1;
+    r->w[swap_w ^ 2] = rk->w[swap_w ^ 2] ^ w2;
+    r->w[swap_w ^ 3] = rk->w[swap_w ^ 3] ^ w3;
+}
+
+void aesenc_SB_SR_MC_AK_gen(AESState *r, const AESState *st,
+                            const AESState *rk)
+{
+    aesenc_SB_SR_MC_AK_swap(r, st, rk, false);
+}
+
+void aesenc_SB_SR_MC_AK_genrev(AESState *r, const AESState *st,
+                               const AESState *rk)
+{
+    aesenc_SB_SR_MC_AK_swap(r, st, rk, true);
+}
+
 /* Perform InvSubBytes + InvShiftRows. */
 static inline void
 aesdec_ISB_ISR_AK_swap(AESState *ret, const AESState *st,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 24/38] target/i386: Use aesenc_SB_SR_MC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (22 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 23/38] crypto: Add aesenc_SB_SR_MC_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 25/38] target/ppc: " Richard Henderson
                   ` (13 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESENC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 893913ebf8..93a4e0cf16 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2188,16 +2188,12 @@ void glue(helper_aesdeclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 
 void glue(helper_aesenc, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0 ; i < 2 << SHIFT ; i++) {
-        int j = i & 3;
-        d->L(i) = rk.L(i) ^ bswap32(AES_Te0[st.B(AES_shifts[4 * j + 0])] ^
-                                    AES_Te1[st.B(AES_shifts[4 * j + 1])] ^
-                                    AES_Te2[st.B(AES_shifts[4 * j + 2])] ^
-                                    AES_Te3[st.B(AES_shifts[4 * j + 3])]);
+        aesenc_SB_SR_MC_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 25/38] target/ppc: Use aesenc_SB_SR_MC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (23 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 24/38] target/i386: Use aesenc_SB_SR_MC_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:28   ` Daniel Henrique Barboza
  2023-06-09  2:23 ` [PATCH v2 26/38] target/riscv: " Richard Henderson
                   ` (12 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the VCIPHER instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 15f07fca2b..1e477924b7 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2933,17 +2933,11 @@ void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
 
 void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    ppc_avr_t result;
-    int i;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
 
-    VECTOR_FOR_INORDER_I(i, u32) {
-        result.VsrW(i) = b->VsrW(i) ^
-            (AES_Te0[a->VsrB(AES_shifts[4 * i + 0])] ^
-             AES_Te1[a->VsrB(AES_shifts[4 * i + 1])] ^
-             AES_Te2[a->VsrB(AES_shifts[4 * i + 2])] ^
-             AES_Te3[a->VsrB(AES_shifts[4 * i + 3])]);
-    }
-    *r = result;
+    aesenc_SB_SR_MC_AK(ad, st, rk, true);
 }
 
 void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 26/38] target/riscv: Use aesenc_SB_SR_MC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (24 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 25/38] target/ppc: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 27/38] crypto: Add aesdec_ISB_ISR_IMC_AK Richard Henderson
                   ` (11 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AES64ESM instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index 505166ce5a..c036fe8632 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -198,7 +198,12 @@ static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
 
 target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, true, true);
+    AESState t;
+
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesenc_SB_SR_MC_AK(&t, &t, &aes_zero, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64es)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 27/38] crypto: Add aesdec_ISB_ISR_IMC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (25 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 26/38] target/riscv: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 28/38] target/i386: Use aesdec_ISB_ISR_IMC_AK Richard Henderson
                   ` (10 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows +
InvMixColumns + AddRoundKey.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 ++
 include/crypto/aes-round.h            | 21 ++++++++++
 crypto/aes.c                          | 56 +++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index ee64db32fa..16b4447831 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -23,5 +23,8 @@ void aesdec_IMC_accel(AESState *, const AESState *, bool)
 void aesdec_ISB_ISR_AK_accel(AESState *, const AESState *,
                              const AESState *, bool)
     QEMU_ERROR("unsupported accel");
+void aesdec_ISB_ISR_IMC_AK_accel(AESState *, const AESState *,
+                                 const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
 
 #endif
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 9e10c3ee9e..31c5f10df6 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -119,4 +119,25 @@ static inline void aesdec_ISB_ISR_AK(AESState *r, const AESState *st,
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey.
+ */
+
+void aesdec_ISB_ISR_IMC_AK_gen(AESState *ret, const AESState *st,
+                               const AESState *rk);
+void aesdec_ISB_ISR_IMC_AK_genrev(AESState *ret, const AESState *st,
+                                  const AESState *rk);
+
+static inline void aesdec_ISB_ISR_IMC_AK(AESState *r, const AESState *st,
+                                         const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_IMC_AK_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_IMC_AK_gen(r, st, rk);
+    } else {
+        aesdec_ISB_ISR_IMC_AK_genrev(r, st, rk);
+    }
+}
+
 #endif /* CRYPTO_AES_ROUND_H */
diff --git a/crypto/aes.c b/crypto/aes.c
index 0c281472aa..b671a3a6fb 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1508,6 +1508,62 @@ void aesdec_IMC_genrev(AESState *r, const AESState *st)
     aesdec_IMC_swap(r, st, true);
 }
 
+/* Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey. */
+static inline void
+aesdec_ISB_ISR_IMC_AK_swap(AESState *r, const AESState *st,
+                           const AESState *rk, bool swap)
+{
+    int swap_b = swap * 0xf;
+    int swap_w = swap * 0x3;
+    bool be = HOST_BIG_ENDIAN ^ swap;
+    uint32_t w0, w1, w2, w3;
+
+    w0 = (AES_Td0[st->b[swap_b ^ AES_ISH_0]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_1]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_2]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_3]]);
+
+    w1 = (AES_Td0[st->b[swap_b ^ AES_ISH_4]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_5]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_6]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_7]]);
+
+    w2 = (AES_Td0[st->b[swap_b ^ AES_ISH_8]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_9]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_A]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_B]]);
+
+    w3 = (AES_Td0[st->b[swap_b ^ AES_ISH_C]] ^
+          AES_Td1[st->b[swap_b ^ AES_ISH_D]] ^
+          AES_Td2[st->b[swap_b ^ AES_ISH_E]] ^
+          AES_Td3[st->b[swap_b ^ AES_ISH_F]]);
+
+    /* Note that AES_TdX is encoded for big-endian. */
+    if (!be) {
+        w0 = bswap32(w0);
+        w1 = bswap32(w1);
+        w2 = bswap32(w2);
+        w3 = bswap32(w3);
+    }
+
+    r->w[swap_w ^ 0] = rk->w[swap_w ^ 0] ^ w0;
+    r->w[swap_w ^ 1] = rk->w[swap_w ^ 1] ^ w1;
+    r->w[swap_w ^ 2] = rk->w[swap_w ^ 2] ^ w2;
+    r->w[swap_w ^ 3] = rk->w[swap_w ^ 3] ^ w3;
+}
+
+void aesdec_ISB_ISR_IMC_AK_gen(AESState *r, const AESState *st,
+                               const AESState *rk)
+{
+    aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, false);
+}
+
+void aesdec_ISB_ISR_IMC_AK_genrev(AESState *r, const AESState *st,
+                                  const AESState *rk)
+{
+    aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, true);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 28/38] target/i386: Use aesdec_ISB_ISR_IMC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (26 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 27/38] crypto: Add aesdec_ISB_ISR_IMC_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 29/38] target/riscv: " Richard Henderson
                   ` (9 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AESDEC instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/ops_sse.h | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index 93a4e0cf16..a0e425733f 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2162,16 +2162,12 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
 
 void glue(helper_aesdec, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
 {
-    int i;
-    Reg st = *v;
-    Reg rk = *s;
+    for (int i = 0; i < SHIFT; i++) {
+        AESState *ad = (AESState *)&d->ZMM_X(i);
+        AESState *st = (AESState *)&v->ZMM_X(i);
+        AESState *rk = (AESState *)&s->ZMM_X(i);
 
-    for (i = 0 ; i < 2 << SHIFT ; i++) {
-        int j = i & 3;
-        d->L(i) = rk.L(i) ^ bswap32(AES_Td0[st.B(AES_ishifts[4 * j + 0])] ^
-                                    AES_Td1[st.B(AES_ishifts[4 * j + 1])] ^
-                                    AES_Td2[st.B(AES_ishifts[4 * j + 2])] ^
-                                    AES_Td3[st.B(AES_ishifts[4 * j + 3])]);
+        aesdec_ISB_ISR_IMC_AK(ad, st, rk, false);
     }
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 29/38] target/riscv: Use aesdec_ISB_ISR_IMC_AK
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (27 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 28/38] target/i386: Use aesdec_ISB_ISR_IMC_AK Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:23 ` [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
                   ` (8 subsequent siblings)
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the AES64DSM instruction.  This was the last use
of aes64_operation and its support macros, so remove them all.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/crypto_helper.c | 101 ++++-------------------------------
 1 file changed, 10 insertions(+), 91 deletions(-)

diff --git a/target/riscv/crypto_helper.c b/target/riscv/crypto_helper.c
index c036fe8632..99d85a6188 100644
--- a/target/riscv/crypto_helper.c
+++ b/target/riscv/crypto_helper.c
@@ -104,98 +104,8 @@ target_ulong HELPER(aes32dsi)(target_ulong rs1, target_ulong rs2,
     return aes32_operation(shamt, rs1, rs2, false, false);
 }
 
-#define BY(X, I) ((X >> (8 * I)) & 0xFF)
-
-#define AES_SHIFROWS_LO(RS1, RS2) ( \
-    (((RS1 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
-    (((RS2 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
-    (((RS2 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
-    (((RS1 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
-
-#define AES_INVSHIFROWS_LO(RS1, RS2) ( \
-    (((RS2 >> 24) & 0xFF) << 56) | (((RS2 >> 48) & 0xFF) << 48) | \
-    (((RS1 >> 8) & 0xFF) << 40) | (((RS1 >> 32) & 0xFF) << 32) | \
-    (((RS1 >> 56) & 0xFF) << 24) | (((RS2 >> 16) & 0xFF) << 16) | \
-    (((RS2 >> 40) & 0xFF) << 8) | (((RS1 >> 0) & 0xFF) << 0))
-
-#define AES_MIXBYTE(COL, B0, B1, B2, B3) ( \
-    BY(COL, B3) ^ BY(COL, B2) ^ AES_GFMUL(BY(COL, B1), 3) ^ \
-    AES_GFMUL(BY(COL, B0), 2))
-
-#define AES_MIXCOLUMN(COL) ( \
-    AES_MIXBYTE(COL, 3, 0, 1, 2) << 24 | \
-    AES_MIXBYTE(COL, 2, 3, 0, 1) << 16 | \
-    AES_MIXBYTE(COL, 1, 2, 3, 0) << 8 | AES_MIXBYTE(COL, 0, 1, 2, 3) << 0)
-
-#define AES_INVMIXBYTE(COL, B0, B1, B2, B3) ( \
-    AES_GFMUL(BY(COL, B3), 0x9) ^ AES_GFMUL(BY(COL, B2), 0xd) ^ \
-    AES_GFMUL(BY(COL, B1), 0xb) ^ AES_GFMUL(BY(COL, B0), 0xe))
-
-#define AES_INVMIXCOLUMN(COL) ( \
-    AES_INVMIXBYTE(COL, 3, 0, 1, 2) << 24 | \
-    AES_INVMIXBYTE(COL, 2, 3, 0, 1) << 16 | \
-    AES_INVMIXBYTE(COL, 1, 2, 3, 0) << 8 | \
-    AES_INVMIXBYTE(COL, 0, 1, 2, 3) << 0)
-
 static const AESState aes_zero = { };
 
-static inline target_ulong aes64_operation(target_ulong rs1, target_ulong rs2,
-                                           bool enc, bool mix)
-{
-    uint64_t RS1 = rs1;
-    uint64_t RS2 = rs2;
-    uint64_t result;
-    uint64_t temp;
-    uint32_t col_0;
-    uint32_t col_1;
-
-    if (enc) {
-        temp = AES_SHIFROWS_LO(RS1, RS2);
-        temp = (((uint64_t)AES_sbox[(temp >> 0) & 0xFF] << 0) |
-                ((uint64_t)AES_sbox[(temp >> 8) & 0xFF] << 8) |
-                ((uint64_t)AES_sbox[(temp >> 16) & 0xFF] << 16) |
-                ((uint64_t)AES_sbox[(temp >> 24) & 0xFF] << 24) |
-                ((uint64_t)AES_sbox[(temp >> 32) & 0xFF] << 32) |
-                ((uint64_t)AES_sbox[(temp >> 40) & 0xFF] << 40) |
-                ((uint64_t)AES_sbox[(temp >> 48) & 0xFF] << 48) |
-                ((uint64_t)AES_sbox[(temp >> 56) & 0xFF] << 56));
-        if (mix) {
-            col_0 = temp & 0xFFFFFFFF;
-            col_1 = temp >> 32;
-
-            col_0 = AES_MIXCOLUMN(col_0);
-            col_1 = AES_MIXCOLUMN(col_1);
-
-            result = ((uint64_t)col_1 << 32) | col_0;
-        } else {
-            result = temp;
-        }
-    } else {
-        temp = AES_INVSHIFROWS_LO(RS1, RS2);
-        temp = (((uint64_t)AES_isbox[(temp >> 0) & 0xFF] << 0) |
-                ((uint64_t)AES_isbox[(temp >> 8) & 0xFF] << 8) |
-                ((uint64_t)AES_isbox[(temp >> 16) & 0xFF] << 16) |
-                ((uint64_t)AES_isbox[(temp >> 24) & 0xFF] << 24) |
-                ((uint64_t)AES_isbox[(temp >> 32) & 0xFF] << 32) |
-                ((uint64_t)AES_isbox[(temp >> 40) & 0xFF] << 40) |
-                ((uint64_t)AES_isbox[(temp >> 48) & 0xFF] << 48) |
-                ((uint64_t)AES_isbox[(temp >> 56) & 0xFF] << 56));
-        if (mix) {
-            col_0 = temp & 0xFFFFFFFF;
-            col_1 = temp >> 32;
-
-            col_0 = AES_INVMIXCOLUMN(col_0);
-            col_1 = AES_INVMIXCOLUMN(col_1);
-
-            result = ((uint64_t)col_1 << 32) | col_0;
-        } else {
-            result = temp;
-        }
-    }
-
-    return result;
-}
-
 target_ulong HELPER(aes64esm)(target_ulong rs1, target_ulong rs2)
 {
     AESState t;
@@ -228,7 +138,16 @@ target_ulong HELPER(aes64ds)(target_ulong rs1, target_ulong rs2)
 
 target_ulong HELPER(aes64dsm)(target_ulong rs1, target_ulong rs2)
 {
-    return aes64_operation(rs1, rs2, false, true);
+    AESState t, z = { };
+
+    /*
+     * This instruction does not include a round key,
+     * so supply a zero to our primitive.
+     */
+    t.d[HOST_BIG_ENDIAN] = rs1;
+    t.d[!HOST_BIG_ENDIAN] = rs2;
+    aesdec_ISB_ISR_IMC_AK(&t, &t, &z, false);
+    return t.d[HOST_BIG_ENDIAN];
 }
 
 target_ulong HELPER(aes64ks2)(target_ulong rs1, target_ulong rs2)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (28 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 29/38] target/riscv: " Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 13:59   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
                   ` (7 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Add a primitive for InvSubBytes + InvShiftRows +
AddRoundKey + InvMixColumns.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/generic/host/aes-round.h |  3 +++
 include/crypto/aes-round.h            | 21 +++++++++++++++++++++
 crypto/aes.c                          | 14 ++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
index 16b4447831..c52fea936f 100644
--- a/host/include/generic/host/aes-round.h
+++ b/host/include/generic/host/aes-round.h
@@ -23,6 +23,9 @@ void aesdec_IMC_accel(AESState *, const AESState *, bool)
 void aesdec_ISB_ISR_AK_accel(AESState *, const AESState *,
                              const AESState *, bool)
     QEMU_ERROR("unsupported accel");
+void aesdec_ISB_ISR_AK_IMC_accel(AESState *, const AESState *,
+                                 const AESState *, bool)
+    QEMU_ERROR("unsupported accel");
 void aesdec_ISB_ISR_IMC_AK_accel(AESState *, const AESState *,
                                  const AESState *, bool)
     QEMU_ERROR("unsupported accel");
diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
index 31c5f10df6..dd8f49becb 100644
--- a/include/crypto/aes-round.h
+++ b/include/crypto/aes-round.h
@@ -119,6 +119,27 @@ static inline void aesdec_ISB_ISR_AK(AESState *r, const AESState *st,
     }
 }
 
+/*
+ * Perform InvSubBytes + InvShiftRows + AddRoundKey + InvMixColumns.
+ */
+
+void aesdec_ISB_ISR_AK_IMC_gen(AESState *ret, const AESState *st,
+                               const AESState *rk);
+void aesdec_ISB_ISR_AK_IMC_genrev(AESState *ret, const AESState *st,
+                                  const AESState *rk);
+
+static inline void aesdec_ISB_ISR_AK_IMC(AESState *r, const AESState *st,
+                                         const AESState *rk, bool be)
+{
+    if (HAVE_AES_ACCEL) {
+        aesdec_ISB_ISR_AK_IMC_accel(r, st, rk, be);
+    } else if (HOST_BIG_ENDIAN == be) {
+        aesdec_ISB_ISR_AK_IMC_gen(r, st, rk);
+    } else {
+        aesdec_ISB_ISR_AK_IMC_genrev(r, st, rk);
+    }
+}
+
 /*
  * Perform InvSubBytes + InvShiftRows + InvMixColumns + AddRoundKey.
  */
diff --git a/crypto/aes.c b/crypto/aes.c
index b671a3a6fb..f0721ad4a2 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1564,6 +1564,20 @@ void aesdec_ISB_ISR_IMC_AK_genrev(AESState *r, const AESState *st,
     aesdec_ISB_ISR_IMC_AK_swap(r, st, rk, true);
 }
 
+void aesdec_ISB_ISR_AK_IMC_gen(AESState *ret, const AESState *st,
+                               const AESState *rk)
+{
+    aesdec_ISB_ISR_AK_gen(ret, st, rk);
+    aesdec_IMC_gen(ret, ret);
+}
+
+void aesdec_ISB_ISR_AK_IMC_genrev(AESState *ret, const AESState *st,
+                                  const AESState *rk)
+{
+    aesdec_ISB_ISR_AK_genrev(ret, st, rk);
+    aesdec_IMC_genrev(ret, ret);
+}
+
 /**
  * Expand the cipher key into the encryption key schedule.
  */
-- 
2.34.1




* [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (29 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-12 13:28   ` Daniel Henrique Barboza
  2023-06-19 13:46   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
                   ` (6 subsequent siblings)
  37 siblings, 2 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This implements the VNCIPHER instruction.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/int_helper.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 1e477924b7..834da80fe3 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -2947,22 +2947,11 @@ void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 
 void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
 {
-    /* This differs from what is written in ISA V2.07.  The RTL is */
-    /* incorrect and will be fixed in V2.07B.                      */
-    int i;
-    ppc_avr_t tmp;
+    AESState *ad = (AESState *)r;
+    AESState *st = (AESState *)a;
+    AESState *rk = (AESState *)b;
 
-    VECTOR_FOR_INORDER_I(i, u8) {
-        tmp.VsrB(i) = b->VsrB(i) ^ AES_isbox[a->VsrB(AES_ishifts[i])];
-    }
-
-    VECTOR_FOR_INORDER_I(i, u32) {
-        r->VsrW(i) =
-            AES_imc[tmp.VsrB(4 * i + 0)][0] ^
-            AES_imc[tmp.VsrB(4 * i + 1)][1] ^
-            AES_imc[tmp.VsrB(4 * i + 2)][2] ^
-            AES_imc[tmp.VsrB(4 * i + 3)][3];
-    }
+    aesdec_ISB_ISR_AK_IMC(ad, st, rk, true);
 }
 
 void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
-- 
2.34.1




* [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (30 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 13:45   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
                   ` (5 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

These arrays are no longer used, having been replaced by the
AES_SH_* and AES_ISH_* enumerators.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h |  4 ----
 crypto/aes.c         | 14 --------------
 2 files changed, 18 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 24b073d569..aa8b54065d 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -30,10 +30,6 @@ void AES_decrypt(const unsigned char *in, unsigned char *out,
 extern const uint8_t AES_sbox[256];
 extern const uint8_t AES_isbox[256];
 
-/* AES ShiftRows and InvShiftRows */
-extern const uint8_t AES_shifts[16];
-extern const uint8_t AES_ishifts[16];
-
 /* AES MixColumns, for use with rot32. */
 extern const uint32_t AES_mc_rot[256];
 
diff --git a/crypto/aes.c b/crypto/aes.c
index f0721ad4a2..7b8a22fe3e 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -131,13 +131,6 @@ enum {
     AES_SH_F = 0xb,
 };
 
-const uint8_t AES_shifts[16] = {
-    AES_SH_0, AES_SH_1, AES_SH_2, AES_SH_3,
-    AES_SH_4, AES_SH_5, AES_SH_6, AES_SH_7,
-    AES_SH_8, AES_SH_9, AES_SH_A, AES_SH_B,
-    AES_SH_C, AES_SH_D, AES_SH_E, AES_SH_F,
-};
-
 /* AES InvShiftRows, for complete unrolling. */
 enum {
     AES_ISH_0 = 0x0,
@@ -158,13 +151,6 @@ enum {
     AES_ISH_F = 0x3,
 };
 
-const uint8_t AES_ishifts[16] = {
-    AES_ISH_0, AES_ISH_1, AES_ISH_2, AES_ISH_3,
-    AES_ISH_4, AES_ISH_5, AES_ISH_6, AES_ISH_7,
-    AES_ISH_8, AES_ISH_9, AES_ISH_A, AES_ISH_B,
-    AES_ISH_C, AES_ISH_D, AES_ISH_E, AES_ISH_F,
-};
-
 /*
  * MixColumns lookup table, for use with rot32.
  */
-- 
2.34.1




* [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (31 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-20  5:09   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 34/38] crypto: Remove AES_imc Richard Henderson
                   ` (4 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This method uses a single 256-entry uint32_t table instead of four,
which reduces its data cache footprint.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 crypto/aes.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/crypto/aes.c b/crypto/aes.c
index 7b8a22fe3e..4da2cd7077 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -1446,39 +1446,39 @@ aesdec_IMC_swap(AESState *r, const AESState *st, bool swap)
     bool be = HOST_BIG_ENDIAN ^ swap;
     uint32_t t;
 
-    /* Note that AES_imc is encoded for big-endian. */
-    t = (AES_imc[st->b[swap_b ^ 0x0]][0] ^
-         AES_imc[st->b[swap_b ^ 0x1]][1] ^
-         AES_imc[st->b[swap_b ^ 0x2]][2] ^
-         AES_imc[st->b[swap_b ^ 0x3]][3]);
-    if (!be) {
+    /* Note that AES_imc_rot is encoded for little-endian. */
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x0]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x1]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x2]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x3]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 0] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0x4]][0] ^
-         AES_imc[st->b[swap_b ^ 0x5]][1] ^
-         AES_imc[st->b[swap_b ^ 0x6]][2] ^
-         AES_imc[st->b[swap_b ^ 0x7]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x4]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x5]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x6]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x7]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 1] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0x8]][0] ^
-         AES_imc[st->b[swap_b ^ 0x9]][1] ^
-         AES_imc[st->b[swap_b ^ 0xA]][2] ^
-         AES_imc[st->b[swap_b ^ 0xB]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0x8]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0x9]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xA]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xB]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 2] = t;
 
-    t = (AES_imc[st->b[swap_b ^ 0xC]][0] ^
-         AES_imc[st->b[swap_b ^ 0xD]][1] ^
-         AES_imc[st->b[swap_b ^ 0xE]][2] ^
-         AES_imc[st->b[swap_b ^ 0xF]][3]);
-    if (!be) {
+    t = (      AES_imc_rot[st->b[swap_b ^ 0xC]] ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xD]], 8) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xE]], 16) ^
+         rol32(AES_imc_rot[st->b[swap_b ^ 0xF]], 24));
+    if (be) {
         t = bswap32(t);
     }
     r->w[swap_w ^ 3] = t;
-- 
2.34.1




* [PATCH v2 34/38] crypto: Remove AES_imc
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (32 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 13:19   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
                   ` (3 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

This array is no longer used.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h |   7 --
 crypto/aes.c         | 264 -------------------------------------------
 2 files changed, 271 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index aa8b54065d..99209f51b9 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -36,13 +36,6 @@ extern const uint32_t AES_mc_rot[256];
 /* AES InvMixColumns, for use with rot32. */
 extern const uint32_t AES_imc_rot[256];
 
-/* AES InvMixColumns */
-/* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
-/* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
-/* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
-/* AES_imc[x][3] = [x].[09, 0d, 0b, 0e]; */
-extern const uint32_t AES_imc[256][4];
-
 /*
 AES_Te0[x] = S [x].[02, 01, 01, 03];
 AES_Te1[x] = S [x].[03, 02, 01, 01];
diff --git a/crypto/aes.c b/crypto/aes.c
index 4da2cd7077..a18267b9f8 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -291,270 +291,6 @@ const uint32_t AES_imc_rot[256] = {
     0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
 };
 
-/* AES_imc[x][0] = [x].[0e, 09, 0d, 0b]; */
-/* AES_imc[x][1] = [x].[0b, 0e, 09, 0d]; */
-/* AES_imc[x][2] = [x].[0d, 0b, 0e, 09]; */
-/* AES_imc[x][3] = [x].[09, 0d, 0b, 0e]; */
-const uint32_t AES_imc[256][4] = {
-    { 0x00000000, 0x00000000, 0x00000000, 0x00000000, }, /* x=00 */
-    { 0x0E090D0B, 0x0B0E090D, 0x0D0B0E09, 0x090D0B0E, }, /* x=01 */
-    { 0x1C121A16, 0x161C121A, 0x1A161C12, 0x121A161C, }, /* x=02 */
-    { 0x121B171D, 0x1D121B17, 0x171D121B, 0x1B171D12, }, /* x=03 */
-    { 0x3824342C, 0x2C382434, 0x342C3824, 0x24342C38, }, /* x=04 */
-    { 0x362D3927, 0x27362D39, 0x3927362D, 0x2D392736, }, /* x=05 */
-    { 0x24362E3A, 0x3A24362E, 0x2E3A2436, 0x362E3A24, }, /* x=06 */
-    { 0x2A3F2331, 0x312A3F23, 0x23312A3F, 0x3F23312A, }, /* x=07 */
-    { 0x70486858, 0x58704868, 0x68587048, 0x48685870, }, /* x=08 */
-    { 0x7E416553, 0x537E4165, 0x65537E41, 0x4165537E, }, /* x=09 */
-    { 0x6C5A724E, 0x4E6C5A72, 0x724E6C5A, 0x5A724E6C, }, /* x=0A */
-    { 0x62537F45, 0x4562537F, 0x7F456253, 0x537F4562, }, /* x=0B */
-    { 0x486C5C74, 0x74486C5C, 0x5C74486C, 0x6C5C7448, }, /* x=0C */
-    { 0x4665517F, 0x7F466551, 0x517F4665, 0x65517F46, }, /* x=0D */
-    { 0x547E4662, 0x62547E46, 0x4662547E, 0x7E466254, }, /* x=0E */
-    { 0x5A774B69, 0x695A774B, 0x4B695A77, 0x774B695A, }, /* x=0F */
-    { 0xE090D0B0, 0xB0E090D0, 0xD0B0E090, 0x90D0B0E0, }, /* x=10 */
-    { 0xEE99DDBB, 0xBBEE99DD, 0xDDBBEE99, 0x99DDBBEE, }, /* x=11 */
-    { 0xFC82CAA6, 0xA6FC82CA, 0xCAA6FC82, 0x82CAA6FC, }, /* x=12 */
-    { 0xF28BC7AD, 0xADF28BC7, 0xC7ADF28B, 0x8BC7ADF2, }, /* x=13 */
-    { 0xD8B4E49C, 0x9CD8B4E4, 0xE49CD8B4, 0xB4E49CD8, }, /* x=14 */
-    { 0xD6BDE997, 0x97D6BDE9, 0xE997D6BD, 0xBDE997D6, }, /* x=15 */
-    { 0xC4A6FE8A, 0x8AC4A6FE, 0xFE8AC4A6, 0xA6FE8AC4, }, /* x=16 */
-    { 0xCAAFF381, 0x81CAAFF3, 0xF381CAAF, 0xAFF381CA, }, /* x=17 */
-    { 0x90D8B8E8, 0xE890D8B8, 0xB8E890D8, 0xD8B8E890, }, /* x=18 */
-    { 0x9ED1B5E3, 0xE39ED1B5, 0xB5E39ED1, 0xD1B5E39E, }, /* x=19 */
-    { 0x8CCAA2FE, 0xFE8CCAA2, 0xA2FE8CCA, 0xCAA2FE8C, }, /* x=1A */
-    { 0x82C3AFF5, 0xF582C3AF, 0xAFF582C3, 0xC3AFF582, }, /* x=1B */
-    { 0xA8FC8CC4, 0xC4A8FC8C, 0x8CC4A8FC, 0xFC8CC4A8, }, /* x=1C */
-    { 0xA6F581CF, 0xCFA6F581, 0x81CFA6F5, 0xF581CFA6, }, /* x=1D */
-    { 0xB4EE96D2, 0xD2B4EE96, 0x96D2B4EE, 0xEE96D2B4, }, /* x=1E */
-    { 0xBAE79BD9, 0xD9BAE79B, 0x9BD9BAE7, 0xE79BD9BA, }, /* x=1F */
-    { 0xDB3BBB7B, 0x7BDB3BBB, 0xBB7BDB3B, 0x3BBB7BDB, }, /* x=20 */
-    { 0xD532B670, 0x70D532B6, 0xB670D532, 0x32B670D5, }, /* x=21 */
-    { 0xC729A16D, 0x6DC729A1, 0xA16DC729, 0x29A16DC7, }, /* x=22 */
-    { 0xC920AC66, 0x66C920AC, 0xAC66C920, 0x20AC66C9, }, /* x=23 */
-    { 0xE31F8F57, 0x57E31F8F, 0x8F57E31F, 0x1F8F57E3, }, /* x=24 */
-    { 0xED16825C, 0x5CED1682, 0x825CED16, 0x16825CED, }, /* x=25 */
-    { 0xFF0D9541, 0x41FF0D95, 0x9541FF0D, 0x0D9541FF, }, /* x=26 */
-    { 0xF104984A, 0x4AF10498, 0x984AF104, 0x04984AF1, }, /* x=27 */
-    { 0xAB73D323, 0x23AB73D3, 0xD323AB73, 0x73D323AB, }, /* x=28 */
-    { 0xA57ADE28, 0x28A57ADE, 0xDE28A57A, 0x7ADE28A5, }, /* x=29 */
-    { 0xB761C935, 0x35B761C9, 0xC935B761, 0x61C935B7, }, /* x=2A */
-    { 0xB968C43E, 0x3EB968C4, 0xC43EB968, 0x68C43EB9, }, /* x=2B */
-    { 0x9357E70F, 0x0F9357E7, 0xE70F9357, 0x57E70F93, }, /* x=2C */
-    { 0x9D5EEA04, 0x049D5EEA, 0xEA049D5E, 0x5EEA049D, }, /* x=2D */
-    { 0x8F45FD19, 0x198F45FD, 0xFD198F45, 0x45FD198F, }, /* x=2E */
-    { 0x814CF012, 0x12814CF0, 0xF012814C, 0x4CF01281, }, /* x=2F */
-    { 0x3BAB6BCB, 0xCB3BAB6B, 0x6BCB3BAB, 0xAB6BCB3B, }, /* x=30 */
-    { 0x35A266C0, 0xC035A266, 0x66C035A2, 0xA266C035, }, /* x=31 */
-    { 0x27B971DD, 0xDD27B971, 0x71DD27B9, 0xB971DD27, }, /* x=32 */
-    { 0x29B07CD6, 0xD629B07C, 0x7CD629B0, 0xB07CD629, }, /* x=33 */
-    { 0x038F5FE7, 0xE7038F5F, 0x5FE7038F, 0x8F5FE703, }, /* x=34 */
-    { 0x0D8652EC, 0xEC0D8652, 0x52EC0D86, 0x8652EC0D, }, /* x=35 */
-    { 0x1F9D45F1, 0xF11F9D45, 0x45F11F9D, 0x9D45F11F, }, /* x=36 */
-    { 0x119448FA, 0xFA119448, 0x48FA1194, 0x9448FA11, }, /* x=37 */
-    { 0x4BE30393, 0x934BE303, 0x03934BE3, 0xE303934B, }, /* x=38 */
-    { 0x45EA0E98, 0x9845EA0E, 0x0E9845EA, 0xEA0E9845, }, /* x=39 */
-    { 0x57F11985, 0x8557F119, 0x198557F1, 0xF1198557, }, /* x=3A */
-    { 0x59F8148E, 0x8E59F814, 0x148E59F8, 0xF8148E59, }, /* x=3B */
-    { 0x73C737BF, 0xBF73C737, 0x37BF73C7, 0xC737BF73, }, /* x=3C */
-    { 0x7DCE3AB4, 0xB47DCE3A, 0x3AB47DCE, 0xCE3AB47D, }, /* x=3D */
-    { 0x6FD52DA9, 0xA96FD52D, 0x2DA96FD5, 0xD52DA96F, }, /* x=3E */
-    { 0x61DC20A2, 0xA261DC20, 0x20A261DC, 0xDC20A261, }, /* x=3F */
-    { 0xAD766DF6, 0xF6AD766D, 0x6DF6AD76, 0x766DF6AD, }, /* x=40 */
-    { 0xA37F60FD, 0xFDA37F60, 0x60FDA37F, 0x7F60FDA3, }, /* x=41 */
-    { 0xB16477E0, 0xE0B16477, 0x77E0B164, 0x6477E0B1, }, /* x=42 */
-    { 0xBF6D7AEB, 0xEBBF6D7A, 0x7AEBBF6D, 0x6D7AEBBF, }, /* x=43 */
-    { 0x955259DA, 0xDA955259, 0x59DA9552, 0x5259DA95, }, /* x=44 */
-    { 0x9B5B54D1, 0xD19B5B54, 0x54D19B5B, 0x5B54D19B, }, /* x=45 */
-    { 0x894043CC, 0xCC894043, 0x43CC8940, 0x4043CC89, }, /* x=46 */
-    { 0x87494EC7, 0xC787494E, 0x4EC78749, 0x494EC787, }, /* x=47 */
-    { 0xDD3E05AE, 0xAEDD3E05, 0x05AEDD3E, 0x3E05AEDD, }, /* x=48 */
-    { 0xD33708A5, 0xA5D33708, 0x08A5D337, 0x3708A5D3, }, /* x=49 */
-    { 0xC12C1FB8, 0xB8C12C1F, 0x1FB8C12C, 0x2C1FB8C1, }, /* x=4A */
-    { 0xCF2512B3, 0xB3CF2512, 0x12B3CF25, 0x2512B3CF, }, /* x=4B */
-    { 0xE51A3182, 0x82E51A31, 0x3182E51A, 0x1A3182E5, }, /* x=4C */
-    { 0xEB133C89, 0x89EB133C, 0x3C89EB13, 0x133C89EB, }, /* x=4D */
-    { 0xF9082B94, 0x94F9082B, 0x2B94F908, 0x082B94F9, }, /* x=4E */
-    { 0xF701269F, 0x9FF70126, 0x269FF701, 0x01269FF7, }, /* x=4F */
-    { 0x4DE6BD46, 0x464DE6BD, 0xBD464DE6, 0xE6BD464D, }, /* x=50 */
-    { 0x43EFB04D, 0x4D43EFB0, 0xB04D43EF, 0xEFB04D43, }, /* x=51 */
-    { 0x51F4A750, 0x5051F4A7, 0xA75051F4, 0xF4A75051, }, /* x=52 */
-    { 0x5FFDAA5B, 0x5B5FFDAA, 0xAA5B5FFD, 0xFDAA5B5F, }, /* x=53 */
-    { 0x75C2896A, 0x6A75C289, 0x896A75C2, 0xC2896A75, }, /* x=54 */
-    { 0x7BCB8461, 0x617BCB84, 0x84617BCB, 0xCB84617B, }, /* x=55 */
-    { 0x69D0937C, 0x7C69D093, 0x937C69D0, 0xD0937C69, }, /* x=56 */
-    { 0x67D99E77, 0x7767D99E, 0x9E7767D9, 0xD99E7767, }, /* x=57 */
-    { 0x3DAED51E, 0x1E3DAED5, 0xD51E3DAE, 0xAED51E3D, }, /* x=58 */
-    { 0x33A7D815, 0x1533A7D8, 0xD81533A7, 0xA7D81533, }, /* x=59 */
-    { 0x21BCCF08, 0x0821BCCF, 0xCF0821BC, 0xBCCF0821, }, /* x=5A */
-    { 0x2FB5C203, 0x032FB5C2, 0xC2032FB5, 0xB5C2032F, }, /* x=5B */
-    { 0x058AE132, 0x32058AE1, 0xE132058A, 0x8AE13205, }, /* x=5C */
-    { 0x0B83EC39, 0x390B83EC, 0xEC390B83, 0x83EC390B, }, /* x=5D */
-    { 0x1998FB24, 0x241998FB, 0xFB241998, 0x98FB2419, }, /* x=5E */
-    { 0x1791F62F, 0x2F1791F6, 0xF62F1791, 0x91F62F17, }, /* x=5F */
-    { 0x764DD68D, 0x8D764DD6, 0xD68D764D, 0x4DD68D76, }, /* x=60 */
-    { 0x7844DB86, 0x867844DB, 0xDB867844, 0x44DB8678, }, /* x=61 */
-    { 0x6A5FCC9B, 0x9B6A5FCC, 0xCC9B6A5F, 0x5FCC9B6A, }, /* x=62 */
-    { 0x6456C190, 0x906456C1, 0xC1906456, 0x56C19064, }, /* x=63 */
-    { 0x4E69E2A1, 0xA14E69E2, 0xE2A14E69, 0x69E2A14E, }, /* x=64 */
-    { 0x4060EFAA, 0xAA4060EF, 0xEFAA4060, 0x60EFAA40, }, /* x=65 */
-    { 0x527BF8B7, 0xB7527BF8, 0xF8B7527B, 0x7BF8B752, }, /* x=66 */
-    { 0x5C72F5BC, 0xBC5C72F5, 0xF5BC5C72, 0x72F5BC5C, }, /* x=67 */
-    { 0x0605BED5, 0xD50605BE, 0xBED50605, 0x05BED506, }, /* x=68 */
-    { 0x080CB3DE, 0xDE080CB3, 0xB3DE080C, 0x0CB3DE08, }, /* x=69 */
-    { 0x1A17A4C3, 0xC31A17A4, 0xA4C31A17, 0x17A4C31A, }, /* x=6A */
-    { 0x141EA9C8, 0xC8141EA9, 0xA9C8141E, 0x1EA9C814, }, /* x=6B */
-    { 0x3E218AF9, 0xF93E218A, 0x8AF93E21, 0x218AF93E, }, /* x=6C */
-    { 0x302887F2, 0xF2302887, 0x87F23028, 0x2887F230, }, /* x=6D */
-    { 0x223390EF, 0xEF223390, 0x90EF2233, 0x3390EF22, }, /* x=6E */
-    { 0x2C3A9DE4, 0xE42C3A9D, 0x9DE42C3A, 0x3A9DE42C, }, /* x=6F */
-    { 0x96DD063D, 0x3D96DD06, 0x063D96DD, 0xDD063D96, }, /* x=70 */
-    { 0x98D40B36, 0x3698D40B, 0x0B3698D4, 0xD40B3698, }, /* x=71 */
-    { 0x8ACF1C2B, 0x2B8ACF1C, 0x1C2B8ACF, 0xCF1C2B8A, }, /* x=72 */
-    { 0x84C61120, 0x2084C611, 0x112084C6, 0xC6112084, }, /* x=73 */
-    { 0xAEF93211, 0x11AEF932, 0x3211AEF9, 0xF93211AE, }, /* x=74 */
-    { 0xA0F03F1A, 0x1AA0F03F, 0x3F1AA0F0, 0xF03F1AA0, }, /* x=75 */
-    { 0xB2EB2807, 0x07B2EB28, 0x2807B2EB, 0xEB2807B2, }, /* x=76 */
-    { 0xBCE2250C, 0x0CBCE225, 0x250CBCE2, 0xE2250CBC, }, /* x=77 */
-    { 0xE6956E65, 0x65E6956E, 0x6E65E695, 0x956E65E6, }, /* x=78 */
-    { 0xE89C636E, 0x6EE89C63, 0x636EE89C, 0x9C636EE8, }, /* x=79 */
-    { 0xFA877473, 0x73FA8774, 0x7473FA87, 0x877473FA, }, /* x=7A */
-    { 0xF48E7978, 0x78F48E79, 0x7978F48E, 0x8E7978F4, }, /* x=7B */
-    { 0xDEB15A49, 0x49DEB15A, 0x5A49DEB1, 0xB15A49DE, }, /* x=7C */
-    { 0xD0B85742, 0x42D0B857, 0x5742D0B8, 0xB85742D0, }, /* x=7D */
-    { 0xC2A3405F, 0x5FC2A340, 0x405FC2A3, 0xA3405FC2, }, /* x=7E */
-    { 0xCCAA4D54, 0x54CCAA4D, 0x4D54CCAA, 0xAA4D54CC, }, /* x=7F */
-    { 0x41ECDAF7, 0xF741ECDA, 0xDAF741EC, 0xECDAF741, }, /* x=80 */
-    { 0x4FE5D7FC, 0xFC4FE5D7, 0xD7FC4FE5, 0xE5D7FC4F, }, /* x=81 */
-    { 0x5DFEC0E1, 0xE15DFEC0, 0xC0E15DFE, 0xFEC0E15D, }, /* x=82 */
-    { 0x53F7CDEA, 0xEA53F7CD, 0xCDEA53F7, 0xF7CDEA53, }, /* x=83 */
-    { 0x79C8EEDB, 0xDB79C8EE, 0xEEDB79C8, 0xC8EEDB79, }, /* x=84 */
-    { 0x77C1E3D0, 0xD077C1E3, 0xE3D077C1, 0xC1E3D077, }, /* x=85 */
-    { 0x65DAF4CD, 0xCD65DAF4, 0xF4CD65DA, 0xDAF4CD65, }, /* x=86 */
-    { 0x6BD3F9C6, 0xC66BD3F9, 0xF9C66BD3, 0xD3F9C66B, }, /* x=87 */
-    { 0x31A4B2AF, 0xAF31A4B2, 0xB2AF31A4, 0xA4B2AF31, }, /* x=88 */
-    { 0x3FADBFA4, 0xA43FADBF, 0xBFA43FAD, 0xADBFA43F, }, /* x=89 */
-    { 0x2DB6A8B9, 0xB92DB6A8, 0xA8B92DB6, 0xB6A8B92D, }, /* x=8A */
-    { 0x23BFA5B2, 0xB223BFA5, 0xA5B223BF, 0xBFA5B223, }, /* x=8B */
-    { 0x09808683, 0x83098086, 0x86830980, 0x80868309, }, /* x=8C */
-    { 0x07898B88, 0x8807898B, 0x8B880789, 0x898B8807, }, /* x=8D */
-    { 0x15929C95, 0x9515929C, 0x9C951592, 0x929C9515, }, /* x=8E */
-    { 0x1B9B919E, 0x9E1B9B91, 0x919E1B9B, 0x9B919E1B, }, /* x=8F */
-    { 0xA17C0A47, 0x47A17C0A, 0x0A47A17C, 0x7C0A47A1, }, /* x=90 */
-    { 0xAF75074C, 0x4CAF7507, 0x074CAF75, 0x75074CAF, }, /* x=91 */
-    { 0xBD6E1051, 0x51BD6E10, 0x1051BD6E, 0x6E1051BD, }, /* x=92 */
-    { 0xB3671D5A, 0x5AB3671D, 0x1D5AB367, 0x671D5AB3, }, /* x=93 */
-    { 0x99583E6B, 0x6B99583E, 0x3E6B9958, 0x583E6B99, }, /* x=94 */
-    { 0x97513360, 0x60975133, 0x33609751, 0x51336097, }, /* x=95 */
-    { 0x854A247D, 0x7D854A24, 0x247D854A, 0x4A247D85, }, /* x=96 */
-    { 0x8B432976, 0x768B4329, 0x29768B43, 0x4329768B, }, /* x=97 */
-    { 0xD134621F, 0x1FD13462, 0x621FD134, 0x34621FD1, }, /* x=98 */
-    { 0xDF3D6F14, 0x14DF3D6F, 0x6F14DF3D, 0x3D6F14DF, }, /* x=99 */
-    { 0xCD267809, 0x09CD2678, 0x7809CD26, 0x267809CD, }, /* x=9A */
-    { 0xC32F7502, 0x02C32F75, 0x7502C32F, 0x2F7502C3, }, /* x=9B */
-    { 0xE9105633, 0x33E91056, 0x5633E910, 0x105633E9, }, /* x=9C */
-    { 0xE7195B38, 0x38E7195B, 0x5B38E719, 0x195B38E7, }, /* x=9D */
-    { 0xF5024C25, 0x25F5024C, 0x4C25F502, 0x024C25F5, }, /* x=9E */
-    { 0xFB0B412E, 0x2EFB0B41, 0x412EFB0B, 0x0B412EFB, }, /* x=9F */
-    { 0x9AD7618C, 0x8C9AD761, 0x618C9AD7, 0xD7618C9A, }, /* x=A0 */
-    { 0x94DE6C87, 0x8794DE6C, 0x6C8794DE, 0xDE6C8794, }, /* x=A1 */
-    { 0x86C57B9A, 0x9A86C57B, 0x7B9A86C5, 0xC57B9A86, }, /* x=A2 */
-    { 0x88CC7691, 0x9188CC76, 0x769188CC, 0xCC769188, }, /* x=A3 */
-    { 0xA2F355A0, 0xA0A2F355, 0x55A0A2F3, 0xF355A0A2, }, /* x=A4 */
-    { 0xACFA58AB, 0xABACFA58, 0x58ABACFA, 0xFA58ABAC, }, /* x=A5 */
-    { 0xBEE14FB6, 0xB6BEE14F, 0x4FB6BEE1, 0xE14FB6BE, }, /* x=A6 */
-    { 0xB0E842BD, 0xBDB0E842, 0x42BDB0E8, 0xE842BDB0, }, /* x=A7 */
-    { 0xEA9F09D4, 0xD4EA9F09, 0x09D4EA9F, 0x9F09D4EA, }, /* x=A8 */
-    { 0xE49604DF, 0xDFE49604, 0x04DFE496, 0x9604DFE4, }, /* x=A9 */
-    { 0xF68D13C2, 0xC2F68D13, 0x13C2F68D, 0x8D13C2F6, }, /* x=AA */
-    { 0xF8841EC9, 0xC9F8841E, 0x1EC9F884, 0x841EC9F8, }, /* x=AB */
-    { 0xD2BB3DF8, 0xF8D2BB3D, 0x3DF8D2BB, 0xBB3DF8D2, }, /* x=AC */
-    { 0xDCB230F3, 0xF3DCB230, 0x30F3DCB2, 0xB230F3DC, }, /* x=AD */
-    { 0xCEA927EE, 0xEECEA927, 0x27EECEA9, 0xA927EECE, }, /* x=AE */
-    { 0xC0A02AE5, 0xE5C0A02A, 0x2AE5C0A0, 0xA02AE5C0, }, /* x=AF */
-    { 0x7A47B13C, 0x3C7A47B1, 0xB13C7A47, 0x47B13C7A, }, /* x=B0 */
-    { 0x744EBC37, 0x37744EBC, 0xBC37744E, 0x4EBC3774, }, /* x=B1 */
-    { 0x6655AB2A, 0x2A6655AB, 0xAB2A6655, 0x55AB2A66, }, /* x=B2 */
-    { 0x685CA621, 0x21685CA6, 0xA621685C, 0x5CA62168, }, /* x=B3 */
-    { 0x42638510, 0x10426385, 0x85104263, 0x63851042, }, /* x=B4 */
-    { 0x4C6A881B, 0x1B4C6A88, 0x881B4C6A, 0x6A881B4C, }, /* x=B5 */
-    { 0x5E719F06, 0x065E719F, 0x9F065E71, 0x719F065E, }, /* x=B6 */
-    { 0x5078920D, 0x0D507892, 0x920D5078, 0x78920D50, }, /* x=B7 */
-    { 0x0A0FD964, 0x640A0FD9, 0xD9640A0F, 0x0FD9640A, }, /* x=B8 */
-    { 0x0406D46F, 0x6F0406D4, 0xD46F0406, 0x06D46F04, }, /* x=B9 */
-    { 0x161DC372, 0x72161DC3, 0xC372161D, 0x1DC37216, }, /* x=BA */
-    { 0x1814CE79, 0x791814CE, 0xCE791814, 0x14CE7918, }, /* x=BB */
-    { 0x322BED48, 0x48322BED, 0xED48322B, 0x2BED4832, }, /* x=BC */
-    { 0x3C22E043, 0x433C22E0, 0xE0433C22, 0x22E0433C, }, /* x=BD */
-    { 0x2E39F75E, 0x5E2E39F7, 0xF75E2E39, 0x39F75E2E, }, /* x=BE */
-    { 0x2030FA55, 0x552030FA, 0xFA552030, 0x30FA5520, }, /* x=BF */
-    { 0xEC9AB701, 0x01EC9AB7, 0xB701EC9A, 0x9AB701EC, }, /* x=C0 */
-    { 0xE293BA0A, 0x0AE293BA, 0xBA0AE293, 0x93BA0AE2, }, /* x=C1 */
-    { 0xF088AD17, 0x17F088AD, 0xAD17F088, 0x88AD17F0, }, /* x=C2 */
-    { 0xFE81A01C, 0x1CFE81A0, 0xA01CFE81, 0x81A01CFE, }, /* x=C3 */
-    { 0xD4BE832D, 0x2DD4BE83, 0x832DD4BE, 0xBE832DD4, }, /* x=C4 */
-    { 0xDAB78E26, 0x26DAB78E, 0x8E26DAB7, 0xB78E26DA, }, /* x=C5 */
-    { 0xC8AC993B, 0x3BC8AC99, 0x993BC8AC, 0xAC993BC8, }, /* x=C6 */
-    { 0xC6A59430, 0x30C6A594, 0x9430C6A5, 0xA59430C6, }, /* x=C7 */
-    { 0x9CD2DF59, 0x599CD2DF, 0xDF599CD2, 0xD2DF599C, }, /* x=C8 */
-    { 0x92DBD252, 0x5292DBD2, 0xD25292DB, 0xDBD25292, }, /* x=C9 */
-    { 0x80C0C54F, 0x4F80C0C5, 0xC54F80C0, 0xC0C54F80, }, /* x=CA */
-    { 0x8EC9C844, 0x448EC9C8, 0xC8448EC9, 0xC9C8448E, }, /* x=CB */
-    { 0xA4F6EB75, 0x75A4F6EB, 0xEB75A4F6, 0xF6EB75A4, }, /* x=CC */
-    { 0xAAFFE67E, 0x7EAAFFE6, 0xE67EAAFF, 0xFFE67EAA, }, /* x=CD */
-    { 0xB8E4F163, 0x63B8E4F1, 0xF163B8E4, 0xE4F163B8, }, /* x=CE */
-    { 0xB6EDFC68, 0x68B6EDFC, 0xFC68B6ED, 0xEDFC68B6, }, /* x=CF */
-    { 0x0C0A67B1, 0xB10C0A67, 0x67B10C0A, 0x0A67B10C, }, /* x=D0 */
-    { 0x02036ABA, 0xBA02036A, 0x6ABA0203, 0x036ABA02, }, /* x=D1 */
-    { 0x10187DA7, 0xA710187D, 0x7DA71018, 0x187DA710, }, /* x=D2 */
-    { 0x1E1170AC, 0xAC1E1170, 0x70AC1E11, 0x1170AC1E, }, /* x=D3 */
-    { 0x342E539D, 0x9D342E53, 0x539D342E, 0x2E539D34, }, /* x=D4 */
-    { 0x3A275E96, 0x963A275E, 0x5E963A27, 0x275E963A, }, /* x=D5 */
-    { 0x283C498B, 0x8B283C49, 0x498B283C, 0x3C498B28, }, /* x=D6 */
-    { 0x26354480, 0x80263544, 0x44802635, 0x35448026, }, /* x=D7 */
-    { 0x7C420FE9, 0xE97C420F, 0x0FE97C42, 0x420FE97C, }, /* x=D8 */
-    { 0x724B02E2, 0xE2724B02, 0x02E2724B, 0x4B02E272, }, /* x=D9 */
-    { 0x605015FF, 0xFF605015, 0x15FF6050, 0x5015FF60, }, /* x=DA */
-    { 0x6E5918F4, 0xF46E5918, 0x18F46E59, 0x5918F46E, }, /* x=DB */
-    { 0x44663BC5, 0xC544663B, 0x3BC54466, 0x663BC544, }, /* x=DC */
-    { 0x4A6F36CE, 0xCE4A6F36, 0x36CE4A6F, 0x6F36CE4A, }, /* x=DD */
-    { 0x587421D3, 0xD3587421, 0x21D35874, 0x7421D358, }, /* x=DE */
-    { 0x567D2CD8, 0xD8567D2C, 0x2CD8567D, 0x7D2CD856, }, /* x=DF */
-    { 0x37A10C7A, 0x7A37A10C, 0x0C7A37A1, 0xA10C7A37, }, /* x=E0 */
-    { 0x39A80171, 0x7139A801, 0x017139A8, 0xA8017139, }, /* x=E1 */
-    { 0x2BB3166C, 0x6C2BB316, 0x166C2BB3, 0xB3166C2B, }, /* x=E2 */
-    { 0x25BA1B67, 0x6725BA1B, 0x1B6725BA, 0xBA1B6725, }, /* x=E3 */
-    { 0x0F853856, 0x560F8538, 0x38560F85, 0x8538560F, }, /* x=E4 */
-    { 0x018C355D, 0x5D018C35, 0x355D018C, 0x8C355D01, }, /* x=E5 */
-    { 0x13972240, 0x40139722, 0x22401397, 0x97224013, }, /* x=E6 */
-    { 0x1D9E2F4B, 0x4B1D9E2F, 0x2F4B1D9E, 0x9E2F4B1D, }, /* x=E7 */
-    { 0x47E96422, 0x2247E964, 0x642247E9, 0xE9642247, }, /* x=E8 */
-    { 0x49E06929, 0x2949E069, 0x692949E0, 0xE0692949, }, /* x=E9 */
-    { 0x5BFB7E34, 0x345BFB7E, 0x7E345BFB, 0xFB7E345B, }, /* x=EA */
-    { 0x55F2733F, 0x3F55F273, 0x733F55F2, 0xF2733F55, }, /* x=EB */
-    { 0x7FCD500E, 0x0E7FCD50, 0x500E7FCD, 0xCD500E7F, }, /* x=EC */
-    { 0x71C45D05, 0x0571C45D, 0x5D0571C4, 0xC45D0571, }, /* x=ED */
-    { 0x63DF4A18, 0x1863DF4A, 0x4A1863DF, 0xDF4A1863, }, /* x=EE */
-    { 0x6DD64713, 0x136DD647, 0x47136DD6, 0xD647136D, }, /* x=EF */
-    { 0xD731DCCA, 0xCAD731DC, 0xDCCAD731, 0x31DCCAD7, }, /* x=F0 */
-    { 0xD938D1C1, 0xC1D938D1, 0xD1C1D938, 0x38D1C1D9, }, /* x=F1 */
-    { 0xCB23C6DC, 0xDCCB23C6, 0xC6DCCB23, 0x23C6DCCB, }, /* x=F2 */
-    { 0xC52ACBD7, 0xD7C52ACB, 0xCBD7C52A, 0x2ACBD7C5, }, /* x=F3 */
-    { 0xEF15E8E6, 0xE6EF15E8, 0xE8E6EF15, 0x15E8E6EF, }, /* x=F4 */
-    { 0xE11CE5ED, 0xEDE11CE5, 0xE5EDE11C, 0x1CE5EDE1, }, /* x=F5 */
-    { 0xF307F2F0, 0xF0F307F2, 0xF2F0F307, 0x07F2F0F3, }, /* x=F6 */
-    { 0xFD0EFFFB, 0xFBFD0EFF, 0xFFFBFD0E, 0x0EFFFBFD, }, /* x=F7 */
-    { 0xA779B492, 0x92A779B4, 0xB492A779, 0x79B492A7, }, /* x=F8 */
-    { 0xA970B999, 0x99A970B9, 0xB999A970, 0x70B999A9, }, /* x=F9 */
-    { 0xBB6BAE84, 0x84BB6BAE, 0xAE84BB6B, 0x6BAE84BB, }, /* x=FA */
-    { 0xB562A38F, 0x8FB562A3, 0xA38FB562, 0x62A38FB5, }, /* x=FB */
-    { 0x9F5D80BE, 0xBE9F5D80, 0x80BE9F5D, 0x5D80BE9F, }, /* x=FC */
-    { 0x91548DB5, 0xB591548D, 0x8DB59154, 0x548DB591, }, /* x=FD */
-    { 0x834F9AA8, 0xA8834F9A, 0x9AA8834F, 0x4F9AA883, }, /* x=FE */
-    { 0x8D4697A3, 0xA38D4697, 0x97A38D46, 0x4697A38D, }, /* x=FF */
-};
-
-
 
 /*
 AES_Te0[x] = S [x].[02, 01, 01, 03];
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (33 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 34/38] crypto: Remove AES_imc Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-19 13:18   ` Philippe Mathieu-Daudé
  2023-06-09  2:23 ` [PATCH v2 36/38] host/include/i386: Implement aes-round.h Richard Henderson
                   ` (2 subsequent siblings)
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

These arrays are no longer used outside of aes.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/crypto/aes.h | 25 -------------------------
 crypto/aes.c         | 33 +++++++++++++++++++++------------
 2 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/include/crypto/aes.h b/include/crypto/aes.h
index 99209f51b9..709d4d226b 100644
--- a/include/crypto/aes.h
+++ b/include/crypto/aes.h
@@ -30,29 +30,4 @@ void AES_decrypt(const unsigned char *in, unsigned char *out,
 extern const uint8_t AES_sbox[256];
 extern const uint8_t AES_isbox[256];
 
-/* AES MixColumns, for use with rot32. */
-extern const uint32_t AES_mc_rot[256];
-
-/* AES InvMixColumns, for use with rot32. */
-extern const uint32_t AES_imc_rot[256];
-
-/*
-AES_Te0[x] = S [x].[02, 01, 01, 03];
-AES_Te1[x] = S [x].[03, 02, 01, 01];
-AES_Te2[x] = S [x].[01, 03, 02, 01];
-AES_Te3[x] = S [x].[01, 01, 03, 02];
-AES_Te4[x] = S [x].[01, 01, 01, 01];
-
-AES_Td0[x] = Si[x].[0e, 09, 0d, 0b];
-AES_Td1[x] = Si[x].[0b, 0e, 09, 0d];
-AES_Td2[x] = Si[x].[0d, 0b, 0e, 09];
-AES_Td3[x] = Si[x].[09, 0d, 0b, 0e];
-AES_Td4[x] = Si[x].[01, 01, 01, 01];
-*/
-
-extern const uint32_t AES_Te0[256], AES_Te1[256], AES_Te2[256],
-                      AES_Te3[256], AES_Te4[256];
-extern const uint32_t AES_Td0[256], AES_Td1[256], AES_Td2[256],
-                      AES_Td3[256], AES_Td4[256];
-
 #endif
diff --git a/crypto/aes.c b/crypto/aes.c
index a18267b9f8..30ed2303db 100644
--- a/crypto/aes.c
+++ b/crypto/aes.c
@@ -154,7 +154,7 @@ enum {
 /*
  * MixColumns lookup table, for use with rot32.
  */
-const uint32_t AES_mc_rot[256] = {
+static const uint32_t AES_mc_rot[256] = {
     0x00000000, 0x03010102, 0x06020204, 0x05030306,
     0x0c040408, 0x0f05050a, 0x0a06060c, 0x0907070e,
     0x18080810, 0x1b090912, 0x1e0a0a14, 0x1d0b0b16,
@@ -224,7 +224,7 @@ const uint32_t AES_mc_rot[256] = {
 /*
  * Inverse MixColumns lookup table, for use with rot32.
  */
-const uint32_t AES_imc_rot[256] = {
+static const uint32_t AES_imc_rot[256] = {
     0x00000000, 0x0b0d090e, 0x161a121c, 0x1d171b12,
     0x2c342438, 0x27392d36, 0x3a2e3624, 0x31233f2a,
     0x58684870, 0x5365417e, 0x4e725a6c, 0x457f5362,
@@ -306,7 +306,7 @@ AES_Td3[x] = Si[x].[09, 0d, 0b, 0e];
 AES_Td4[x] = Si[x].[01, 01, 01, 01];
 */
 
-const uint32_t AES_Te0[256] = {
+static const uint32_t AES_Te0[256] = {
     0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,
     0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,
     0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,
@@ -372,7 +372,8 @@ const uint32_t AES_Te0[256] = {
     0x824141c3U, 0x299999b0U, 0x5a2d2d77U, 0x1e0f0f11U,
     0x7bb0b0cbU, 0xa85454fcU, 0x6dbbbbd6U, 0x2c16163aU,
 };
-const uint32_t AES_Te1[256] = {
+
+static const uint32_t AES_Te1[256] = {
     0xa5c66363U, 0x84f87c7cU, 0x99ee7777U, 0x8df67b7bU,
     0x0dfff2f2U, 0xbdd66b6bU, 0xb1de6f6fU, 0x5491c5c5U,
     0x50603030U, 0x03020101U, 0xa9ce6767U, 0x7d562b2bU,
@@ -438,7 +439,8 @@ const uint32_t AES_Te1[256] = {
     0xc3824141U, 0xb0299999U, 0x775a2d2dU, 0x111e0f0fU,
     0xcb7bb0b0U, 0xfca85454U, 0xd66dbbbbU, 0x3a2c1616U,
 };
-const uint32_t AES_Te2[256] = {
+
+static const uint32_t AES_Te2[256] = {
     0x63a5c663U, 0x7c84f87cU, 0x7799ee77U, 0x7b8df67bU,
     0xf20dfff2U, 0x6bbdd66bU, 0x6fb1de6fU, 0xc55491c5U,
     0x30506030U, 0x01030201U, 0x67a9ce67U, 0x2b7d562bU,
@@ -504,8 +506,8 @@ const uint32_t AES_Te2[256] = {
     0x41c38241U, 0x99b02999U, 0x2d775a2dU, 0x0f111e0fU,
     0xb0cb7bb0U, 0x54fca854U, 0xbbd66dbbU, 0x163a2c16U,
 };
-const uint32_t AES_Te3[256] = {
 
+static const uint32_t AES_Te3[256] = {
     0x6363a5c6U, 0x7c7c84f8U, 0x777799eeU, 0x7b7b8df6U,
     0xf2f20dffU, 0x6b6bbdd6U, 0x6f6fb1deU, 0xc5c55491U,
     0x30305060U, 0x01010302U, 0x6767a9ceU, 0x2b2b7d56U,
@@ -571,7 +573,8 @@ const uint32_t AES_Te3[256] = {
     0x4141c382U, 0x9999b029U, 0x2d2d775aU, 0x0f0f111eU,
     0xb0b0cb7bU, 0x5454fca8U, 0xbbbbd66dU, 0x16163a2cU,
 };
-const uint32_t AES_Te4[256] = {
+
+static const uint32_t AES_Te4[256] = {
     0x63636363U, 0x7c7c7c7cU, 0x77777777U, 0x7b7b7b7bU,
     0xf2f2f2f2U, 0x6b6b6b6bU, 0x6f6f6f6fU, 0xc5c5c5c5U,
     0x30303030U, 0x01010101U, 0x67676767U, 0x2b2b2b2bU,
@@ -637,7 +640,8 @@ const uint32_t AES_Te4[256] = {
     0x41414141U, 0x99999999U, 0x2d2d2d2dU, 0x0f0f0f0fU,
     0xb0b0b0b0U, 0x54545454U, 0xbbbbbbbbU, 0x16161616U,
 };
-const uint32_t AES_Td0[256] = {
+
+static const uint32_t AES_Td0[256] = {
     0x51f4a750U, 0x7e416553U, 0x1a17a4c3U, 0x3a275e96U,
     0x3bab6bcbU, 0x1f9d45f1U, 0xacfa58abU, 0x4be30393U,
     0x2030fa55U, 0xad766df6U, 0x88cc7691U, 0xf5024c25U,
@@ -703,7 +707,8 @@ const uint32_t AES_Td0[256] = {
     0x39a80171U, 0x080cb3deU, 0xd8b4e49cU, 0x6456c190U,
     0x7bcb8461U, 0xd532b670U, 0x486c5c74U, 0xd0b85742U,
 };
-const uint32_t AES_Td1[256] = {
+
+static const uint32_t AES_Td1[256] = {
     0x5051f4a7U, 0x537e4165U, 0xc31a17a4U, 0x963a275eU,
     0xcb3bab6bU, 0xf11f9d45U, 0xabacfa58U, 0x934be303U,
     0x552030faU, 0xf6ad766dU, 0x9188cc76U, 0x25f5024cU,
@@ -769,7 +774,8 @@ const uint32_t AES_Td1[256] = {
     0x7139a801U, 0xde080cb3U, 0x9cd8b4e4U, 0x906456c1U,
     0x617bcb84U, 0x70d532b6U, 0x74486c5cU, 0x42d0b857U,
 };
-const uint32_t AES_Td2[256] = {
+
+static const uint32_t AES_Td2[256] = {
     0xa75051f4U, 0x65537e41U, 0xa4c31a17U, 0x5e963a27U,
     0x6bcb3babU, 0x45f11f9dU, 0x58abacfaU, 0x03934be3U,
     0xfa552030U, 0x6df6ad76U, 0x769188ccU, 0x4c25f502U,
@@ -836,7 +842,8 @@ const uint32_t AES_Td2[256] = {
     0x017139a8U, 0xb3de080cU, 0xe49cd8b4U, 0xc1906456U,
     0x84617bcbU, 0xb670d532U, 0x5c74486cU, 0x5742d0b8U,
 };
-const uint32_t AES_Td3[256] = {
+
+static const uint32_t AES_Td3[256] = {
     0xf4a75051U, 0x4165537eU, 0x17a4c31aU, 0x275e963aU,
     0xab6bcb3bU, 0x9d45f11fU, 0xfa58abacU, 0xe303934bU,
     0x30fa5520U, 0x766df6adU, 0xcc769188U, 0x024c25f5U,
@@ -902,7 +909,8 @@ const uint32_t AES_Td3[256] = {
     0xa8017139U, 0x0cb3de08U, 0xb4e49cd8U, 0x56c19064U,
     0xcb84617bU, 0x32b670d5U, 0x6c5c7448U, 0xb85742d0U,
 };
-const uint32_t AES_Td4[256] = {
+
+static const uint32_t AES_Td4[256] = {
     0x52525252U, 0x09090909U, 0x6a6a6a6aU, 0xd5d5d5d5U,
     0x30303030U, 0x36363636U, 0xa5a5a5a5U, 0x38383838U,
     0xbfbfbfbfU, 0x40404040U, 0xa3a3a3a3U, 0x9e9e9e9eU,
@@ -968,6 +976,7 @@ const uint32_t AES_Td4[256] = {
     0xe1e1e1e1U, 0x69696969U, 0x14141414U, 0x63636363U,
     0x55555555U, 0x21212121U, 0x0c0c0c0cU, 0x7d7d7d7dU,
 };
+
 static const u32 rcon[] = {
         0x01000000, 0x02000000, 0x04000000, 0x08000000,
         0x10000000, 0x20000000, 0x40000000, 0x80000000,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 36/38] host/include/i386: Implement aes-round.h
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (34 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
@ 2023-06-09  2:23 ` Richard Henderson
  2023-06-09  2:24 ` [PATCH v2 37/38] host/include/aarch64: " Richard Henderson
  2023-06-09  2:24 ` [PATCH v2 38/38] host/include/ppc: " Richard Henderson
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/i386/host/aes-round.h   | 152 +++++++++++++++++++++++++++
 host/include/i386/host/cpuinfo.h     |   1 +
 host/include/x86_64/host/aes-round.h |   1 +
 util/cpuinfo-i386.c                  |   3 +
 4 files changed, 157 insertions(+)
 create mode 100644 host/include/i386/host/aes-round.h
 create mode 100644 host/include/x86_64/host/aes-round.h

diff --git a/host/include/i386/host/aes-round.h b/host/include/i386/host/aes-round.h
new file mode 100644
index 0000000000..f9abb7c352
--- /dev/null
+++ b/host/include/i386/host/aes-round.h
@@ -0,0 +1,152 @@
+/*
+ * x86 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HOST_AES_ROUND_H
+#define HOST_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include <immintrin.h>
+
+#if defined(__AES__) && defined(__SSSE3__)
+# define HAVE_AES_ACCEL  true
+# define ATTR_AES_ACCEL
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+# define ATTR_AES_ACCEL  __attribute__((target("aes,ssse3")))
+#endif
+
+static inline __m128i ATTR_AES_ACCEL
+aes_accel_bswap(__m128i x)
+{
+    return _mm_shuffle_epi8(x, _mm_set_epi8(0, 1, 2, 3, 4, 5, 6, 7, 8,
+                                            9, 10, 11, 12, 13, 14, 15));
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i z = _mm_setzero_si128();
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesdeclast_si128(t, z);
+        t = _mm_aesenc_si128(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdeclast_si128(t, z);
+        t = _mm_aesenc_si128(t, z);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
+                      const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesenclast_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesenclast_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
+                         const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesenc_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesenc_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
+{
+    __m128i t = (__m128i)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = _mm_aesimc_si128(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesimc_si128(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_accel(AESState *ret, const AESState *st,
+                        const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesdeclast_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdeclast_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesdeclast_si128(t, k);
+        t = _mm_aesimc_si128(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdeclast_si128(t, k);
+        t = _mm_aesimc_si128(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    __m128i t = (__m128i)st->v;
+    __m128i k = (__m128i)rk->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = _mm_aesdec_si128(t, k);
+        t = aes_accel_bswap(t);
+    } else {
+        t = _mm_aesdec_si128(t, k);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+#endif
diff --git a/host/include/i386/host/cpuinfo.h b/host/include/i386/host/cpuinfo.h
index a6537123cf..073d0a426f 100644
--- a/host/include/i386/host/cpuinfo.h
+++ b/host/include/i386/host/cpuinfo.h
@@ -26,6 +26,7 @@
 #define CPUINFO_AVX512VBMI2     (1u << 15)
 #define CPUINFO_ATOMIC_VMOVDQA  (1u << 16)
 #define CPUINFO_ATOMIC_VMOVDQU  (1u << 17)
+#define CPUINFO_AES             (1u << 18)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/x86_64/host/aes-round.h b/host/include/x86_64/host/aes-round.h
new file mode 100644
index 0000000000..7da13f5424
--- /dev/null
+++ b/host/include/x86_64/host/aes-round.h
@@ -0,0 +1 @@
+#include "host/include/i386/host/aes-round.h"
diff --git a/util/cpuinfo-i386.c b/util/cpuinfo-i386.c
index ab6143d9e7..3a7b7e0ad1 100644
--- a/util/cpuinfo-i386.c
+++ b/util/cpuinfo-i386.c
@@ -40,6 +40,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
         info |= (c & bit_MOVBE ? CPUINFO_MOVBE : 0);
         info |= (c & bit_POPCNT ? CPUINFO_POPCNT : 0);
 
+        /* Our AES support requires PSHUFB as well. */
+        info |= ((c & bit_AES) && (c & bit_SSSE3) ? CPUINFO_AES : 0);
+
         /* For AVX features, we must check available and usable. */
         if ((c & bit_AVX) && (c & bit_OSXSAVE)) {
             unsigned bv = xgetbv_low(0);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 37/38] host/include/aarch64: Implement aes-round.h
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (35 preceding siblings ...)
  2023-06-09  2:23 ` [PATCH v2 36/38] host/include/i386: Implement aes-round.h Richard Henderson
@ 2023-06-09  2:24 ` Richard Henderson
  2023-06-09  2:24 ` [PATCH v2 38/38] host/include/ppc: " Richard Henderson
  37 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Detect AES in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 meson.build                           |   9 ++
 host/include/aarch64/host/aes-round.h | 205 ++++++++++++++++++++++++++
 host/include/aarch64/host/cpuinfo.h   |   1 +
 util/cpuinfo-aarch64.c                |   2 +
 4 files changed, 217 insertions(+)
 create mode 100644 host/include/aarch64/host/aes-round.h

diff --git a/meson.build b/meson.build
index 34306a6205..d622e54bef 100644
--- a/meson.build
+++ b/meson.build
@@ -2665,6 +2665,15 @@ config_host_data.set('CONFIG_AVX512BW_OPT', get_option('avx512bw') \
     int main(int argc, char *argv[]) { return bar(argv[0]); }
   '''), error_message: 'AVX512BW not available').allowed())
 
+# For both AArch64 and AArch32, detect if builtins are available.
+config_host_data.set('CONFIG_ARM_AES_BUILTIN', cc.compiles('''
+    #include <arm_neon.h>
+    #ifndef __ARM_FEATURE_AES
+    __attribute__((target("+crypto")))
+    #endif
+    void foo(uint8x16_t *p) { *p = vaesmcq_u8(*p); }
+  '''))
+
 have_pvrdma = get_option('pvrdma') \
   .require(rdma.found(), error_message: 'PVRDMA requires OpenFabrics libraries') \
   .require(cc.compiles(gnu_source_prefix + '''
diff --git a/host/include/aarch64/host/aes-round.h b/host/include/aarch64/host/aes-round.h
new file mode 100644
index 0000000000..6c126c3e89
--- /dev/null
+++ b/host/include/aarch64/host/aes-round.h
@@ -0,0 +1,205 @@
+/*
+ * AArch64 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HOST_AES_ROUND_H
+#define HOST_AES_ROUND_H
+
+#include "host/cpuinfo.h"
+#include <arm_neon.h>
+
+#ifdef __ARM_FEATURE_AES
+# define HAVE_AES_ACCEL  true
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_AES)
+#endif
+#if !defined(__ARM_FEATURE_AES) && defined(CONFIG_ARM_AES_BUILTIN)
+# define ATTR_AES_ACCEL  __attribute__((target("+crypto")))
+#else
+# define ATTR_AES_ACCEL
+#endif
+
+static inline uint8x16_t aes_accel_bswap(uint8x16_t x)
+{
+    return vqtbl1q_u8(x, (uint8x16_t){ 15, 14, 13, 12, 11, 10, 9, 8,
+                                        7,  6,  5,  4,  3,  2, 1, 0, });
+}
+
+#ifdef CONFIG_ARM_AES_BUILTIN
+# define aes_accel_aesd            vaesdq_u8
+# define aes_accel_aese            vaeseq_u8
+# define aes_accel_aesmc           vaesmcq_u8
+# define aes_accel_aesimc          vaesimcq_u8
+# define aes_accel_aesd_imc(S, K)  vaesimcq_u8(vaesdq_u8(S, K))
+# define aes_accel_aese_mc(S, K)   vaesmcq_u8(vaeseq_u8(S, K))
+#else
+static inline uint8x16_t aes_accel_aesd(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aesd %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aese(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aese %0.16b, %1.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesmc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesmc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+
+static inline uint8x16_t aes_accel_aesimc(uint8x16_t d)
+{
+    asm(".arch_extension aes\n\t"
+        "aesimc %0.16b, %1.16b" : "=w"(d) : "w"(d));
+    return d;
+}
+
+/* Most CPUs fuse AESD+AESIMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aesd_imc(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aesd %0.16b, %1.16b\n\t"
+        "aesimc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+
+/* Most CPUs fuse AESE+AESMC in the execution pipeline. */
+static inline uint8x16_t aes_accel_aese_mc(uint8x16_t d, uint8x16_t k)
+{
+    asm(".arch_extension aes\n\t"
+        "aese %0.16b, %1.16b\n\t"
+        "aesmc %0.16b, %0.16b" : "+w"(d) : "w"(k));
+    return d;
+}
+#endif /* CONFIG_ARM_AES_BUILTIN */
+
+static inline void ATTR_AES_ACCEL
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesmc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesmc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
+                      const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aese(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aese(t, z);
+    }
+    ret->v = (AESStateVec)t ^ rk->v;
+}
+
+static inline void ATTR_AES_ACCEL
+aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
+                         const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aese_mc(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aese_mc(t, z);
+    }
+    ret->v = (AESStateVec)t ^ rk->v;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesimc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesimc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_accel(AESState *ret, const AESState *st,
+                        const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesd(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd(t, z);
+    }
+    ret->v = (AESStateVec)t ^ rk->v;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t k = (uint8x16_t)rk->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        k = aes_accel_bswap(k);
+        t = aes_accel_aesd(t, z);
+        t ^= k;
+        t = aes_accel_aesimc(t);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd(t, z);
+        t ^= k;
+        t = aes_accel_aesimc(t);
+    }
+    ret->v = (AESStateVec)t;
+}
+
+static inline void ATTR_AES_ACCEL
+aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    uint8x16_t t = (uint8x16_t)st->v;
+    uint8x16_t z = { };
+
+    if (be) {
+        t = aes_accel_bswap(t);
+        t = aes_accel_aesd_imc(t, z);
+        t = aes_accel_bswap(t);
+    } else {
+        t = aes_accel_aesd_imc(t, z);
+    }
+    ret->v = (AESStateVec)t ^ rk->v;
+}
+
+#endif
diff --git a/host/include/aarch64/host/cpuinfo.h b/host/include/aarch64/host/cpuinfo.h
index 82227890b4..05feeb4f43 100644
--- a/host/include/aarch64/host/cpuinfo.h
+++ b/host/include/aarch64/host/cpuinfo.h
@@ -9,6 +9,7 @@
 #define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
 #define CPUINFO_LSE             (1u << 1)
 #define CPUINFO_LSE2            (1u << 2)
+#define CPUINFO_AES             (1u << 3)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/util/cpuinfo-aarch64.c b/util/cpuinfo-aarch64.c
index f99acb7884..ababc39550 100644
--- a/util/cpuinfo-aarch64.c
+++ b/util/cpuinfo-aarch64.c
@@ -56,10 +56,12 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
     unsigned long hwcap = qemu_getauxval(AT_HWCAP);
     info |= (hwcap & HWCAP_ATOMICS ? CPUINFO_LSE : 0);
     info |= (hwcap & HWCAP_USCAT ? CPUINFO_LSE2 : 0);
+    info |= (hwcap & HWCAP_AES ? CPUINFO_AES : 0);
 #endif
 #ifdef CONFIG_DARWIN
     info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE") * CPUINFO_LSE;
     info |= sysctl_for_bool("hw.optional.arm.FEAT_LSE2") * CPUINFO_LSE2;
+    info |= sysctl_for_bool("hw.optional.arm.FEAT_AES") * CPUINFO_AES;
 #endif
 
     cpuinfo = info;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 38/38] host/include/ppc: Implement aes-round.h
  2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
                   ` (36 preceding siblings ...)
  2023-06-09  2:24 ` [PATCH v2 37/38] host/include/aarch64: " Richard Henderson
@ 2023-06-09  2:24 ` Richard Henderson
  2023-06-12 13:30   ` Daniel Henrique Barboza
  37 siblings, 1 reply; 67+ messages in thread
From: Richard Henderson @ 2023-06-09  2:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

Detect CRYPTO in cpuinfo; implement the accel hooks.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/ppc/host/aes-round.h   | 181 ++++++++++++++++++++++++++++
 host/include/ppc/host/cpuinfo.h     |   1 +
 host/include/ppc64/host/aes-round.h |   1 +
 util/cpuinfo-ppc.c                  |   8 ++
 4 files changed, 191 insertions(+)
 create mode 100644 host/include/ppc/host/aes-round.h
 create mode 100644 host/include/ppc64/host/aes-round.h

diff --git a/host/include/ppc/host/aes-round.h b/host/include/ppc/host/aes-round.h
new file mode 100644
index 0000000000..9b5a15d1e5
--- /dev/null
+++ b/host/include/ppc/host/aes-round.h
@@ -0,0 +1,181 @@
+/*
+ * Power v2.07 specific aes acceleration.
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef PPC_HOST_AES_ROUND_H
+#define PPC_HOST_AES_ROUND_H
+
+#ifndef __ALTIVEC__
+/* Without ALTIVEC, we can't even write inline assembly. */
+#include "host/include/generic/host/aes-round.h"
+#else
+#include "host/cpuinfo.h"
+
+#ifdef __CRYPTO__
+# define HAVE_AES_ACCEL  true
+#else
+# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_CRYPTO)
+#endif
+#define ATTR_AES_ACCEL
+
+/*
+ * While there is <altivec.h>, both gcc and clang "aid" with the
+ * endianness issues in different ways. Just use inline asm instead.
+ */
+
+/* Bytes in memory are host-endian; bytes in register are @be. */
+static inline AESStateVec aes_accel_ld(const AESState *p, bool be)
+{
+    AESStateVec r;
+
+    if (be) {
+        asm("lvx %0, 0, %1" : "=v"(r) : "r"(p), "m"(*p));
+    } else if (HOST_BIG_ENDIAN) {
+        AESStateVec rev = {
+            15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3,  2,  1,  0,
+        };
+        asm("lvx %0, 0, %1\n\t"
+            "vperm %0, %0, %0, %2"
+            : "=v"(r) : "r"(p), "v"(rev), "m"(*p));
+    } else {
+#ifdef __POWER9_VECTOR__
+        asm("lxvb16x %x0, 0, %1" : "=v"(r) : "r"(p), "m"(*p));
+#else
+        asm("lxvd2x %x0, 0, %1\n\t"
+            "xxpermdi %x0, %x0, %x0, 2"
+            : "=v"(r) : "r"(p), "m"(*p));
+#endif
+    }
+    return r;
+}
+
+static void aes_accel_st(AESState *p, AESStateVec r, bool be)
+{
+    if (be) {
+        asm("stvx %1, 0, %2" : "=m"(*p) : "v"(r), "r"(p));
+    } else if (HOST_BIG_ENDIAN) {
+        AESStateVec rev = {
+            15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3,  2,  1,  0,
+        };
+        asm("vperm %1, %1, %1, %2\n\t"
+            "stvx %1, 0, %3"
+            : "=m"(*p), "+v"(r) : "v"(rev), "r"(p));
+    } else {
+#ifdef __POWER9_VECTOR__
+        asm("stxvb16x %x1, 0, %2" : "=m"(*p) : "v"(r), "r"(p));
+#else
+        asm("xxpermdi %x1, %x1, %x1, 2\n\t"
+            "stxvd2x %x1, 0, %2"
+            : "=m"(*p), "+v"(r) : "r"(p));
+#endif
+    }
+}
+
+static inline AESStateVec aes_accel_vcipher(AESStateVec d, AESStateVec k)
+{
+    asm("vcipher %0, %0, %1" : "+v"(d) : "v"(k));
+    return d;
+}
+
+static inline AESStateVec aes_accel_vncipher(AESStateVec d, AESStateVec k)
+{
+    asm("vncipher %0, %0, %1" : "+v"(d) : "v"(k));
+    return d;
+}
+
+static inline AESStateVec aes_accel_vcipherlast(AESStateVec d, AESStateVec k)
+{
+    asm("vcipherlast %0, %0, %1" : "+v"(d) : "v"(k));
+    return d;
+}
+
+static inline AESStateVec aes_accel_vncipherlast(AESStateVec d, AESStateVec k)
+{
+    asm("vncipherlast %0, %0, %1" : "+v"(d) : "v"(k));
+    return d;
+}
+
+static inline void
+aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
+{
+    AESStateVec t, z = { };
+
+    t = aes_accel_ld(st, be);
+    t = aes_accel_vncipherlast(t, z);
+    t = aes_accel_vcipher(t, z);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
+                      const AESState *rk, bool be)
+{
+    AESStateVec t, k;
+
+    t = aes_accel_ld(st, be);
+    k = aes_accel_ld(rk, be);
+    t = aes_accel_vcipherlast(t, k);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
+                         const AESState *rk, bool be)
+{
+    AESStateVec t, k;
+
+    t = aes_accel_ld(st, be);
+    k = aes_accel_ld(rk, be);
+    t = aes_accel_vcipher(t, k);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
+{
+    AESStateVec t, z = { };
+
+    t = aes_accel_ld(st, be);
+    t = aes_accel_vcipherlast(t, z);
+    t = aes_accel_vncipher(t, z);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesdec_ISB_ISR_AK_accel(AESState *ret, const AESState *st,
+                        const AESState *rk, bool be)
+{
+    AESStateVec t, k;
+
+    t = aes_accel_ld(st, be);
+    k = aes_accel_ld(rk, be);
+    t = aes_accel_vncipherlast(t, k);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    AESStateVec t, k;
+
+    t = aes_accel_ld(st, be);
+    k = aes_accel_ld(rk, be);
+    t = aes_accel_vncipher(t, k);
+    aes_accel_st(ret, t, be);
+}
+
+static inline void
+aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
+                            const AESState *rk, bool be)
+{
+    AESStateVec t, k, z = { };
+
+    t = aes_accel_ld(st, be);
+    k = aes_accel_ld(rk, be);
+    t = aes_accel_vncipher(t, z);
+    aes_accel_st(ret, t ^ k, be);
+}
+#endif /* __ALTIVEC__ */
+#endif /* PPC_HOST_AES_ROUND_H */
diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
index 7ec252ef52..6cc727dba7 100644
--- a/host/include/ppc/host/cpuinfo.h
+++ b/host/include/ppc/host/cpuinfo.h
@@ -16,6 +16,7 @@
 #define CPUINFO_ISEL            (1u << 5)
 #define CPUINFO_ALTIVEC         (1u << 6)
 #define CPUINFO_VSX             (1u << 7)
+#define CPUINFO_CRYPTO          (1u << 8)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/host/include/ppc64/host/aes-round.h b/host/include/ppc64/host/aes-round.h
new file mode 100644
index 0000000000..4a78d94de8
--- /dev/null
+++ b/host/include/ppc64/host/aes-round.h
@@ -0,0 +1 @@
+#include "host/include/ppc/host/aes-round.h"
diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c
index ee761de33a..053b383720 100644
--- a/util/cpuinfo-ppc.c
+++ b/util/cpuinfo-ppc.c
@@ -49,6 +49,14 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
         /* We only care about the portion of VSX that overlaps Altivec. */
         if (hwcap & PPC_FEATURE_HAS_VSX) {
             info |= CPUINFO_VSX;
+            /*
+             * We use VSX especially for little-endian, but we should
+             * always have both anyway, since VSX came with Power7
+             * and crypto came with Power8.
+             */
+            if (hwcap2 & PPC_FEATURE2_HAS_VEC_CRYPTO) {
+                info |= CPUINFO_CRYPTO;
+            }
         }
     }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be)
  2023-06-09  2:23 ` [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be) Richard Henderson
@ 2023-06-12 13:25   ` Daniel Henrique Barboza
  0 siblings, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:25 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> Restructure the ifdef ladder, separating 64-bit from 32-bit,
> and ensure _CALL_AIX is set for ELF v1.  Fixes the build for
> ppc64 big-endian host with clang.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   tcg/ppc/tcg-target.c.inc | 23 ++++++++++++++++-------
>   1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 507fe6cda8..5c8378f8f6 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -29,15 +29,24 @@
>   /*
>    * Standardize on the _CALL_FOO symbols used by GCC:
>    * Apple XCode does not define _CALL_DARWIN.
> - * Clang defines _CALL_ELF (64-bit) but not _CALL_SYSV (32-bit).
> + * Clang defines _CALL_ELF (64-bit) but not _CALL_SYSV or _CALL_AIX.
>    */
> -#if !defined(_CALL_SYSV) && \
> -    !defined(_CALL_DARWIN) && \
> -    !defined(_CALL_AIX) && \
> -    !defined(_CALL_ELF)
> -# if defined(__APPLE__)
> +#if TCG_TARGET_REG_BITS == 64
> +# ifdef _CALL_AIX
> +    /* ok */
> +# elif defined(_CALL_ELF) && _CALL_ELF == 1
> +#  define _CALL_AIX
> +# elif defined(_CALL_ELF) && _CALL_ELF == 2
> +    /* ok */
> +# else
> +#  error "Unknown ABI"
> +# endif
> +#else
> +# if defined(_CALL_SYSV) || defined(_CALL_DARWIN)
> +    /* ok */
> +# elif defined(__APPLE__)
>   #  define _CALL_DARWIN
> -# elif defined(__ELF__) && TCG_TARGET_REG_BITS == 32
> +# elif defined(__ELF__)
>   #  define _CALL_SYSV
>   # else
>   #  error "Unknown ABI"



* Re: [PATCH v2 10/38] target/ppc: Use aesenc_SB_SR_AK
  2023-06-09  2:23 ` [PATCH v2 10/38] target/ppc: " Richard Henderson
@ 2023-06-12 13:26   ` Daniel Henrique Barboza
  2023-06-19 10:47   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> This implements the VCIPHERLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   target/ppc/int_helper.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index d97a7f1f28..34257e9d76 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -25,6 +25,7 @@
>   #include "qemu/log.h"
>   #include "exec/helper-proto.h"
>   #include "crypto/aes.h"
> +#include "crypto/aes-round.h"
>   #include "fpu/softfloat.h"
>   #include "qapi/error.h"
>   #include "qemu/guest-random.h"
> @@ -2947,13 +2948,7 @@ void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   
>   void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   {
> -    ppc_avr_t result;
> -    int i;
> -
> -    VECTOR_FOR_INORDER_I(i, u8) {
> -        result.VsrB(i) = b->VsrB(i) ^ (AES_sbox[a->VsrB(AES_shifts[i])]);
> -    }
> -    *r = result;
> +    aesenc_SB_SR_AK((AESState *)r, (AESState *)a, (AESState *)b, true);
>   }
>   
>   void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)



* Re: [PATCH v2 02/38] util: Add cpuinfo-ppc.c
  2023-06-09  2:23 ` [PATCH v2 02/38] util: Add cpuinfo-ppc.c Richard Henderson
@ 2023-06-12 13:27   ` Daniel Henrique Barboza
  2023-06-19 10:37   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> Move the code from tcg/.  Fix a bug in that PPC_FEATURE2_ARCH_3_10
> is actually spelled PPC_FEATURE2_ARCH_3_1.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   host/include/ppc/host/cpuinfo.h   | 29 ++++++++++++++++
>   host/include/ppc64/host/cpuinfo.h |  1 +
>   tcg/ppc/tcg-target.h              | 16 ++++-----
>   util/cpuinfo-ppc.c                | 57 +++++++++++++++++++++++++++++++
>   tcg/ppc/tcg-target.c.inc          | 44 +-----------------------
>   util/meson.build                  |  2 ++
>   6 files changed, 98 insertions(+), 51 deletions(-)
>   create mode 100644 host/include/ppc/host/cpuinfo.h
>   create mode 100644 host/include/ppc64/host/cpuinfo.h
>   create mode 100644 util/cpuinfo-ppc.c
> 
> diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
> new file mode 100644
> index 0000000000..7ec252ef52
> --- /dev/null
> +++ b/host/include/ppc/host/cpuinfo.h
> @@ -0,0 +1,29 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for ppc.
> + */
> +
> +#ifndef HOST_CPUINFO_H
> +#define HOST_CPUINFO_H
> +
> +/* Digested version of <cpuid.h> */
> +
> +#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
> +#define CPUINFO_V2_06           (1u << 1)
> +#define CPUINFO_V2_07           (1u << 2)
> +#define CPUINFO_V3_00           (1u << 3)
> +#define CPUINFO_V3_10           (1u << 4)
> +#define CPUINFO_ISEL            (1u << 5)
> +#define CPUINFO_ALTIVEC         (1u << 6)
> +#define CPUINFO_VSX             (1u << 7)
> +
> +/* Initialized with a constructor. */
> +extern unsigned cpuinfo;
> +
> +/*
> + * We cannot rely on constructor ordering, so other constructors must
> + * use the function interface rather than the variable above.
> + */
> +unsigned cpuinfo_init(void);
> +
> +#endif /* HOST_CPUINFO_H */
> diff --git a/host/include/ppc64/host/cpuinfo.h b/host/include/ppc64/host/cpuinfo.h
> new file mode 100644
> index 0000000000..2f036a0627
> --- /dev/null
> +++ b/host/include/ppc64/host/cpuinfo.h
> @@ -0,0 +1 @@
> +#include "host/include/ppc/host/cpuinfo.h"
> diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
> index c7552b6391..b632a5a647 100644
> --- a/tcg/ppc/tcg-target.h
> +++ b/tcg/ppc/tcg-target.h
> @@ -25,6 +25,8 @@
>   #ifndef PPC_TCG_TARGET_H
>   #define PPC_TCG_TARGET_H
>   
> +#include "host/cpuinfo.h"
> +
>   #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
>   
>   #define TCG_TARGET_NB_REGS 64
> @@ -61,14 +63,12 @@ typedef enum {
>       tcg_isa_3_10,
>   } TCGPowerISA;
>   
> -extern TCGPowerISA have_isa;
> -extern bool have_altivec;
> -extern bool have_vsx;
> -
> -#define have_isa_2_06  (have_isa >= tcg_isa_2_06)
> -#define have_isa_2_07  (have_isa >= tcg_isa_2_07)
> -#define have_isa_3_00  (have_isa >= tcg_isa_3_00)
> -#define have_isa_3_10  (have_isa >= tcg_isa_3_10)
> +#define have_isa_2_06  (cpuinfo & CPUINFO_V2_06)
> +#define have_isa_2_07  (cpuinfo & CPUINFO_V2_07)
> +#define have_isa_3_00  (cpuinfo & CPUINFO_V3_00)
> +#define have_isa_3_10  (cpuinfo & CPUINFO_V3_10)
> +#define have_altivec   (cpuinfo & CPUINFO_ALTIVEC)
> +#define have_vsx       (cpuinfo & CPUINFO_VSX)
>   
>   /* optional instructions automatically implemented */
>   #define TCG_TARGET_HAS_ext8u_i32        0 /* andi */
> diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c
> new file mode 100644
> index 0000000000..ee761de33a
> --- /dev/null
> +++ b/util/cpuinfo-ppc.c
> @@ -0,0 +1,57 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for ppc.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "host/cpuinfo.h"
> +
> +#ifdef CONFIG_GETAUXVAL
> +# include <sys/auxv.h>
> +#else
> +# include <asm/cputable.h>
> +# include "elf.h"
> +#endif
> +
> +unsigned cpuinfo;
> +
> +/* Called both as constructor and (possibly) via other constructors. */
> +unsigned __attribute__((constructor)) cpuinfo_init(void)
> +{
> +    unsigned info = cpuinfo;
> +    unsigned long hwcap, hwcap2;
> +
> +    if (info) {
> +        return info;
> +    }
> +
> +    hwcap = qemu_getauxval(AT_HWCAP);
> +    hwcap2 = qemu_getauxval(AT_HWCAP2);
> +    info = CPUINFO_ALWAYS;
> +
> +    if (hwcap & PPC_FEATURE_ARCH_2_06) {
> +        info |= CPUINFO_V2_06;
> +    }
> +    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> +        info |= CPUINFO_V2_07;
> +    }
> +    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
> +        info |= CPUINFO_V3_00;
> +    }
> +    if (hwcap2 & PPC_FEATURE2_ARCH_3_1) {
> +        info |= CPUINFO_V3_10;
> +    }
> +    if (hwcap2 & PPC_FEATURE2_HAS_ISEL) {
> +        info |= CPUINFO_ISEL;
> +    }
> +    if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
> +        info |= CPUINFO_ALTIVEC;
> +        /* We only care about the portion of VSX that overlaps Altivec. */
> +        if (hwcap & PPC_FEATURE_HAS_VSX) {
> +            info |= CPUINFO_VSX;
> +        }
> +    }
> +
> +    cpuinfo = info;
> +    return info;
> +}
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 5c8378f8f6..c866f2c997 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -101,10 +101,7 @@
>   #define ALL_GENERAL_REGS  0xffffffffu
>   #define ALL_VECTOR_REGS   0xffffffff00000000ull
>   
> -TCGPowerISA have_isa;
> -static bool have_isel;
> -bool have_altivec;
> -bool have_vsx;
> +#define have_isel  (cpuinfo & CPUINFO_ISEL)
>   
>   #ifndef CONFIG_SOFTMMU
>   #define TCG_GUEST_BASE_REG 30
> @@ -3879,45 +3876,6 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>   
>   static void tcg_target_init(TCGContext *s)
>   {
> -    unsigned long hwcap = qemu_getauxval(AT_HWCAP);
> -    unsigned long hwcap2 = qemu_getauxval(AT_HWCAP2);
> -
> -    have_isa = tcg_isa_base;
> -    if (hwcap & PPC_FEATURE_ARCH_2_06) {
> -        have_isa = tcg_isa_2_06;
> -    }
> -#ifdef PPC_FEATURE2_ARCH_2_07
> -    if (hwcap2 & PPC_FEATURE2_ARCH_2_07) {
> -        have_isa = tcg_isa_2_07;
> -    }
> -#endif
> -#ifdef PPC_FEATURE2_ARCH_3_00
> -    if (hwcap2 & PPC_FEATURE2_ARCH_3_00) {
> -        have_isa = tcg_isa_3_00;
> -    }
> -#endif
> -#ifdef PPC_FEATURE2_ARCH_3_10
> -    if (hwcap2 & PPC_FEATURE2_ARCH_3_10) {
> -        have_isa = tcg_isa_3_10;
> -    }
> -#endif
> -
> -#ifdef PPC_FEATURE2_HAS_ISEL
> -    /* Prefer explicit instruction from the kernel. */
> -    have_isel = (hwcap2 & PPC_FEATURE2_HAS_ISEL) != 0;
> -#else
> -    /* Fall back to knowing Power7 (2.06) has ISEL. */
> -    have_isel = have_isa_2_06;
> -#endif
> -
> -    if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
> -        have_altivec = true;
> -        /* We only care about the portion of VSX that overlaps Altivec. */
> -        if (hwcap & PPC_FEATURE_HAS_VSX) {
> -            have_vsx = true;
> -        }
> -    }
> -
>       tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
>       tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
>       if (have_altivec) {
> diff --git a/util/meson.build b/util/meson.build
> index 3a93071d27..a375160286 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -113,4 +113,6 @@ if cpu == 'aarch64'
>     util_ss.add(files('cpuinfo-aarch64.c'))
>   elif cpu in ['x86', 'x86_64']
>     util_ss.add(files('cpuinfo-i386.c'))
> +elif cpu in ['ppc', 'ppc64']
> +  util_ss.add(files('cpuinfo-ppc.c'))
>   endif



* Re: [PATCH v2 15/38] target/ppc: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 ` [PATCH v2 15/38] target/ppc: " Richard Henderson
@ 2023-06-12 13:27   ` Daniel Henrique Barboza
  2023-06-19 10:51   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> This implements the VNCIPHERLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   target/ppc/int_helper.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 34257e9d76..15f07fca2b 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -2973,13 +2973,7 @@ void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   
>   void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   {
> -    ppc_avr_t result;
> -    int i;
> -
> -    VECTOR_FOR_INORDER_I(i, u8) {
> -        result.VsrB(i) = b->VsrB(i) ^ (AES_isbox[a->VsrB(AES_ishifts[i])]);
> -    }
> -    *r = result;
> +    aesdec_ISB_ISR_AK((AESState *)r, (AESState *)a, (AESState *)b, true);
>   }
>   
>   void helper_vshasigmaw(ppc_avr_t *r,  ppc_avr_t *a, uint32_t st_six)



* Re: [PATCH v2 25/38] target/ppc: Use aesenc_SB_SR_MC_AK
  2023-06-09  2:23 ` [PATCH v2 25/38] target/ppc: " Richard Henderson
@ 2023-06-12 13:28   ` Daniel Henrique Barboza
  0 siblings, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> This implements the VCIPHER instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   target/ppc/int_helper.c | 14 ++++----------
>   1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 15f07fca2b..1e477924b7 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -2933,17 +2933,11 @@ void helper_vsbox(ppc_avr_t *r, ppc_avr_t *a)
>   
>   void helper_vcipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   {
> -    ppc_avr_t result;
> -    int i;
> +    AESState *ad = (AESState *)r;
> +    AESState *st = (AESState *)a;
> +    AESState *rk = (AESState *)b;
>   
> -    VECTOR_FOR_INORDER_I(i, u32) {
> -        result.VsrW(i) = b->VsrW(i) ^
> -            (AES_Te0[a->VsrB(AES_shifts[4 * i + 0])] ^
> -             AES_Te1[a->VsrB(AES_shifts[4 * i + 1])] ^
> -             AES_Te2[a->VsrB(AES_shifts[4 * i + 2])] ^
> -             AES_Te3[a->VsrB(AES_shifts[4 * i + 3])]);
> -    }
> -    *r = result;
> +    aesenc_SB_SR_MC_AK(ad, st, rk, true);
>   }
>   
>   void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)



* Re: [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC
  2023-06-09  2:23 ` [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-12 13:28   ` Daniel Henrique Barboza
  2023-06-19 13:46   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:28 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:23, Richard Henderson wrote:
> This implements the VNCIPHER instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   target/ppc/int_helper.c | 19 ++++---------------
>   1 file changed, 4 insertions(+), 15 deletions(-)
> 
> diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
> index 1e477924b7..834da80fe3 100644
> --- a/target/ppc/int_helper.c
> +++ b/target/ppc/int_helper.c
> @@ -2947,22 +2947,11 @@ void helper_vcipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   
>   void helper_vncipher(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
>   {
> -    /* This differs from what is written in ISA V2.07.  The RTL is */
> -    /* incorrect and will be fixed in V2.07B.                      */
> -    int i;
> -    ppc_avr_t tmp;
> +    AESState *ad = (AESState *)r;
> +    AESState *st = (AESState *)a;
> +    AESState *rk = (AESState *)b;
>   
> -    VECTOR_FOR_INORDER_I(i, u8) {
> -        tmp.VsrB(i) = b->VsrB(i) ^ AES_isbox[a->VsrB(AES_ishifts[i])];
> -    }
> -
> -    VECTOR_FOR_INORDER_I(i, u32) {
> -        r->VsrW(i) =
> -            AES_imc[tmp.VsrB(4 * i + 0)][0] ^
> -            AES_imc[tmp.VsrB(4 * i + 1)][1] ^
> -            AES_imc[tmp.VsrB(4 * i + 2)][2] ^
> -            AES_imc[tmp.VsrB(4 * i + 3)][3];
> -    }
> +    aesdec_ISB_ISR_AK_IMC(ad, st, rk, true);
>   }
>   
>   void helper_vncipherlast(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)



* Re: [PATCH v2 38/38] host/include/ppc: Implement aes-round.h
  2023-06-09  2:24 ` [PATCH v2 38/38] host/include/ppc: " Richard Henderson
@ 2023-06-12 13:30   ` Daniel Henrique Barboza
  0 siblings, 0 replies; 67+ messages in thread
From: Daniel Henrique Barboza @ 2023-06-12 13:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini



On 6/8/23 23:24, Richard Henderson wrote:
> Detect CRYPTO in cpuinfo; implement the accel hooks.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   host/include/ppc/host/aes-round.h   | 181 ++++++++++++++++++++++++++++
>   host/include/ppc/host/cpuinfo.h     |   1 +
>   host/include/ppc64/host/aes-round.h |   1 +
>   util/cpuinfo-ppc.c                  |   8 ++
>   4 files changed, 191 insertions(+)
>   create mode 100644 host/include/ppc/host/aes-round.h
>   create mode 100644 host/include/ppc64/host/aes-round.h
> 
> diff --git a/host/include/ppc/host/aes-round.h b/host/include/ppc/host/aes-round.h
> new file mode 100644
> index 0000000000..9b5a15d1e5
> --- /dev/null
> +++ b/host/include/ppc/host/aes-round.h
> @@ -0,0 +1,181 @@
> +/*
> + * Power v2.07 specific aes acceleration.
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef PPC_HOST_AES_ROUND_H
> +#define PPC_HOST_AES_ROUND_H
> +
> +#ifndef __ALTIVEC__
> +/* Without ALTIVEC, we can't even write inline assembly. */
> +#include "host/include/generic/host/aes-round.h"
> +#else
> +#include "host/cpuinfo.h"
> +
> +#ifdef __CRYPTO__
> +# define HAVE_AES_ACCEL  true
> +#else
> +# define HAVE_AES_ACCEL  likely(cpuinfo & CPUINFO_CRYPTO)
> +#endif
> +#define ATTR_AES_ACCEL
> +
> +/*
> + * While there is <altivec.h>, both gcc and clang "aid" with the
> + * endianness issues in different ways. Just use inline asm instead.
> + */
> +
> +/* Bytes in memory are host-endian; bytes in register are @be. */
> +static inline AESStateVec aes_accel_ld(const AESState *p, bool be)
> +{
> +    AESStateVec r;
> +
> +    if (be) {
> +        asm("lvx %0, 0, %1" : "=v"(r) : "r"(p), "m"(*p));
> +    } else if (HOST_BIG_ENDIAN) {
> +        AESStateVec rev = {
> +            15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3,  2,  1,  0,
> +        };
> +        asm("lvx %0, 0, %1\n\t"
> +            "vperm %0, %0, %0, %2"
> +            : "=v"(r) : "r"(p), "v"(rev), "m"(*p));
> +    } else {
> +#ifdef __POWER9_VECTOR__
> +        asm("lxvb16x %x0, 0, %1" : "=v"(r) : "r"(p), "m"(*p));
> +#else
> +        asm("lxvd2x %x0, 0, %1\n\t"
> +            "xxpermdi %x0, %x0, %x0, 2"
> +            : "=v"(r) : "r"(p), "m"(*p));
> +#endif
> +    }
> +    return r;
> +}
> +
> +static void aes_accel_st(AESState *p, AESStateVec r, bool be)
> +{
> +    if (be) {
> +        asm("stvx %1, 0, %2" : "=m"(*p) : "v"(r), "r"(p));
> +    } else if (HOST_BIG_ENDIAN) {
> +        AESStateVec rev = {
> +            15, 14, 13, 12, 11, 10, 9, 8, 7,  6,  5,  4,  3,  2,  1,  0,
> +        };
> +        asm("vperm %1, %1, %1, %2\n\t"
> +            "stvx %1, 0, %3"
> +            : "=m"(*p), "+v"(r) : "v"(rev), "r"(p));
> +    } else {
> +#ifdef __POWER9_VECTOR__
> +        asm("stxvb16x %x1, 0, %2" : "=m"(*p) : "v"(r), "r"(p));
> +#else
> +        asm("xxpermdi %x1, %x1, %x1, 2\n\t"
> +            "stxvd2x %x1, 0, %2"
> +            : "=m"(*p), "+v"(r) : "r"(p));
> +#endif
> +    }
> +}
> +
> +static inline AESStateVec aes_accel_vcipher(AESStateVec d, AESStateVec k)
> +{
> +    asm("vcipher %0, %0, %1" : "+v"(d) : "v"(k));
> +    return d;
> +}
> +
> +static inline AESStateVec aes_accel_vncipher(AESStateVec d, AESStateVec k)
> +{
> +    asm("vncipher %0, %0, %1" : "+v"(d) : "v"(k));
> +    return d;
> +}
> +
> +static inline AESStateVec aes_accel_vcipherlast(AESStateVec d, AESStateVec k)
> +{
> +    asm("vcipherlast %0, %0, %1" : "+v"(d) : "v"(k));
> +    return d;
> +}
> +
> +static inline AESStateVec aes_accel_vncipherlast(AESStateVec d, AESStateVec k)
> +{
> +    asm("vncipherlast %0, %0, %1" : "+v"(d) : "v"(k));
> +    return d;
> +}
> +
> +static inline void
> +aesenc_MC_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    AESStateVec t, z = { };
> +
> +    t = aes_accel_ld(st, be);
> +    t = aes_accel_vncipherlast(t, z);
> +    t = aes_accel_vcipher(t, z);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesenc_SB_SR_AK_accel(AESState *ret, const AESState *st,
> +                      const AESState *rk, bool be)
> +{
> +    AESStateVec t, k;
> +
> +    t = aes_accel_ld(st, be);
> +    k = aes_accel_ld(rk, be);
> +    t = aes_accel_vcipherlast(t, k);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesenc_SB_SR_MC_AK_accel(AESState *ret, const AESState *st,
> +                         const AESState *rk, bool be)
> +{
> +    AESStateVec t, k;
> +
> +    t = aes_accel_ld(st, be);
> +    k = aes_accel_ld(rk, be);
> +    t = aes_accel_vcipher(t, k);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesdec_IMC_accel(AESState *ret, const AESState *st, bool be)
> +{
> +    AESStateVec t, z = { };
> +
> +    t = aes_accel_ld(st, be);
> +    t = aes_accel_vcipherlast(t, z);
> +    t = aes_accel_vncipher(t, z);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesdec_ISB_ISR_AK_accel(AESState *ret, const AESState *st,
> +                        const AESState *rk, bool be)
> +{
> +    AESStateVec t, k;
> +
> +    t = aes_accel_ld(st, be);
> +    k = aes_accel_ld(rk, be);
> +    t = aes_accel_vncipherlast(t, k);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesdec_ISB_ISR_AK_IMC_accel(AESState *ret, const AESState *st,
> +                            const AESState *rk, bool be)
> +{
> +    AESStateVec t, k;
> +
> +    t = aes_accel_ld(st, be);
> +    k = aes_accel_ld(rk, be);
> +    t = aes_accel_vncipher(t, k);
> +    aes_accel_st(ret, t, be);
> +}
> +
> +static inline void
> +aesdec_ISB_ISR_IMC_AK_accel(AESState *ret, const AESState *st,
> +                            const AESState *rk, bool be)
> +{
> +    AESStateVec t, k, z = { };
> +
> +    t = aes_accel_ld(st, be);
> +    k = aes_accel_ld(rk, be);
> +    t = aes_accel_vncipher(t, z);
> +    aes_accel_st(ret, t ^ k, be);
> +}
> +#endif /* __ALTIVEC__ */
> +#endif /* PPC_HOST_AES_ROUND_H */
> diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
> index 7ec252ef52..6cc727dba7 100644
> --- a/host/include/ppc/host/cpuinfo.h
> +++ b/host/include/ppc/host/cpuinfo.h
> @@ -16,6 +16,7 @@
>   #define CPUINFO_ISEL            (1u << 5)
>   #define CPUINFO_ALTIVEC         (1u << 6)
>   #define CPUINFO_VSX             (1u << 7)
> +#define CPUINFO_CRYPTO          (1u << 8)
>   
>   /* Initialized with a constructor. */
>   extern unsigned cpuinfo;
> diff --git a/host/include/ppc64/host/aes-round.h b/host/include/ppc64/host/aes-round.h
> new file mode 100644
> index 0000000000..4a78d94de8
> --- /dev/null
> +++ b/host/include/ppc64/host/aes-round.h
> @@ -0,0 +1 @@
> +#include "host/include/ppc/host/aes-round.h"
> diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c
> index ee761de33a..053b383720 100644
> --- a/util/cpuinfo-ppc.c
> +++ b/util/cpuinfo-ppc.c
> @@ -49,6 +49,14 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
>           /* We only care about the portion of VSX that overlaps Altivec. */
>           if (hwcap & PPC_FEATURE_HAS_VSX) {
>               info |= CPUINFO_VSX;
> +            /*
> +             * We use VSX especially for little-endian, but we should
> +             * always have both anyway, since VSX came with Power7
> +             * and crypto came with Power8.
> +             */
> +            if (hwcap2 & PPC_FEATURE2_HAS_VEC_CRYPTO) {
> +                info |= CPUINFO_CRYPTO;
> +            }
>           }
>       }
>   



* Re: [PATCH v2 03/38] tests/multiarch: Add test-aes
  2023-06-09  2:23 ` [PATCH v2 03/38] tests/multiarch: Add test-aes Richard Henderson
@ 2023-06-12 14:46   ` Alex Bennée
  2023-06-14  3:40     ` Richard Henderson
  0 siblings, 1 reply; 67+ messages in thread
From: Alex Bennée @ 2023-06-12 14:46 UTC (permalink / raw)
  To: Richard Henderson
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini, qemu-devel


Richard Henderson <richard.henderson@linaro.org> writes:

> Use a shared driver and backends for i386, aarch64, ppc64, riscv64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tests/tcg/aarch64/test-aes.c            |  58 ++++++++
>  tests/tcg/i386/test-aes.c               |  68 +++++++++
>  tests/tcg/ppc64/test-aes.c              | 116 +++++++++++++++
>  tests/tcg/riscv64/test-aes.c            |  76 ++++++++++
>  tests/tcg/multiarch/test-aes-main.c.inc | 183
> ++++++++++++++++++++++++

I find it odd that the file with the main function is the c.inc and the
per-guest impls are the plain .c files. Is it possible to have it the
other way around? If we had a fallback library function for AES, we
could enable the test for all targets (a true multiarch test), with some
having CPU-specific accelerations where available.

But if that's too hard to do:

Acked-by: Alex Bennée <alex.bennee@linaro.org>


>  tests/tcg/aarch64/Makefile.target       |   4 +
>  tests/tcg/i386/Makefile.target          |   4 +
>  tests/tcg/ppc64/Makefile.target         |   1 +
>  tests/tcg/riscv64/Makefile.target       |   4 +
>  9 files changed, 514 insertions(+)
>  create mode 100644 tests/tcg/aarch64/test-aes.c
>  create mode 100644 tests/tcg/i386/test-aes.c
>  create mode 100644 tests/tcg/ppc64/test-aes.c
>  create mode 100644 tests/tcg/riscv64/test-aes.c
>  create mode 100644 tests/tcg/multiarch/test-aes-main.c.inc
>
> diff --git a/tests/tcg/aarch64/test-aes.c b/tests/tcg/aarch64/test-aes.c
> new file mode 100644
> index 0000000000..2cd324f09b
> --- /dev/null
> +++ b/tests/tcg/aarch64/test-aes.c
> @@ -0,0 +1,58 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include "../multiarch/test-aes-main.c.inc"
> +
> +bool test_SB_SR(uint8_t *o, const uint8_t *i)
> +{
> +    /* aese also adds round key, so supply zero. */
> +    asm("ld1 { v0.16b }, [%1]\n\t"
> +        "movi v1.16b, #0\n\t"
> +        "aese v0.16b, v1.16b\n\t"
> +        "st1 { v0.16b }, [%0]"
> +        : : "r"(o), "r"(i) : "v0", "v1", "memory");
> +    return true;
> +}
> +
> +bool test_MC(uint8_t *o, const uint8_t *i)
> +{
> +    asm("ld1 { v0.16b }, [%1]\n\t"
> +        "aesmc v0.16b, v0.16b\n\t"
> +        "st1 { v0.16b }, [%0]"
> +        : : "r"(o), "r"(i) : "v0", "memory");
> +    return true;
> +}
> +
> +bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> +
> +bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
> +{
> +    /* aesd also adds round key, so supply zero. */
> +    asm("ld1 { v0.16b }, [%1]\n\t"
> +        "movi v1.16b, #0\n\t"
> +        "aesd v0.16b, v1.16b\n\t"
> +        "st1 { v0.16b }, [%0]"
> +        : : "r"(o), "r"(i) : "v0", "v1", "memory");
> +    return true;
> +}
> +
> +bool test_IMC(uint8_t *o, const uint8_t *i)
> +{
> +    asm("ld1 { v0.16b }, [%1]\n\t"
> +        "aesimc v0.16b, v0.16b\n\t"
> +        "st1 { v0.16b }, [%0]"
> +        : : "r"(o), "r"(i) : "v0", "memory");
> +    return true;
> +}
> +
> +bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> +
> +bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> diff --git a/tests/tcg/i386/test-aes.c b/tests/tcg/i386/test-aes.c
> new file mode 100644
> index 0000000000..199395e6cc
> --- /dev/null
> +++ b/tests/tcg/i386/test-aes.c
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include "../multiarch/test-aes-main.c.inc"
> +#include <immintrin.h>
> +
> +static bool test_SB_SR(uint8_t *o, const uint8_t *i)
> +{
> +    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
> +
> +    /* aesenclast also adds round key, so supply zero. */
> +    vi = _mm_aesenclast_si128(vi, _mm_setzero_si128());
> +
> +    _mm_storeu_si128((__m128i_u *)o, vi);
> +    return true;
> +}
> +
> +static bool test_MC(uint8_t *o, const uint8_t *i)
> +{
> +    return false;
> +}
> +
> +static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
> +    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
> +
> +    vi = _mm_aesenc_si128(vi, vk);
> +
> +    _mm_storeu_si128((__m128i_u *)o, vi);
> +    return true;
> +}
> +
> +static bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
> +{
> +    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
> +
> +    /* aesdeclast also adds round key, so supply zero. */
> +    vi = _mm_aesdeclast_si128(vi, _mm_setzero_si128());
> +
> +    _mm_storeu_si128((__m128i_u *)o, vi);
> +    return true;
> +}
> +
> +static bool test_IMC(uint8_t *o, const uint8_t *i)
> +{
> +    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
> +
> +    vi = _mm_aesimc_si128(vi);
> +
> +    _mm_storeu_si128((__m128i_u *)o, vi);
> +    return true;
> +}
> +
> +static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> +
> +static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    __m128i vi = _mm_loadu_si128((const __m128i_u *)i);
> +    __m128i vk = _mm_loadu_si128((const __m128i_u *)k);
> +
> +    vi = _mm_aesdec_si128(vi, vk);
> +
> +    _mm_storeu_si128((__m128i_u *)o, vi);
> +    return true;
> +}
> diff --git a/tests/tcg/ppc64/test-aes.c b/tests/tcg/ppc64/test-aes.c
> new file mode 100644
> index 0000000000..1d2be488e9
> --- /dev/null
> +++ b/tests/tcg/ppc64/test-aes.c
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include "../multiarch/test-aes-main.c.inc"
> +
> +#undef BIG_ENDIAN
> +#define BIG_ENDIAN  (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
> +
> +static unsigned char bswap_le[16] __attribute__((aligned(16))) = {
> +    8,9,10,11,12,13,14,15,
> +    0,1,2,3,4,5,6,7
> +};
> +
> +bool test_SB_SR(uint8_t *o, const uint8_t *i)
> +{
> +    /* vcipherlast also adds round key, so supply zero. */
> +    if (BIG_ENDIAN) {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "vspltisb 1,0\n\t"
> +            "vcipherlast 0,0,1\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i) : "memory", "v0", "v1");
> +    } else {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 34,0,%2\n\t"
> +            "vspltisb 1,0\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "vcipherlast 0,0,1\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
> +    }
> +    return true;
> +}
> +
> +bool test_MC(uint8_t *o, const uint8_t *i)
> +{
> +    return false;
> +}
> +
> +bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    if (BIG_ENDIAN) {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 33,0,%2\n\t"
> +            "vcipher 0,0,1\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
> +    } else {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 33,0,%2\n\t"
> +            "lxvd2x 34,0,%3\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "vperm 1,1,1,2\n\t"
> +            "vcipher 0,0,1\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
> +              : "memory", "v0", "v1", "v2");
> +    }
> +    return true;
> +}
> +
> +bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
> +{
> +    /* vcipherlast also adds round key, so supply zero. */
> +    if (BIG_ENDIAN) {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "vspltisb 1,0\n\t"
> +            "vncipherlast 0,0,1\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i) : "memory", "v0", "v1");
> +    } else {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 34,0,%2\n\t"
> +            "vspltisb 1,0\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "vncipherlast 0,0,1\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(bswap_le) : "memory", "v0", "v1", "v2");
> +    }
> +    return true;
> +}
> +
> +bool test_IMC(uint8_t *o, const uint8_t *i)
> +{
> +    return false;
> +}
> +
> +bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    if (BIG_ENDIAN) {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 33,0,%2\n\t"
> +            "vncipher 0,0,1\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(k) : "memory", "v0", "v1");
> +    } else {
> +        asm("lxvd2x 32,0,%1\n\t"
> +            "lxvd2x 33,0,%2\n\t"
> +            "lxvd2x 34,0,%3\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "vperm 1,1,1,2\n\t"
> +            "vncipher 0,0,1\n\t"
> +            "vperm 0,0,0,2\n\t"
> +            "stxvd2x 32,0,%0"
> +            : : "r"(o), "r"(i), "r"(k), "r"(bswap_le)
> +              : "memory", "v0", "v1", "v2");
> +    }
> +    return true;
> +}
> +
> +bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> diff --git a/tests/tcg/riscv64/test-aes.c b/tests/tcg/riscv64/test-aes.c
> new file mode 100644
> index 0000000000..3d7ef0e33a
> --- /dev/null
> +++ b/tests/tcg/riscv64/test-aes.c
> @@ -0,0 +1,76 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include "../multiarch/test-aes-main.c.inc"
> +
> +bool test_SB_SR(uint8_t *o, const uint8_t *i)
> +{
> +    uint64_t *o8 = (uint64_t *)o;
> +    const uint64_t *i8 = (const uint64_t *)i;
> +
> +    asm("aes64es %0,%2,%3\n\t"
> +        "aes64es %1,%3,%2"
> +        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
> +    return true;
> +}
> +
> +bool test_MC(uint8_t *o, const uint8_t *i)
> +{
> +    return false;
> +}
> +
> +bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    uint64_t *o8 = (uint64_t *)o;
> +    const uint64_t *i8 = (const uint64_t *)i;
> +    const uint64_t *k8 = (const uint64_t *)k;
> +
> +    asm("aes64esm %0,%2,%3\n\t"
> +        "aes64esm %1,%3,%2\n\t"
> +        "xor %0,%0,%4\n\t"
> +        "xor %1,%1,%5"
> +        : "=&r"(o8[0]), "=&r"(o8[1])
> +        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
> +    return true;
> +}
> +
> +bool test_ISB_ISR(uint8_t *o, const uint8_t *i)
> +{
> +    uint64_t *o8 = (uint64_t *)o;
> +    const uint64_t *i8 = (const uint64_t *)i;
> +
> +    asm("aes64ds %0,%2,%3\n\t"
> +        "aes64ds %1,%3,%2"
> +        : "=&r"(o8[0]), "=&r"(o8[1]) : "r"(i8[0]), "r"(i8[1]));
> +    return true;
> +}
> +
> +bool test_IMC(uint8_t *o, const uint8_t *i)
> +{
> +    uint64_t *o8 = (uint64_t *)o;
> +    const uint64_t *i8 = (const uint64_t *)i;
> +
> +    asm("aes64im %0,%0\n\t"
> +        "aes64im %1,%1"
> +        : "=r"(o8[0]), "=r"(o8[1]) : "0"(i8[0]), "1"(i8[1]));
> +    return true;
> +}
> +
> +bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    return false;
> +}
> +
> +bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k)
> +{
> +    uint64_t *o8 = (uint64_t *)o;
> +    const uint64_t *i8 = (const uint64_t *)i;
> +    const uint64_t *k8 = (const uint64_t *)k;
> +
> +    asm("aes64dsm %0,%2,%3\n\t"
> +        "aes64dsm %1,%3,%2\n\t"
> +        "xor %0,%0,%4\n\t"
> +        "xor %1,%1,%5"
> +        : "=&r"(o8[0]), "=&r"(o8[1])
> +        : "r"(i8[0]), "r"(i8[1]), "r"(k8[0]), "r"(k8[1]));
> +    return true;
> +}
> diff --git a/tests/tcg/multiarch/test-aes-main.c.inc b/tests/tcg/multiarch/test-aes-main.c.inc
> new file mode 100644
> index 0000000000..0039f8ba55
> --- /dev/null
> +++ b/tests/tcg/multiarch/test-aes-main.c.inc
> @@ -0,0 +1,183 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdio.h>
> +
> +static bool test_SB_SR(uint8_t *o, const uint8_t *i);
> +static bool test_MC(uint8_t *o, const uint8_t *i);
> +static bool test_SB_SR_MC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
> +
> +static bool test_ISB_ISR(uint8_t *o, const uint8_t *i);
> +static bool test_IMC(uint8_t *o, const uint8_t *i);
> +static bool test_ISB_ISR_AK_IMC(uint8_t *o, const uint8_t *i, const uint8_t *k);
> +static bool test_ISB_ISR_IMC_AK(uint8_t *o, const uint8_t *i, const uint8_t *k);
> +
> +/*
> + * From https://doi.org/10.6028/NIST.FIPS.197-upd1,
> + * Appendix B -- Cipher Example
> + *
> + * Note that the formatting of the 4x4 matrices in the document is
> + * column-major, whereas C is row-major.  Therefore to get the bytes
> + * in the same order as the text, the matrices are transposed.
> + *
> + * Note that we are not going to test SubBytes or ShiftRows separately,
> + * so the "After SubBytes" column is omitted, using only the combined
> + * result "After ShiftRows" column.
> + */
> +
> +/* Ease the inline assembly by aligning everything. */
> +typedef struct {
> +    uint8_t b[16] __attribute__((aligned(16)));
> +} State;
> +
> +typedef struct {
> +    State start, after_sr, after_mc, round_key;
> +} Round;
> +
> +static const Round rounds[] = {
> +    /* Round 1 */
> +    { { { 0x19, 0x3d, 0xe3, 0xbe,       /* start */
> +          0xa0, 0xf4, 0xe2, 0x2b,
> +          0x9a, 0xc6, 0x8d, 0x2a,
> +          0xe9, 0xf8, 0x48, 0x08, } },
> +
> +      { { 0xd4, 0xbf, 0x5d, 0x30,       /* after shiftrows */
> +          0xe0, 0xb4, 0x52, 0xae,
> +          0xb8, 0x41, 0x11, 0xf1,
> +          0x1e, 0x27, 0x98, 0xe5, } },
> +
> +      { { 0x04, 0x66, 0x81, 0xe5,       /* after mixcolumns */
> +          0xe0, 0xcb, 0x19, 0x9a,
> +          0x48, 0xf8, 0xd3, 0x7a,
> +          0x28, 0x06, 0x26, 0x4c, } },
> +
> +      { { 0xa0, 0xfa, 0xfe, 0x17,       /* round key */
> +          0x88, 0x54, 0x2c, 0xb1,
> +          0x23, 0xa3, 0x39, 0x39,
> +          0x2a, 0x6c, 0x76, 0x05, } } },
> +
> +    /* Round 2 */
> +    { { { 0xa4, 0x9c, 0x7f, 0xf2,       /* start */
> +          0x68, 0x9f, 0x35, 0x2b,
> +          0x6b, 0x5b, 0xea, 0x43,
> +          0x02, 0x6a, 0x50, 0x49, } },
> +
> +      { { 0x49, 0xdb, 0x87, 0x3b,       /* after shiftrows */
> +          0x45, 0x39, 0x53, 0x89,
> +          0x7f, 0x02, 0xd2, 0xf1,
> +          0x77, 0xde, 0x96, 0x1a, } },
> +
> +      { { 0x58, 0x4d, 0xca, 0xf1,       /* after mixcolumns */
> +          0x1b, 0x4b, 0x5a, 0xac,
> +          0xdb, 0xe7, 0xca, 0xa8,
> +          0x1b, 0x6b, 0xb0, 0xe5, } },
> +
> +      { { 0xf2, 0xc2, 0x95, 0xf2,       /* round key */
> +          0x7a, 0x96, 0xb9, 0x43,
> +          0x59, 0x35, 0x80, 0x7a,
> +          0x73, 0x59, 0xf6, 0x7f, } } },
> +
> +    /* Round 3 */
> +    { { { 0xaa, 0x8f, 0x5f, 0x03,       /* start */
> +          0x61, 0xdd, 0xe3, 0xef,
> +          0x82, 0xd2, 0x4a, 0xd2,
> +          0x68, 0x32, 0x46, 0x9a, } },
> +
> +      { { 0xac, 0xc1, 0xd6, 0xb8,       /* after shiftrows */
> +          0xef, 0xb5, 0x5a, 0x7b,
> +          0x13, 0x23, 0xcf, 0xdf,
> +          0x45, 0x73, 0x11, 0xb5, } },
> +
> +      { { 0x75, 0xec, 0x09, 0x93,       /* after mixcolumns */
> +          0x20, 0x0b, 0x63, 0x33,
> +          0x53, 0xc0, 0xcf, 0x7c,
> +          0xbb, 0x25, 0xd0, 0xdc, } },
> +
> +      { { 0x3d, 0x80, 0x47, 0x7d,       /* round key */
> +          0x47, 0x16, 0xfe, 0x3e,
> +          0x1e, 0x23, 0x7e, 0x44,
> +          0x6d, 0x7a, 0x88, 0x3b, } } },
> +};
> +
> +static void verify_log(const char *prefix, const State *s)
> +{
> +    printf("%s:", prefix);
> +    for (int i = 0; i < sizeof(State); ++i) {
> +        printf(" %02x", s->b[i]);
> +    }
> +    printf("\n");
> +}
> +
> +static void verify(const State *ref, const State *tst, const char *which)
> +{
> +    if (!memcmp(ref, tst, sizeof(State))) {
> +        return;
> +    }
> +
> +    printf("Mismatch on %s\n", which);
> +    verify_log("ref", ref);
> +    verify_log("tst", tst);
> +    exit(EXIT_FAILURE);
> +}
> +
> +int main()
> +{
> +    int i, n = sizeof(rounds) / sizeof(Round);
> +    State t;
> +
> +    for (i = 0; i < n; ++i) {
> +        if (test_SB_SR(t.b, rounds[i].start.b)) {
> +            verify(&rounds[i].after_sr, &t, "SB+SR");
> +        }
> +    }
> +
> +    for (i = 0; i < n; ++i) {
> +        if (test_MC(t.b, rounds[i].after_sr.b)) {
> +            verify(&rounds[i].after_mc, &t, "MC");
> +        }
> +    }
> +
> +    /* The kernel of Cipher(). */
> +    for (i = 0; i < n - 1; ++i) {
> +        if (test_SB_SR_MC_AK(t.b, rounds[i].start.b, rounds[i].round_key.b)) {
> +            verify(&rounds[i + 1].start, &t, "SB+SR+MC+AK");
> +        }
> +    }
> +
> +    for (i = 0; i < n; ++i) {
> +        if (test_ISB_ISR(t.b, rounds[i].after_sr.b)) {
> +            verify(&rounds[i].start, &t, "ISB+ISR");
> +        }
> +    }
> +
> +    for (i = 0; i < n; ++i) {
> +        if (test_IMC(t.b, rounds[i].after_mc.b)) {
> +            verify(&rounds[i].after_sr, &t, "IMC");
> +        }
> +    }
> +
> +    /* The kernel of InvCipher(). */
> +    for (i = n - 1; i > 0; --i) {
> +        if (test_ISB_ISR_AK_IMC(t.b, rounds[i].after_sr.b,
> +                                rounds[i - 1].round_key.b)) {
> +            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+AK+IMC");
> +        }
> +    }
> +
> +    /*
> +     * The kernel of EqInvCipher().
> +     * We must compute a different round key: apply InvMixColumns to
> +     * the standard round key, per KeyExpansion vs KeyExpansionEIC.
> +     */
> +    for (i = 1; i < n; ++i) {
> +        if (test_IMC(t.b, rounds[i - 1].round_key.b) &&
> +            test_ISB_ISR_IMC_AK(t.b, rounds[i].after_sr.b, t.b)) {
> +            verify(&rounds[i - 1].after_sr, &t, "ISB+ISR+IMC+AK");
> +        }
> +    }
> +
> +    return EXIT_SUCCESS;
> +}
> diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
> index 3430fd3cd8..d217474d0d 100644
> --- a/tests/tcg/aarch64/Makefile.target
> +++ b/tests/tcg/aarch64/Makefile.target
> @@ -74,6 +74,10 @@ endif
>  AARCH64_TESTS += sve-ioctls
>  sve-ioctls: CFLAGS+=-march=armv8.1-a+sve
>  
> +AARCH64_TESTS += test-aes
> +test-aes: CFLAGS += -O -march=armv8-a+aes
> +test-aes: test-aes-main.c.inc
> +
>  # Vector SHA1
>  sha1-vector: CFLAGS=-O3
>  sha1-vector: sha1.c
> diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
> index 821822ed0c..3ba61e3880 100644
> --- a/tests/tcg/i386/Makefile.target
> +++ b/tests/tcg/i386/Makefile.target
> @@ -28,6 +28,10 @@ run-test-i386-bmi2: QEMU_OPTS += -cpu max
>  test-i386-adcox: CFLAGS=-O2
>  run-test-i386-adcox: QEMU_OPTS += -cpu max
>  
> +test-aes: CFLAGS += -O -msse2 -maes
> +test-aes: test-aes-main.c.inc
> +run-test-aes: QEMU_OPTS += -cpu max
> +
>  #
>  # hello-i386 is a barebones app
>  #
> diff --git a/tests/tcg/ppc64/Makefile.target b/tests/tcg/ppc64/Makefile.target
> index b084963b9a..5721c159f2 100644
> --- a/tests/tcg/ppc64/Makefile.target
> +++ b/tests/tcg/ppc64/Makefile.target
> @@ -36,5 +36,6 @@ run-vector: QEMU_OPTS += -cpu POWER10
>  
>  PPC64_TESTS += signal_save_restore_xer
>  PPC64_TESTS += xxspltw
> +PPC64_TESTS += test-aes
>  
>  TESTS += $(PPC64_TESTS)
> diff --git a/tests/tcg/riscv64/Makefile.target b/tests/tcg/riscv64/Makefile.target
> index 9973ba3b5f..4002d14b9e 100644
> --- a/tests/tcg/riscv64/Makefile.target
> +++ b/tests/tcg/riscv64/Makefile.target
> @@ -9,3 +9,7 @@ TESTS += noexec
>  TESTS += test-noc
>  test-noc: LDFLAGS = -nostdlib -static
>  run-test-noc: QEMU_OPTS += -cpu rv64,c=false
> +
> +TESTS += test-aes
> +test-aes: CFLAGS += -O -march=rv64gzk
> +run-test-aes: QEMU_OPTS += -cpu rv64,zk=on


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 03/38] tests/multiarch: Add test-aes
  2023-06-12 14:46   ` Alex Bennée
@ 2023-06-14  3:40     ` Richard Henderson
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-14  3:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini, qemu-devel

On 6/12/23 16:46, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> Use a shared driver and backends for i386, aarch64, ppc64, riscv64.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tests/tcg/aarch64/test-aes.c            |  58 ++++++++
>>   tests/tcg/i386/test-aes.c               |  68 +++++++++
>>   tests/tcg/ppc64/test-aes.c              | 116 +++++++++++++++
>>   tests/tcg/riscv64/test-aes.c            |  76 ++++++++++
>>   tests/tcg/multiarch/test-aes-main.c.inc | 183
>> ++++++++++++++++++++++++
> 
> I find it odd the file with the main function is the c.inc and the per
> guest impl's are the plain .c files. Is it possible to have it the other
> way around?

Not with the way tests/tcg is structured, no.

> If we have a fallback library function for aes then we could
> enable the test for all targets (a true multiarch test) with some having CPU specific
> accelerations where available.

We don't have that either.


r~


* Re: [PATCH v2 02/38] util: Add cpuinfo-ppc.c
  2023-06-09  2:23 ` [PATCH v2 02/38] util: Add cpuinfo-ppc.c Richard Henderson
  2023-06-12 13:27   ` Daniel Henrique Barboza
@ 2023-06-19 10:37   ` Philippe Mathieu-Daudé
  2023-06-19 14:44     ` Richard Henderson
  1 sibling, 1 reply; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:37 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> Move the code from tcg/.  Fix a bug in that PPC_FEATURE2_ARCH_3_10
> is actually spelled PPC_FEATURE2_ARCH_3_1.

This is rather confusing.

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   host/include/ppc/host/cpuinfo.h   | 29 ++++++++++++++++
>   host/include/ppc64/host/cpuinfo.h |  1 +
>   tcg/ppc/tcg-target.h              | 16 ++++-----
>   util/cpuinfo-ppc.c                | 57 +++++++++++++++++++++++++++++++
>   tcg/ppc/tcg-target.c.inc          | 44 +-----------------------
>   util/meson.build                  |  2 ++
>   6 files changed, 98 insertions(+), 51 deletions(-)
>   create mode 100644 host/include/ppc/host/cpuinfo.h
>   create mode 100644 host/include/ppc64/host/cpuinfo.h
>   create mode 100644 util/cpuinfo-ppc.c
> 
> diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
> new file mode 100644
> index 0000000000..7ec252ef52
> --- /dev/null
> +++ b/host/include/ppc/host/cpuinfo.h
> @@ -0,0 +1,29 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for ppc.
> + */
> +
> +#ifndef HOST_CPUINFO_H
> +#define HOST_CPUINFO_H
> +
> +/* Digested version of <cpuid.h> */
> +
> +#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
> +#define CPUINFO_V2_06           (1u << 1)
> +#define CPUINFO_V2_07           (1u << 2)
> +#define CPUINFO_V3_00           (1u << 3)
> +#define CPUINFO_V3_10           (1u << 4)

Could we define as CPUINFO_V3_1 ...

> +#define CPUINFO_ISEL            (1u << 5)
> +#define CPUINFO_ALTIVEC         (1u << 6)
> +#define CPUINFO_VSX             (1u << 7)


> -#define have_isa_2_06  (have_isa >= tcg_isa_2_06)
> -#define have_isa_2_07  (have_isa >= tcg_isa_2_07)
> -#define have_isa_3_00  (have_isa >= tcg_isa_3_00)
> -#define have_isa_3_10  (have_isa >= tcg_isa_3_10)
> +#define have_isa_2_06  (cpuinfo & CPUINFO_V2_06)
> +#define have_isa_2_07  (cpuinfo & CPUINFO_V2_07)
> +#define have_isa_3_00  (cpuinfo & CPUINFO_V3_00)
> +#define have_isa_3_10  (cpuinfo & CPUINFO_V3_10)

... and have_isa_3_1 instead?

Otherwise,

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK
  2023-06-09  2:23 ` [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK Richard Henderson
@ 2023-06-19 10:43   ` Philippe Mathieu-Daudé
  2023-06-19 10:45     ` Philippe Mathieu-Daudé
  0 siblings, 1 reply; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:43 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This implements the AESENCLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/i386/ops_sse.h | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)


>   void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
>   {
> -    int i;
> -    Reg st = *v;
> -    Reg rk = *s;
> +    for (int i = 0; i < SHIFT; i++) {
> +        AESState *ad = (AESState *)&d->ZMM_X(i);
> +        AESState *st = (AESState *)&v->ZMM_X(i);
> +        AESState *rk = (AESState *)&s->ZMM_X(i);
>   
> -    for (i = 0; i < 8 << SHIFT; i++) {
> -        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
> +        aesenc_SB_SR_AK(ad, st, rk, false);

Why not use aesenc_SB_SR_AK_gen(ad, st, rk)?

Regardless:
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

>       }
>   }
>   



* Re: [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK
  2023-06-19 10:43   ` Philippe Mathieu-Daudé
@ 2023-06-19 10:45     ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 19/6/23 12:43, Philippe Mathieu-Daudé wrote:
> On 9/6/23 04:23, Richard Henderson wrote:
>> This implements the AESENCLAST instruction.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/i386/ops_sse.h | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
> 
> 
>>   void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg 
>> *v, Reg *s)
>>   {
>> -    int i;
>> -    Reg st = *v;
>> -    Reg rk = *s;
>> +    for (int i = 0; i < SHIFT; i++) {
>> +        AESState *ad = (AESState *)&d->ZMM_X(i);
>> +        AESState *st = (AESState *)&v->ZMM_X(i);
>> +        AESState *rk = (AESState *)&s->ZMM_X(i);
>> -    for (i = 0; i < 8 << SHIFT; i++) {
>> -        d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & 
>> ~15))]);
>> +        aesenc_SB_SR_AK(ad, st, rk, false);
> 
> Why not use aesenc_SB_SR_AK_gen(ad, st, rk)?

Whatever, I misread the last 'be' boolean as 'swap', so this is perfect.

> Regardless:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> 
>>       }
>>   }
> 



* Re: [PATCH v2 10/38] target/ppc: Use aesenc_SB_SR_AK
  2023-06-09  2:23 ` [PATCH v2 10/38] target/ppc: " Richard Henderson
  2023-06-12 13:26   ` Daniel Henrique Barboza
@ 2023-06-19 10:47   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This implements the VCIPHERLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/ppc/int_helper.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 ` [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK Richard Henderson
@ 2023-06-19 10:51   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This implements the AESDECLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/i386/ops_sse.h | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 15/38] target/ppc: Use aesdec_ISB_ISR_AK
  2023-06-09  2:23 ` [PATCH v2 15/38] target/ppc: " Richard Henderson
  2023-06-12 13:27   ` Daniel Henrique Barboza
@ 2023-06-19 10:51   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 10:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This implements the VNCIPHERLAST instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/ppc/int_helper.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN
  2023-06-09  2:23 ` [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
@ 2023-06-19 13:18   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 13:18 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> These arrays are no longer used outside of aes.c.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/crypto/aes.h | 25 -------------------------
>   crypto/aes.c         | 33 +++++++++++++++++++++------------
>   2 files changed, 21 insertions(+), 37 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 34/38] crypto: Remove AES_imc
  2023-06-09  2:23 ` [PATCH v2 34/38] crypto: Remove AES_imc Richard Henderson
@ 2023-06-19 13:19   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 13:19 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This array is no longer used.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/crypto/aes.h |   7 --
>   crypto/aes.c         | 264 -------------------------------------------
>   2 files changed, 271 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts
  2023-06-09  2:23 ` [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
@ 2023-06-19 13:45   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 13:45 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> These arrays are no longer used, replaced by AES_SH_*, AES_ISH_*.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   include/crypto/aes.h |  4 ----
>   crypto/aes.c         | 14 --------------
>   2 files changed, 18 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC
  2023-06-09  2:23 ` [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
  2023-06-12 13:28   ` Daniel Henrique Barboza
@ 2023-06-19 13:46   ` Philippe Mathieu-Daudé
  1 sibling, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 13:46 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This implements the VNCIPHER instruction.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/ppc/int_helper.c | 19 ++++---------------
>   1 file changed, 4 insertions(+), 15 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC
  2023-06-09  2:23 ` [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
@ 2023-06-19 13:59   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-19 13:59 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> Add a primitive for InvSubBytes + InvShiftRows +
> AddRoundKey + InvMixColumns.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   host/include/generic/host/aes-round.h |  3 +++
>   include/crypto/aes-round.h            | 21 +++++++++++++++++++++
>   crypto/aes.c                          | 14 ++++++++++++++
>   3 files changed, 38 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



* Re: [PATCH v2 02/38] util: Add cpuinfo-ppc.c
  2023-06-19 10:37   ` Philippe Mathieu-Daudé
@ 2023-06-19 14:44     ` Richard Henderson
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-19 14:44 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 6/19/23 12:37, Philippe Mathieu-Daudé wrote:
> On 9/6/23 04:23, Richard Henderson wrote:
>> Move the code from tcg/.  Fix a bug in that PPC_FEATURE2_ARCH_3_10
>> is actually spelled PPC_FEATURE2_ARCH_3_1.
> 
> This is rather confusing.
> 
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   host/include/ppc/host/cpuinfo.h   | 29 ++++++++++++++++
>>   host/include/ppc64/host/cpuinfo.h |  1 +
>>   tcg/ppc/tcg-target.h              | 16 ++++-----
>>   util/cpuinfo-ppc.c                | 57 +++++++++++++++++++++++++++++++
>>   tcg/ppc/tcg-target.c.inc          | 44 +-----------------------
>>   util/meson.build                  |  2 ++
>>   6 files changed, 98 insertions(+), 51 deletions(-)
>>   create mode 100644 host/include/ppc/host/cpuinfo.h
>>   create mode 100644 host/include/ppc64/host/cpuinfo.h
>>   create mode 100644 util/cpuinfo-ppc.c
>>
>> diff --git a/host/include/ppc/host/cpuinfo.h b/host/include/ppc/host/cpuinfo.h
>> new file mode 100644
>> index 0000000000..7ec252ef52
>> --- /dev/null
>> +++ b/host/include/ppc/host/cpuinfo.h
>> @@ -0,0 +1,29 @@
>> +/*
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + * Host specific cpu identification for ppc.
>> + */
>> +
>> +#ifndef HOST_CPUINFO_H
>> +#define HOST_CPUINFO_H
>> +
>> +/* Digested version of <cpuid.h> */
>> +
>> +#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
>> +#define CPUINFO_V2_06           (1u << 1)
>> +#define CPUINFO_V2_07           (1u << 2)
>> +#define CPUINFO_V3_00           (1u << 3)
>> +#define CPUINFO_V3_10           (1u << 4)
> 
> Could we define as CPUINFO_V3_1 ...
> 
>> +#define CPUINFO_ISEL            (1u << 5)
>> +#define CPUINFO_ALTIVEC         (1u << 6)
>> +#define CPUINFO_VSX             (1u << 7)
> 
> 
>> -#define have_isa_2_06  (have_isa >= tcg_isa_2_06)
>> -#define have_isa_2_07  (have_isa >= tcg_isa_2_07)
>> -#define have_isa_3_00  (have_isa >= tcg_isa_3_00)
>> -#define have_isa_3_10  (have_isa >= tcg_isa_3_10)
>> +#define have_isa_2_06  (cpuinfo & CPUINFO_V2_06)
>> +#define have_isa_2_07  (cpuinfo & CPUINFO_V2_07)
>> +#define have_isa_3_00  (cpuinfo & CPUINFO_V3_00)
>> +#define have_isa_3_10  (cpuinfo & CPUINFO_V3_10)
> 
> ... and have_isa_3_1 instead?

I suppose we could, but they all line up this way.  :-)

r~

> 
> Otherwise,
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> 



* Re: [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-09  2:23 ` [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
@ 2023-06-19 15:41   ` Daniel P. Berrangé
  2023-06-29 10:21   ` Ard Biesheuvel
  1 sibling, 0 replies; 67+ messages in thread
From: Daniel P. Berrangé @ 2023-06-19 15:41 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, ardb, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

On Thu, Jun 08, 2023 at 07:23:28PM -0700, Richard Henderson wrote:
> These symbols will avoid the indirection through memory
> when fully unrolling some new primitives.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  crypto/aes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 48 insertions(+), 2 deletions(-)

Acked-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c
  2023-06-09  2:23 ` [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
@ 2023-06-19 16:49   ` Daniel P. Berrangé
  0 siblings, 0 replies; 67+ messages in thread
From: Daniel P. Berrangé @ 2023-06-19 16:49 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, ardb, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

On Thu, Jun 08, 2023 at 07:23:27PM -0700, Richard Henderson wrote:
> We do not currently have a table in crypto/ for just MixColumns.
> Move both tables for consistency.
> 
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  include/crypto/aes.h           |   6 ++
>  crypto/aes.c                   | 140 ++++++++++++++++++++++++++++++++
>  target/arm/tcg/crypto_helper.c | 143 ++-------------------------------
>  3 files changed, 151 insertions(+), 138 deletions(-)

Acked-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



* Re: [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK
  2023-06-09  2:23 ` [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK Richard Henderson
@ 2023-06-19 16:56   ` Daniel P. Berrangé
  2023-06-19 17:05     ` Richard Henderson
  0 siblings, 1 reply; 67+ messages in thread
From: Daniel P. Berrangé @ 2023-06-19 16:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, ardb, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On Thu, Jun 08, 2023 at 07:23:29PM -0700, Richard Henderson wrote:
> Start adding infrastructure for accelerating guest AES.
> Begin with a SubBytes + ShiftRows + AddRoundKey primitive.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  host/include/generic/host/aes-round.h | 16 ++++++++++
>  include/crypto/aes-round.h            | 44 +++++++++++++++++++++++++++
>  crypto/aes.c                          | 44 +++++++++++++++++++++++++++
>  3 files changed, 104 insertions(+)
>  create mode 100644 host/include/generic/host/aes-round.h
>  create mode 100644 include/crypto/aes-round.h
> 
> diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
> new file mode 100644
> index 0000000000..19c8505e2b
> --- /dev/null
> +++ b/host/include/generic/host/aes-round.h

Could we put these files under a 'crypto/' subdirectory eg

  host/include/generic/host/crypto/aes-round.h

and then add

  host/include/*/host/crypto

to MAINTAINERS for 'crypto'.

> @@ -0,0 +1,16 @@
> +/*
> + * No host specific aes acceleration.
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef GENERIC_HOST_AES_ROUND_H
> +#define GENERIC_HOST_AES_ROUND_H

To match the extra sub-dir GENERIC_HOST_CRYPTO_AES_ROUND_H

> +
> +#define HAVE_AES_ACCEL  false
> +#define ATTR_AES_ACCEL
> +
> +void aesenc_SB_SR_AK_accel(AESState *, const AESState *,
> +                           const AESState *, bool)
> +    QEMU_ERROR("unsupported accel");
> +
> +#endif
> diff --git a/include/crypto/aes-round.h b/include/crypto/aes-round.h
> new file mode 100644
> index 0000000000..15ea1f42bc
> --- /dev/null
> +++ b/include/crypto/aes-round.h
> @@ -0,0 +1,44 @@
> +/*
> + * AES round fragments, generic version
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (C) 2023 Linaro, Ltd.
> + */
> +
> +#ifndef CRYPTO_AES_ROUND_H
> +#define CRYPTO_AES_ROUND_H
> +
> +/* Hosts with acceleration will usually need a 16-byte vector type. */
> +typedef uint8_t AESStateVec __attribute__((vector_size(16)));
> +
> +typedef union {
> +    uint8_t b[16];
> +    uint32_t w[4];
> +    uint64_t d[2];
> +    AESStateVec v;
> +} AESState;
> +
> +#include "host/aes-round.h"
> +
> +/*
> + * Perform SubBytes + ShiftRows + AddRoundKey.
> + */
> +
> +void aesenc_SB_SR_AK_gen(AESState *ret, const AESState *st,
> +                         const AESState *rk);
> +void aesenc_SB_SR_AK_genrev(AESState *ret, const AESState *st,
> +                            const AESState *rk);
> +
> +static inline void aesenc_SB_SR_AK(AESState *r, const AESState *st,
> +                                   const AESState *rk, bool be)
> +{
> +    if (HAVE_AES_ACCEL) {
> +        aesenc_SB_SR_AK_accel(r, st, rk, be);
> +    } else if (HOST_BIG_ENDIAN == be) {
> +        aesenc_SB_SR_AK_gen(r, st, rk);
> +    } else {
> +        aesenc_SB_SR_AK_genrev(r, st, rk);
> +    }
> +}
> +
> +#endif /* CRYPTO_AES_ROUND_H */
> diff --git a/crypto/aes.c b/crypto/aes.c
> index cdf937883d..896f6f44f1 100644
> --- a/crypto/aes.c
> +++ b/crypto/aes.c
> @@ -29,6 +29,7 @@
>   */
>  #include "qemu/osdep.h"
>  #include "crypto/aes.h"
> +#include "crypto/aes-round.h"
>  
>  typedef uint32_t u32;
>  typedef uint8_t u8;
> @@ -1249,6 +1250,49 @@ static const u32 rcon[] = {
>          0x1B000000, 0x36000000, /* for 128-bit blocks, Rijndael never uses more than 10 rcon values */
>  };
>  
> +/* Perform SubBytes + ShiftRows + AddRoundKey. */
> +static inline void
> +aesenc_SB_SR_AK_swap(AESState *ret, const AESState *st,
> +                     const AESState *rk, bool swap)
> +{
> +    const int swap_b = swap ? 15 : 0;
> +    AESState t;
> +
> +    t.b[swap_b ^ 0x0] = AES_sbox[st->b[swap_b ^ AES_SH_0]];
> +    t.b[swap_b ^ 0x1] = AES_sbox[st->b[swap_b ^ AES_SH_1]];
> +    t.b[swap_b ^ 0x2] = AES_sbox[st->b[swap_b ^ AES_SH_2]];
> +    t.b[swap_b ^ 0x3] = AES_sbox[st->b[swap_b ^ AES_SH_3]];
> +    t.b[swap_b ^ 0x4] = AES_sbox[st->b[swap_b ^ AES_SH_4]];
> +    t.b[swap_b ^ 0x5] = AES_sbox[st->b[swap_b ^ AES_SH_5]];
> +    t.b[swap_b ^ 0x6] = AES_sbox[st->b[swap_b ^ AES_SH_6]];
> +    t.b[swap_b ^ 0x7] = AES_sbox[st->b[swap_b ^ AES_SH_7]];
> +    t.b[swap_b ^ 0x8] = AES_sbox[st->b[swap_b ^ AES_SH_8]];
> +    t.b[swap_b ^ 0x9] = AES_sbox[st->b[swap_b ^ AES_SH_9]];
> +    t.b[swap_b ^ 0xa] = AES_sbox[st->b[swap_b ^ AES_SH_A]];
> +    t.b[swap_b ^ 0xb] = AES_sbox[st->b[swap_b ^ AES_SH_B]];
> +    t.b[swap_b ^ 0xc] = AES_sbox[st->b[swap_b ^ AES_SH_C]];
> +    t.b[swap_b ^ 0xd] = AES_sbox[st->b[swap_b ^ AES_SH_D]];
> +    t.b[swap_b ^ 0xe] = AES_sbox[st->b[swap_b ^ AES_SH_E]];
> +    t.b[swap_b ^ 0xf] = AES_sbox[st->b[swap_b ^ AES_SH_F]];
> +
> +    /*
> +     * Perform the AddRoundKey with generic vectors.
> +     * This may be expanded to either host integer or host vector code.
> +     * The key and output endianness match, so no bswap required.
> +     */
> +    ret->v = t.v ^ rk->v;
> +}
> +
> +void aesenc_SB_SR_AK_gen(AESState *r, const AESState *s, const AESState *k)
> +{
> +    aesenc_SB_SR_AK_swap(r, s, k, false);
> +}
> +
> +void aesenc_SB_SR_AK_genrev(AESState *r, const AESState *s, const AESState *k)
> +{
> +    aesenc_SB_SR_AK_swap(r, s, k, true);
> +}
> +
>  /**
>   * Expand the cipher key into the encryption key schedule.
>   */
> -- 
> 2.34.1
> 

With regards,
Daniel




* Re: [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK
  2023-06-19 16:56   ` Daniel P. Berrangé
@ 2023-06-19 17:05     ` Richard Henderson
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-19 17:05 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, ardb, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 6/19/23 18:56, Daniel P. Berrangé wrote:
> On Thu, Jun 08, 2023 at 07:23:29PM -0700, Richard Henderson wrote:
>> Start adding infrastructure for accelerating guest AES.
>> Begin with a SubBytes + ShiftRows + AddRoundKey primitive.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   host/include/generic/host/aes-round.h | 16 ++++++++++
>>   include/crypto/aes-round.h            | 44 +++++++++++++++++++++++++++
>>   crypto/aes.c                          | 44 +++++++++++++++++++++++++++
>>   3 files changed, 104 insertions(+)
>>   create mode 100644 host/include/generic/host/aes-round.h
>>   create mode 100644 include/crypto/aes-round.h
>>
>> diff --git a/host/include/generic/host/aes-round.h b/host/include/generic/host/aes-round.h
>> new file mode 100644
>> index 0000000000..19c8505e2b
>> --- /dev/null
>> +++ b/host/include/generic/host/aes-round.h
> 
> Could we put these files under a 'crypto/' subdirectory eg
> 
>    host/include/generic/host/crypto/aes-round.h
> 
> and then add
> 
>    host/include/*/host/crypto
> 
> to MAINTAINERS for 'crypto'.

Certainly.


r~



* Re: [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot
  2023-06-09  2:23 ` [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
@ 2023-06-20  5:09   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 67+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-06-20  5:09 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: ardb, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini

On 9/6/23 04:23, Richard Henderson wrote:
> This method uses one uint32_t * 256 table instead of 4,
> which means its data cache overhead is less.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   crypto/aes.c | 42 +++++++++++++++++++++---------------------
>   1 file changed, 21 insertions(+), 21 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>




* Re: [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-09  2:23 ` [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
  2023-06-19 15:41   ` Daniel P. Berrangé
@ 2023-06-29 10:21   ` Ard Biesheuvel
  2023-06-29 11:58     ` Richard Henderson
  1 sibling, 1 reply; 67+ messages in thread
From: Ard Biesheuvel @ 2023-06-29 10:21 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

On Fri, 9 Jun 2023 at 04:24, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> These symbols will avoid the indirection through memory
> when fully unrolling some new primitives.
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  crypto/aes.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 48 insertions(+), 2 deletions(-)
>
> diff --git a/crypto/aes.c b/crypto/aes.c
> index 67bb74b8e3..cdf937883d 100644
> --- a/crypto/aes.c
> +++ b/crypto/aes.c
> @@ -108,12 +108,58 @@ const uint8_t AES_isbox[256] = {
>      0xE1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0C, 0x7D,
>  };
>
> +/* AES ShiftRows, for complete unrolling. */
> +enum {
> +    AES_SH_0 = 0x0,
> +    AES_SH_1 = 0x5,
> +    AES_SH_2 = 0xa,
> +    AES_SH_3 = 0xf,
> +    AES_SH_4 = 0x4,
> +    AES_SH_5 = 0x9,
> +    AES_SH_6 = 0xe,
> +    AES_SH_7 = 0x3,
> +    AES_SH_8 = 0x8,
> +    AES_SH_9 = 0xd,
> +    AES_SH_A = 0x2,
> +    AES_SH_B = 0x7,
> +    AES_SH_C = 0xc,
> +    AES_SH_D = 0x1,
> +    AES_SH_E = 0x6,
> +    AES_SH_F = 0xb,
> +};
> +

We might simplify this further by doing

#define AES_SH(n)  (((n) * 5) % 16)
#define AES_ISH(n)  (((n) * 13) % 16)

>  const uint8_t AES_shifts[16] = {
> -    0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11
> +    AES_SH_0, AES_SH_1, AES_SH_2, AES_SH_3,
> +    AES_SH_4, AES_SH_5, AES_SH_6, AES_SH_7,
> +    AES_SH_8, AES_SH_9, AES_SH_A, AES_SH_B,
> +    AES_SH_C, AES_SH_D, AES_SH_E, AES_SH_F,
> +};
> +
> +/* AES InvShiftRows, for complete unrolling. */
> +enum {
> +    AES_ISH_0 = 0x0,
> +    AES_ISH_1 = 0xd,
> +    AES_ISH_2 = 0xa,
> +    AES_ISH_3 = 0x7,
> +    AES_ISH_4 = 0x4,
> +    AES_ISH_5 = 0x1,
> +    AES_ISH_6 = 0xe,
> +    AES_ISH_7 = 0xb,
> +    AES_ISH_8 = 0x8,
> +    AES_ISH_9 = 0x5,
> +    AES_ISH_A = 0x2,
> +    AES_ISH_B = 0xf,
> +    AES_ISH_C = 0xc,
> +    AES_ISH_D = 0x9,
> +    AES_ISH_E = 0x6,
> +    AES_ISH_F = 0x3,
>  };
>
>  const uint8_t AES_ishifts[16] = {
> -    0, 13, 10, 7, 4, 1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3
> +    AES_ISH_0, AES_ISH_1, AES_ISH_2, AES_ISH_3,
> +    AES_ISH_4, AES_ISH_5, AES_ISH_6, AES_ISH_7,
> +    AES_ISH_8, AES_ISH_9, AES_ISH_A, AES_ISH_B,
> +    AES_ISH_C, AES_ISH_D, AES_ISH_E, AES_ISH_F,
>  };
>
>  /*
> --
> 2.34.1
>



* Re: [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows
  2023-06-29 10:21   ` Ard Biesheuvel
@ 2023-06-29 11:58     ` Richard Henderson
  0 siblings, 0 replies; 67+ messages in thread
From: Richard Henderson @ 2023-06-29 11:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: qemu-devel, berrange, qemu-ppc, qemu-arm, qemu-riscv, pbonzini,
	Philippe Mathieu-Daudé

On 6/29/23 12:21, Ard Biesheuvel wrote:
>> +/* AES ShiftRows, for complete unrolling. */
>> +enum {
>> +    AES_SH_0 = 0x0,
>> +    AES_SH_1 = 0x5,
>> +    AES_SH_2 = 0xa,
>> +    AES_SH_3 = 0xf,
>> +    AES_SH_4 = 0x4,
>> +    AES_SH_5 = 0x9,
>> +    AES_SH_6 = 0xe,
>> +    AES_SH_7 = 0x3,
>> +    AES_SH_8 = 0x8,
>> +    AES_SH_9 = 0xd,
>> +    AES_SH_A = 0x2,
>> +    AES_SH_B = 0x7,
>> +    AES_SH_C = 0xc,
>> +    AES_SH_D = 0x1,
>> +    AES_SH_E = 0x6,
>> +    AES_SH_F = 0xb,
>> +};
>> +
> 
> We might simplify this further by doing
> 
> #define AES_SH(n)  (((n) * 5) % 16)
> #define AES_ISH(n)  (((n) * 13) % 16)

Thanks.  I should have noticed, but

   s'_{r,c} = s_{r,(c+r)%4}

didn't make an impression and I assumed the table was non-regular.


r~



end of thread [~2023-06-29 11:59 UTC | newest]

Thread overview: 67+ messages
2023-06-09  2:23 [PATCH v2 00/38] crypto: Provide aes-round.h and host accel Richard Henderson
2023-06-09  2:23 ` [PATCH v2 01/38] tcg/ppc: Define _CALL_AIX for clang on ppc64(be) Richard Henderson
2023-06-12 13:25   ` Daniel Henrique Barboza
2023-06-09  2:23 ` [PATCH v2 02/38] util: Add cpuinfo-ppc.c Richard Henderson
2023-06-12 13:27   ` Daniel Henrique Barboza
2023-06-19 10:37   ` Philippe Mathieu-Daudé
2023-06-19 14:44     ` Richard Henderson
2023-06-09  2:23 ` [PATCH v2 03/38] tests/multiarch: Add test-aes Richard Henderson
2023-06-12 14:46   ` Alex Bennée
2023-06-14  3:40     ` Richard Henderson
2023-06-09  2:23 ` [PATCH v2 04/38] target/arm: Move aesmc and aesimc tables to crypto/aes.c Richard Henderson
2023-06-19 16:49   ` Daniel P. Berrangé
2023-06-09  2:23 ` [PATCH v2 05/38] crypto/aes: Add constants for ShiftRows, InvShiftRows Richard Henderson
2023-06-19 15:41   ` Daniel P. Berrangé
2023-06-29 10:21   ` Ard Biesheuvel
2023-06-29 11:58     ` Richard Henderson
2023-06-09  2:23 ` [PATCH v2 06/38] crypto: Add aesenc_SB_SR_AK Richard Henderson
2023-06-19 16:56   ` Daniel P. Berrangé
2023-06-19 17:05     ` Richard Henderson
2023-06-09  2:23 ` [PATCH v2 07/38] target/i386: Use aesenc_SB_SR_AK Richard Henderson
2023-06-19 10:43   ` Philippe Mathieu-Daudé
2023-06-19 10:45     ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 08/38] target/arm: Demultiplex AESE and AESMC Richard Henderson
2023-06-09  2:23 ` [PATCH v2 09/38] target/arm: Use aesenc_SB_SR_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 10/38] target/ppc: " Richard Henderson
2023-06-12 13:26   ` Daniel Henrique Barboza
2023-06-19 10:47   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 11/38] target/riscv: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 12/38] crypto: Add aesdec_ISB_ISR_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 13/38] target/i386: Use aesdec_ISB_ISR_AK Richard Henderson
2023-06-19 10:51   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 14/38] target/arm: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 15/38] target/ppc: " Richard Henderson
2023-06-12 13:27   ` Daniel Henrique Barboza
2023-06-19 10:51   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 16/38] target/riscv: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 17/38] crypto: Add aesenc_MC Richard Henderson
2023-06-09  2:23 ` [PATCH v2 18/38] target/arm: Use aesenc_MC Richard Henderson
2023-06-09  2:23 ` [PATCH v2 19/38] crypto: Add aesdec_IMC Richard Henderson
2023-06-09  2:23 ` [PATCH v2 20/38] target/i386: Use aesdec_IMC Richard Henderson
2023-06-09  2:23 ` [PATCH v2 21/38] target/arm: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 22/38] target/riscv: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 23/38] crypto: Add aesenc_SB_SR_MC_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 24/38] target/i386: Use aesenc_SB_SR_MC_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 25/38] target/ppc: " Richard Henderson
2023-06-12 13:28   ` Daniel Henrique Barboza
2023-06-09  2:23 ` [PATCH v2 26/38] target/riscv: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 27/38] crypto: Add aesdec_ISB_ISR_IMC_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 28/38] target/i386: Use aesdec_ISB_ISR_IMC_AK Richard Henderson
2023-06-09  2:23 ` [PATCH v2 29/38] target/riscv: " Richard Henderson
2023-06-09  2:23 ` [PATCH v2 30/38] crypto: Add aesdec_ISB_ISR_AK_IMC Richard Henderson
2023-06-19 13:59   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 31/38] target/ppc: Use aesdec_ISB_ISR_AK_IMC Richard Henderson
2023-06-12 13:28   ` Daniel Henrique Barboza
2023-06-19 13:46   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 32/38] crypto: Remove AES_shifts, AES_ishifts Richard Henderson
2023-06-19 13:45   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 33/38] crypto: Implement aesdec_IMC with AES_imc_rot Richard Henderson
2023-06-20  5:09   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 34/38] crypto: Remove AES_imc Richard Henderson
2023-06-19 13:19   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 35/38] crypto: Unexport AES_*_rot, AES_TeN, AES_TdN Richard Henderson
2023-06-19 13:18   ` Philippe Mathieu-Daudé
2023-06-09  2:23 ` [PATCH v2 36/38] host/include/i386: Implement aes-round.h Richard Henderson
2023-06-09  2:24 ` [PATCH v2 37/38] host/include/aarch64: " Richard Henderson
2023-06-09  2:24 ` [PATCH v2 38/38] host/include/ppc: " Richard Henderson
2023-06-12 13:30   ` Daniel Henrique Barboza
