* [PATCH 0/2] target/arm: Implement an IMPDEF pauth algorithm @ 2020-08-12 6:53 Richard Henderson 2020-08-12 6:53 ` [PATCH 1/2] target/arm: Add cpu property to control pauth Richard Henderson 2020-08-12 6:53 ` [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson 0 siblings, 2 replies; 13+ messages in thread From: Richard Henderson @ 2020-08-12 6:53 UTC (permalink / raw) To: qemu-devel; +Cc: mark.rutland, peter.maydell, alex.bennee With recent linux kernels, which are built with pauth enabled, boot times have been significantly impacted. It turns out the architected pac algorithm is expensive to implement without hardware accel. I tried replacing this with AES128, which most hosts have in hardware. I did manage to make this perform fairly well, but it *really* depends on the quality of the crypto library shipped by the OS: (0) For reference, pauth_computepac_architected consumes about 80% of the total runtime for a debug-enabled kernel build. That's a 400% slowdown. (1) libgcrypt performed well, with 15% of the total runtime, or about 18% slowdown. (2a) libnettle, as shipped by ubuntu 18, does not have the x86_64 aes instruction set enabled, and so is not using the available hw accel. That took about 40% of the total runtime or 170% slowdown. (2b) Rebuilding libnettle locally, with --enable-fat, and using LD_LIBRARY_PATH to replace the system version, worked fairly well, with about 20% of the total runtime or 25% slowdown. (2c) libnettle doesn't have support for armv8, ppc or s390. Those hosts should *really* be using libgcrypt. But, silly me, I had used --target-list=aarch64-softmmu for this testing, in order to speed up the builds. When I went back to building aarch64-linux-user, I was reminded that we don't link linux-user binaries against the crypto libraries. And those crypto libraries are usually only distributed as shared libraries, which would cause problems for --static. I very briefly looked into doing my own aes implementation, with host cpu detection. But aside from the ugliness of that, it begs the question of what to do if there's no host accel. I settled on a fast non-cryptographic hash with about 10% overhead. I added a -cpu max,pauth={off,impdef,arch} property to choose between the different algorithms. The default remains arch, since that's what 5.0 and 5.1 shipped. We can discuss whether it makes sense to change the default for "max". r~ Richard Henderson (2): target/arm: Add cpu property to control pauth target/arm: Implement an IMPDEF pauth algorithm target/arm/cpu64.c | 64 +++++++++++++++++++++++++++++++++ target/arm/pauth_helper.c | 75 ++++++++++++++++++++++++++++++++++++--- 2 files changed, 134 insertions(+), 5 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-12 6:53 [PATCH 0/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson @ 2020-08-12 6:53 ` Richard Henderson 2020-08-12 11:00 ` Andrew Jones 2020-08-12 6:53 ` [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson 1 sibling, 1 reply; 13+ messages in thread From: Richard Henderson @ 2020-08-12 6:53 UTC (permalink / raw) To: qemu-devel; +Cc: mark.rutland, peter.maydell, alex.bennee The crypto overhead of emulating pauth can be significant for some workloads. Add an enumeration property that allows the feature to be turned off, on with the architected algorithm, or on with an implementation defined algorithm. The architected algorithm is quite expensive to emulate; using another algorithm may allow hardware acceleration. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/cpu64.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c index dd696183df..3181d0e2f8 100644 --- a/target/arm/cpu64.c +++ b/target/arm/cpu64.c @@ -572,6 +572,69 @@ void aarch64_add_sve_properties(Object *obj) } } +static const char * const pauth_names[] = { + "off", "impdef", "arch" +}; + +static const QEnumLookup pauth_lookup = { + .array = pauth_names, + .size = ARRAY_SIZE(pauth_names) +}; + +static int cpu_arm_get_pauth(Object *obj, Error **errp) +{ + ARMCPU *cpu = ARM_CPU(obj); + int value; + + /* We will always set GPA+APA and GPI+API to the same value. */ + if (FIELD_EX64(cpu->isar.id_aa64isar1, ID_AA64ISAR1, APA)) { + value = 2; + } else if (FIELD_EX64(cpu->isar.id_aa64isar1, ID_AA64ISAR1, API)) { + value = 1; + } else { + value = 0; + } + return value; +} + +static void cpu_arm_set_pauth(Object *obj, int value, Error **errp) +{ + ARMCPU *cpu = ARM_CPU(obj); + uint64_t t; + + /* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */ + t = cpu->isar.id_aa64isar1; + switch (value) { + case 0: + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, API, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0); + break; + case 1: + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, API, 1); + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 1); + break; + case 2: + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); + t = FIELD_DP64(t, ID_AA64ISAR1, API, 0); + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 1); + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0); + break; + default: + g_assert_not_reached(); + } + cpu->isar.id_aa64isar1 = t; +} + +static void aarch64_add_pauth_properties(Object *obj) +{ + object_property_add_enum(obj, "pauth", "ARMPauthKind", &pauth_lookup, + cpu_arm_get_pauth, cpu_arm_set_pauth); +} + /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host); * otherwise, a CPU with as many features enabled as our emulation supports. * The version of '-cpu max' for qemu-system-arm is defined in cpu.c; @@ -720,6 +783,7 @@ static void aarch64_max_initfn(Object *obj) #endif } + aarch64_add_pauth_properties(obj); aarch64_add_sve_properties(obj); object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq, cpu_max_set_sve_max_vq, NULL, NULL); -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-12 6:53 ` [PATCH 1/2] target/arm: Add cpu property to control pauth Richard Henderson @ 2020-08-12 11:00 ` Andrew Jones 2020-08-12 15:10 ` Richard Henderson 0 siblings, 1 reply; 13+ messages in thread From: Andrew Jones @ 2020-08-12 11:00 UTC (permalink / raw) To: Richard Henderson; +Cc: mark.rutland, peter.maydell, alex.bennee, qemu-devel On Tue, Aug 11, 2020 at 11:53:38PM -0700, Richard Henderson wrote: > The crypto overhead of emulating pauth can be significant for > some workloads. Add an enumeration property that allows the > feature to be turned off, on with the architected algorithm, > or on with an implementation defined algorithm. > > The architected algorithm is quite expensive to emulate; > using another algorithm may allow hardware acceleration. > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > --- > target/arm/cpu64.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 64 insertions(+) > > diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c > index dd696183df..3181d0e2f8 100644 > --- a/target/arm/cpu64.c > +++ b/target/arm/cpu64.c > @@ -572,6 +572,69 @@ void aarch64_add_sve_properties(Object *obj) > } > } > > +static const char * const pauth_names[] = { > + "off", "impdef", "arch" > +}; Hi Richard, Please add three boolean properties, rather than one enum: pauth: enable support of the pauth feature pauth-fast: enable QEMU's fast non-cryptographic hash for pauth (pauth must be enabled) pauth-arch: enable the architected algorithm for pauth (pauth must be enabled) These booleans can then be added to the cpu feature probing list used by qmp_query_cpu_model_expansion() > + > +static const QEnumLookup pauth_lookup = { > + .array = pauth_names, > + .size = ARRAY_SIZE(pauth_names) > +}; > + > +static int cpu_arm_get_pauth(Object *obj, Error **errp) > +{ > + ARMCPU *cpu = ARM_CPU(obj); > + int value; > + > + /* We will always set GPA+APA and GPI+API to the same value. */ > + if (FIELD_EX64(cpu->isar.id_aa64isar1, ID_AA64ISAR1, APA)) { > + value = 2; > + } else if (FIELD_EX64(cpu->isar.id_aa64isar1, ID_AA64ISAR1, API)) { > + value = 1; > + } else { > + value = 0; > + } > + return value; > +} > + > +static void cpu_arm_set_pauth(Object *obj, int value, Error **errp) > +{ > + ARMCPU *cpu = ARM_CPU(obj); > + uint64_t t; > + > + /* TODO: Handle HaveEnhancedPAC, HaveEnhancedPAC2, HaveFPAC. */ > + t = cpu->isar.id_aa64isar1; > + switch (value) { > + case 0: > + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, API, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0); > + break; > + case 1: > + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, API, 1); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 1); > + break; > + case 2: > + t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); > + t = FIELD_DP64(t, ID_AA64ISAR1, API, 0); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 1); > + t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0); > + break; > + default: > + g_assert_not_reached(); > + } > + cpu->isar.id_aa64isar1 = t; > +} > + > +static void aarch64_add_pauth_properties(Object *obj) > +{ > + object_property_add_enum(obj, "pauth", "ARMPauthKind", &pauth_lookup, > + cpu_arm_get_pauth, cpu_arm_set_pauth); > +} > + > /* -cpu max: if KVM is enabled, like -cpu host (best possible with this host); > * otherwise, a CPU with as many features enabled as our emulation supports. > * The version of '-cpu max' for qemu-system-arm is defined in cpu.c; > @@ -720,6 +783,7 @@ static void aarch64_max_initfn(Object *obj) > #endif > } > > + aarch64_add_pauth_properties(obj); We don't yet support enabling pauth on KVM accelerated guests, so this call should be in the !kvm_enabled() part of this function. > aarch64_add_sve_properties(obj); > object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq, > cpu_max_set_sve_max_vq, NULL, NULL); > -- > 2.25.1 > > Thanks, drew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-12 11:00 ` Andrew Jones @ 2020-08-12 15:10 ` Richard Henderson 2020-08-12 16:31 ` Andrew Jones 0 siblings, 1 reply; 13+ messages in thread From: Richard Henderson @ 2020-08-12 15:10 UTC (permalink / raw) To: Andrew Jones; +Cc: mark.rutland, peter.maydell, alex.bennee, qemu-devel On 8/12/20 4:00 AM, Andrew Jones wrote: > On Tue, Aug 11, 2020 at 11:53:38PM -0700, Richard Henderson wrote: >> The crypto overhead of emulating pauth can be significant for >> some workloads. Add an enumeration property that allows the >> feature to be turned off, on with the architected algorithm, >> or on with an implementation defined algorithm. >> >> The architected algorithm is quite expensive to emulate; >> using another algorithm may allow hardware acceleration. >> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> --- >> target/arm/cpu64.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 64 insertions(+) >> >> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c >> index dd696183df..3181d0e2f8 100644 >> --- a/target/arm/cpu64.c >> +++ b/target/arm/cpu64.c >> @@ -572,6 +572,69 @@ void aarch64_add_sve_properties(Object *obj) >> } >> } >> >> +static const char * const pauth_names[] = { >> + "off", "impdef", "arch" >> +}; > > Hi Richard, > > Please add three boolean properties, rather than one enum: > > pauth: enable support of the pauth feature > pauth-fast: enable QEMU's fast non-cryptographic hash for pauth > (pauth must be enabled) > pauth-arch: enable the architected algorithm for pauth > (pauth must be enabled) > > These booleans can then be added to the cpu feature probing list used by > qmp_query_cpu_model_expansion() Why are 3 booleans better than one enum? I'd forgotten about qmp_query_cpu_model_expansion(); can it not take anything but booleans? r~ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-12 15:10 ` Richard Henderson @ 2020-08-12 16:31 ` Andrew Jones 2020-08-13 6:03 ` Andrew Jones 0 siblings, 1 reply; 13+ messages in thread From: Andrew Jones @ 2020-08-12 16:31 UTC (permalink / raw) To: Richard Henderson; +Cc: mark.rutland, peter.maydell, alex.bennee, qemu-devel On Wed, Aug 12, 2020 at 08:10:47AM -0700, Richard Henderson wrote: > On 8/12/20 4:00 AM, Andrew Jones wrote: > > On Tue, Aug 11, 2020 at 11:53:38PM -0700, Richard Henderson wrote: > >> The crypto overhead of emulating pauth can be significant for > >> some workloads. Add an enumeration property that allows the > >> feature to be turned off, on with the architected algorithm, > >> or on with an implementation defined algorithm. > >> > >> The architected algorithm is quite expensive to emulate; > >> using another algorithm may allow hardware acceleration. > >> > >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > >> --- > >> target/arm/cpu64.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ > >> 1 file changed, 64 insertions(+) > >> > >> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c > >> index dd696183df..3181d0e2f8 100644 > >> --- a/target/arm/cpu64.c > >> +++ b/target/arm/cpu64.c > >> @@ -572,6 +572,69 @@ void aarch64_add_sve_properties(Object *obj) > >> } > >> } > >> > >> +static const char * const pauth_names[] = { > >> + "off", "impdef", "arch" > >> +}; > > > > Hi Richard, > > > > Please add three boolean properties, rather than one enum: > > > > pauth: enable support of the pauth feature > > pauth-fast: enable QEMU's fast non-cryptographic hash for pauth > > (pauth must be enabled) > > pauth-arch: enable the architected algorithm for pauth > > (pauth must be enabled) > > > > These booleans can then be added to the cpu feature probing list used by > > qmp_query_cpu_model_expansion() > > Why are 3 booleans better than one enum? > > I'd forgotten about qmp_query_cpu_model_expansion(); can it not take anything > but booleans? Right. The probing works by getting a list of possible CPU features, which are all boolean properties. That way the prober can try enabling/disabling them without having to know about each property's set of valid values. We could implement each as an enum and a second level of probing, but that would complicate the probing, and I'm not sure enums gain us much over multiple properties. In this case, since pauth-fast and pauth-arch are mutually exclusive and we want a pauth=on/off too, then we'll need a finalize function like SVE has in order to support the following selections: # Default (pauth-arch), explicitly selected or not -cpu max[,pauth=on] -cpu max[,pauth=on][,pauth-fast=off],pauth-arch=on # Select pauth-fast -cpu max[,pauth=on][,pauth-arch=off],pauth-fast=on # Disable -cpu max,pauth=off -cpu max[,pauth=off],pauth-arch=off,pauth-fast=off # Mutual exclusion errors -cpu max,pauth=off,pauth-{arch,fast}=on -cpu max,pauth=on,pauth-arch=off,pauth-fast=off -cpu max[,pauth=on],pauth-arch=on,pauth-fast=on # Errors because we don't want to guess what the user means -cpu max[,pauth=on],pauth-arch=off -cpu max[,pauth=on],pauth-fast=off Thanks, drew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-12 16:31 ` Andrew Jones @ 2020-08-13 6:03 ` Andrew Jones 2020-08-13 9:05 ` Mark Rutland 0 siblings, 1 reply; 13+ messages in thread From: Andrew Jones @ 2020-08-13 6:03 UTC (permalink / raw) To: Richard Henderson; +Cc: mark.rutland, peter.maydell, alex.bennee, qemu-devel On Wed, Aug 12, 2020 at 06:31:11PM +0200, Andrew Jones wrote: > On Wed, Aug 12, 2020 at 08:10:47AM -0700, Richard Henderson wrote: > > On 8/12/20 4:00 AM, Andrew Jones wrote: > > > On Tue, Aug 11, 2020 at 11:53:38PM -0700, Richard Henderson wrote: > > >> The crypto overhead of emulating pauth can be significant for > > >> some workloads. Add an enumeration property that allows the > > >> feature to be turned off, on with the architected algorithm, > > >> or on with an implementation defined algorithm. > > >> > > >> The architected algorithm is quite expensive to emulate; > > >> using another algorithm may allow hardware acceleration. > > >> > > >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > > >> --- > > >> target/arm/cpu64.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ > > >> 1 file changed, 64 insertions(+) > > >> > > >> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c > > >> index dd696183df..3181d0e2f8 100644 > > >> --- a/target/arm/cpu64.c > > >> +++ b/target/arm/cpu64.c > > >> @@ -572,6 +572,69 @@ void aarch64_add_sve_properties(Object *obj) > > >> } > > >> } > > >> > > >> +static const char * const pauth_names[] = { > > >> + "off", "impdef", "arch" > > >> +}; > > > > > > Hi Richard, > > > > > > Please add three boolean properties, rather than one enum: > > > > > > pauth: enable support of the pauth feature > > > pauth-fast: enable QEMU's fast non-cryptographic hash for pauth > > > (pauth must be enabled) > > > pauth-arch: enable the architected algorithm for pauth > > > (pauth must be enabled) > > > > > > These booleans can then be added to the cpu feature probing list used by > > > qmp_query_cpu_model_expansion() > > > > Why are 3 booleans better than one enum? > > > > I'd forgotten about qmp_query_cpu_model_expansion(); can it not take anything > > but booleans? > > Right. The probing works by getting a list of possible CPU features, which > are all boolean properties. That way the prober can try enabling/disabling > them without having to know about each property's set of valid values. We > could implement each as an enum and a second level of probing, but that > would complicate the probing, and I'm not sure enums gain us much over > multiple properties. > > In this case, since pauth-fast and pauth-arch are mutually exclusive and > we want a pauth=on/off too, then we'll need a finalize function like SVE > has in order to support the following selections: > > # Default (pauth-arch), explicitly selected or not > -cpu max[,pauth=on] > -cpu max[,pauth=on][,pauth-fast=off],pauth-arch=on > > # Select pauth-fast > -cpu max[,pauth=on][,pauth-arch=off],pauth-fast=on > > # Disable > -cpu max,pauth=off > -cpu max[,pauth=off],pauth-arch=off,pauth-fast=off > > # Mutual exclusion errors > -cpu max,pauth=off,pauth-{arch,fast}=on > -cpu max,pauth=on,pauth-arch=off,pauth-fast=off > -cpu max[,pauth=on],pauth-arch=on,pauth-fast=on > > # Errors because we don't want to guess what the user means > -cpu max[,pauth=on],pauth-arch=off > -cpu max[,pauth=on],pauth-fast=off Thinking about this some more, maybe we don't need pauth-arch. If we don't, then it simplifies nicely to # Default (enabled with architected algorithm) -cpu max[,pauth=on][,pauth-fast=off] # Select pauth-fast -cpu max[,pauth=on],pauth-fast=on # Disable -cpu max[,pauth-fast=off],pauth=off # Mutual exclusion errors -cpu max,pauth=off,pauth-fast=on Thanks, drew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-13 6:03 ` Andrew Jones @ 2020-08-13 9:05 ` Mark Rutland 2020-08-13 9:49 ` Andrew Jones 0 siblings, 1 reply; 13+ messages in thread From: Mark Rutland @ 2020-08-13 9:05 UTC (permalink / raw) To: Andrew Jones; +Cc: alex.bennee, peter.maydell, Richard Henderson, qemu-devel On Thu, Aug 13, 2020 at 08:03:21AM +0200, Andrew Jones wrote: > Thinking about this some more, maybe we don't need pauth-arch. > If we don't, then it simplifies nicely to > > # Default (enabled with architected algorithm) > -cpu max[,pauth=on][,pauth-fast=off] > > # Select pauth-fast > -cpu max[,pauth=on],pauth-fast=on One reason that users may wish to choose the IMP-DEF algorithm is for functional testing regardless of speed (since APA+GPA / API+GPI depend on whether the algo is architected or imp-def). Given that, I think that "impdef" is a better option name than "pauth-fast", and speed is a benefit but not the only reason to use the option. How about hacing a 'pauth-algo' option which defaults to architected, e.g. | -cpu max,pauth={on,off},pauth-algo={impdef,architected} ... then in future the 'pauth={on,off}' bit could de extended to cover FPAC version etc independently of the algorithm, but for now that can be a boolean. Thanks, Mark. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-13 9:05 ` Mark Rutland @ 2020-08-13 9:49 ` Andrew Jones 2020-08-13 11:10 ` Mark Rutland 0 siblings, 1 reply; 13+ messages in thread From: Andrew Jones @ 2020-08-13 9:49 UTC (permalink / raw) To: Mark Rutland; +Cc: alex.bennee, peter.maydell, Richard Henderson, qemu-devel On Thu, Aug 13, 2020 at 10:05:04AM +0100, Mark Rutland wrote: > On Thu, Aug 13, 2020 at 08:03:21AM +0200, Andrew Jones wrote: > > Thinking about this some more, maybe we don't need pauth-arch. > > If we don't, then it simplifies nicely to > > > > # Default (enabled with architected algorithm) > > -cpu max[,pauth=on][,pauth-fast=off] > > > > # Select pauth-fast > > -cpu max[,pauth=on],pauth-fast=on > > One reason that users may wish to choose the IMP-DEF algorithm is for > functional testing regardless of speed (since APA+GPA / API+GPI depend > on whether the algo is architected or imp-def). > > Given that, I think that "impdef" is a better option name than > "pauth-fast", and speed is a benefit but not the only reason to use the > option. I was going with pauth-fast because in this case Richard identified a need for a fast version. If we identify another need later, then we may want to add another impdef version, e.g. pauth-foo. Maybe all the impdef versions should have impdef in their name to make that more explicit? pauth-impdef-fast pauth-impdef-foo ... That seems a bit verbose, though, and we can document that pauth-* are impdefs and that the default is architected. > > How about hacing a 'pauth-algo' option which defaults to architected, > e.g. > > | -cpu max,pauth={on,off},pauth-algo={impdef,architected} > > ... then in future the 'pauth={on,off}' bit could de extended to cover > FPAC version etc independently of the algorithm, but for now that can be > a boolean. > Keeping pauth a boolean, but creating a pauth-algo enum doesn't help us much, because probing will only be possible for pauth. The prober could only decide to enable pauth with the default algo or not. We could create a pauth-specific probe, similar to the gic-specific probe, but, IMO, it's really not necessary. If we need multiple pauth-* properties, then we can have them all. It just adds a few more lines of logic to the pauth finalize function. Thanks, drew ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/2] target/arm: Add cpu property to control pauth 2020-08-13 9:49 ` Andrew Jones @ 2020-08-13 11:10 ` Mark Rutland 0 siblings, 0 replies; 13+ messages in thread From: Mark Rutland @ 2020-08-13 11:10 UTC (permalink / raw) To: Andrew Jones; +Cc: alex.bennee, peter.maydell, Richard Henderson, qemu-devel On Thu, Aug 13, 2020 at 11:49:07AM +0200, Andrew Jones wrote: > On Thu, Aug 13, 2020 at 10:05:04AM +0100, Mark Rutland wrote: > > On Thu, Aug 13, 2020 at 08:03:21AM +0200, Andrew Jones wrote: > > > Thinking about this some more, maybe we don't need pauth-arch. > > > If we don't, then it simplifies nicely to > > > > > > # Default (enabled with architected algorithm) > > > -cpu max[,pauth=on][,pauth-fast=off] > > > > > > # Select pauth-fast > > > -cpu max[,pauth=on],pauth-fast=on > > > > One reason that users may wish to choose the IMP-DEF algorithm is for > > functional testing regardless of speed (since APA+GPA / API+GPI depend > > on whether the algo is architected or imp-def). > > > > Given that, I think that "impdef" is a better option name than > > "pauth-fast", and speed is a benefit but not the only reason to use the > > option. > > I was going with pauth-fast because in this case Richard identified a > need for a fast version. If we identify another need later, then we may > want to add another impdef version, e.g. pauth-foo. Maybe all the impdef > versions should have impdef in their name to make that more explicit? > > pauth-impdef-fast > pauth-impdef-foo Something like that is fine by me. > > How about hacing a 'pauth-algo' option which defaults to architected, > > e.g. > > > > | -cpu max,pauth={on,off},pauth-algo={impdef,architected} > > > > ... then in future the 'pauth={on,off}' bit could de extended to cover > > FPAC version etc independently of the algorithm, but for now that can be > > a boolean. > > Keeping pauth a boolean, but creating a pauth-algo enum doesn't help us > much, because probing will only be possible for pauth. The prober could > only decide to enable pauth with the default algo or not. We could create > a pauth-specific probe, similar to the gic-specific probe, but, IMO, it's > really not necessary. If we need multiple pauth-* properties, then we can > have them all. It just adds a few more lines of logic to the pauth > finalize function. I suspect that it will be necessary in future handle multiple options to enumerate things like FPAC/EPAC separately from the algorithm used, which is why I suggested the algo being its own option now. I can live with whatever you folk decide on. Thanks, Mark. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm 2020-08-12 6:53 [PATCH 0/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson 2020-08-12 6:53 ` [PATCH 1/2] target/arm: Add cpu property to control pauth Richard Henderson @ 2020-08-12 6:53 ` Richard Henderson 2020-08-12 9:49 ` Alex Bennée 1 sibling, 1 reply; 13+ messages in thread From: Richard Henderson @ 2020-08-12 6:53 UTC (permalink / raw) To: qemu-devel; +Cc: mark.rutland, peter.maydell, alex.bennee Without hardware acceleration, a cryptographically strong algorithm is too expensive for pauth_computepac. Even with hardware accel, we are not currently expecting to link the linux-user binaries to any crypto libraries, and doing so would generally make the --static build fail. So choose XXH64 as a reasonably quick and decent hash. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- target/arm/pauth_helper.c | 75 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 70 insertions(+), 5 deletions(-) diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c index 6dbab03768..f1a4389465 100644 --- a/target/arm/pauth_helper.c +++ b/target/arm/pauth_helper.c @@ -207,8 +207,8 @@ static uint64_t tweak_inv_shuffle(uint64_t i) return o; } -static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, - ARMPACKey key) +static uint64_t __attribute__((noinline)) +pauth_computepac_architected(uint64_t data, uint64_t modifier, ARMPACKey key) { static const uint64_t RC[5] = { 0x0000000000000000ull, @@ -272,6 +272,71 @@ static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, return workingval; } +/* + * The XXH64 algorithm from + * https://github.com/Cyan4973/xxHash/blob/v0.8.0/xxhash.h + */ +#define PRIME64_1 0x9E3779B185EBCA87ULL +#define PRIME64_2 0xC2B2AE3D27D4EB4FULL +#define PRIME64_3 0x165667B19E3779F9ULL +#define PRIME64_4 0x85EBCA77C2B2AE63ULL +#define PRIME64_5 0x27D4EB2F165667C5ULL + +static inline uint64_t XXH64_round(uint64_t acc, uint64_t input) +{ + return rol64(acc + input * PRIME64_2, 31) * PRIME64_1; +} + +static inline uint64_t XXH64_mergeround(uint64_t acc, uint64_t val) +{ + return (acc ^ XXH64_round(0, val)) * PRIME64_1 + PRIME64_4; +} + +static inline uint64_t XXH64_avalanche(uint64_t h64) +{ + h64 ^= h64 >> 33; + h64 *= PRIME64_2; + h64 ^= h64 >> 29; + h64 *= PRIME64_3; + /* h64 ^= h64 >> 32; -- does not affect high 64 for pauth */ + return h64; +} + +static uint64_t __attribute__((noinline)) +pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key) +{ + uint64_t v1 = 1 + PRIME64_1 + PRIME64_2; + uint64_t v2 = 1 + PRIME64_2; + uint64_t v3 = 1 + 0; + uint64_t v4 = 1 - PRIME64_1; + uint64_t h64; + + v1 = XXH64_round(v1, data); + v2 = XXH64_round(v2, modifier); + v3 = XXH64_round(v3, key.lo); + v4 = XXH64_round(v4, key.hi); + + h64 = rol64(v1, 1) + rol64(v2, 7) + rol64(v3, 12) + rol64(v4, 18); + h64 = XXH64_mergeround(h64, v1); + h64 = XXH64_mergeround(h64, v2); + h64 = XXH64_mergeround(h64, v3); + h64 = XXH64_mergeround(h64, v4); + + return XXH64_avalanche(h64); +} + +static uint64_t pauth_computepac(CPUARMState *env, uint64_t data, + uint64_t modifier, ARMPACKey key) +{ + ARMCPU *cpu = env_archcpu(env); + + if (FIELD_EX64(cpu->isar.id_aa64isar1, ID_AA64ISAR1, APA)) { + return pauth_computepac_architected(data, modifier, key); + } else { + return pauth_computepac_impdef(data, modifier, key); + } +} + static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier, ARMPACKey *key, bool data) { @@ -292,7 +357,7 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier, bot_bit = 64 - param.tsz; ext_ptr = deposit64(ptr, bot_bit, top_bit - bot_bit, ext); - pac = pauth_computepac(ext_ptr, modifier, *key); + pac = pauth_computepac(env, ext_ptr, modifier, *key); /* * Check if the ptr has good extension bits and corrupt the @@ -341,7 +406,7 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier, uint64_t pac, orig_ptr, test; orig_ptr = pauth_original_ptr(ptr, param); - pac = pauth_computepac(orig_ptr, modifier, *key); + pac = pauth_computepac(env, orig_ptr, modifier, *key); bot_bit = 64 - param.tsz; top_bit = 64 - 8 * param.tbi; @@ -442,7 +507,7 @@ uint64_t HELPER(pacga)(CPUARMState *env, uint64_t x, uint64_t y) uint64_t pac; pauth_check_trap(env, arm_current_el(env), GETPC()); - pac = pauth_computepac(x, y, env->keys.apga); + pac = pauth_computepac(env, x, y, env->keys.apga); return pac & 0xffffffff00000000ull; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm 2020-08-12 6:53 ` [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson @ 2020-08-12 9:49 ` Alex Bennée 2020-08-12 15:13 ` Richard Henderson 0 siblings, 1 reply; 13+ messages in thread From: Alex Bennée @ 2020-08-12 9:49 UTC (permalink / raw) To: Richard Henderson; +Cc: mark.rutland, peter.maydell, qemu-devel Richard Henderson <richard.henderson@linaro.org> writes: > Without hardware acceleration, a cryptographically strong > algorithm is too expensive for pauth_computepac. > > Even with hardware accel, we are not currently expecting > to link the linux-user binaries to any crypto libraries, > and doing so would generally make the --static build fail. > > So choose XXH64 as a reasonably quick and decent hash. > > Signed-off-by: Richard Henderson <richard.henderson@linaro.org> > --- > target/arm/pauth_helper.c | 75 ++++++++++++++++++++++++++++++++++++--- > 1 file changed, 70 insertions(+), 5 deletions(-) > > diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c > index 6dbab03768..f1a4389465 100644 > --- a/target/arm/pauth_helper.c > +++ b/target/arm/pauth_helper.c > @@ -207,8 +207,8 @@ static uint64_t tweak_inv_shuffle(uint64_t i) > return o; > } > > -static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, > - ARMPACKey key) > +static uint64_t __attribute__((noinline)) > +pauth_computepac_architected(uint64_t data, uint64_t modifier, ARMPACKey key) > { > static const uint64_t RC[5] = { > 0x0000000000000000ull, > @@ -272,6 +272,71 @@ static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, > return workingval; > } > > +/* > + * The XXH64 algorithm from > + * https://github.com/Cyan4973/xxHash/blob/v0.8.0/xxhash.h > + */ > +#define PRIME64_1 0x9E3779B185EBCA87ULL > +#define PRIME64_2 0xC2B2AE3D27D4EB4FULL > +#define PRIME64_3 0x165667B19E3779F9ULL > +#define PRIME64_4 0x85EBCA77C2B2AE63ULL > +#define PRIME64_5 0x27D4EB2F165667C5ULL > + > +static inline uint64_t XXH64_round(uint64_t acc, uint64_t input) > +{ > + return rol64(acc + input * PRIME64_2, 31) * PRIME64_1; > +} > + > +static inline uint64_t XXH64_mergeround(uint64_t acc, uint64_t val) > +{ > + return (acc ^ XXH64_round(0, val)) * PRIME64_1 + PRIME64_4; > +} > + > +static inline uint64_t XXH64_avalanche(uint64_t h64) > +{ > + h64 ^= h64 >> 33; > + h64 *= PRIME64_2; > + h64 ^= h64 >> 29; > + h64 *= PRIME64_3; > + /* h64 ^= h64 >> 32; -- does not affect high 64 for pauth */ > + return h64; > +} > + > +static uint64_t __attribute__((noinline)) > +pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key) > +{ > + uint64_t v1 = 1 + PRIME64_1 + PRIME64_2; > + uint64_t v2 = 1 + PRIME64_2; > + uint64_t v3 = 1 + 0; > + uint64_t v4 = 1 - PRIME64_1; > + uint64_t h64; > + > + v1 = XXH64_round(v1, data); > + v2 = XXH64_round(v2, modifier); > + v3 = XXH64_round(v3, key.lo); > + v4 = XXH64_round(v4, key.hi); > + > + h64 = rol64(v1, 1) + rol64(v2, 7) + rol64(v3, 12) + rol64(v4, 18); > + h64 = XXH64_mergeround(h64, v1); > + h64 = XXH64_mergeround(h64, v2); > + h64 = XXH64_mergeround(h64, v3); > + h64 = XXH64_mergeround(h64, v4); > + > + return XXH64_avalanche(h64); > +} You might find it easier to #include "qemu/xxhash.h" which we use for tb hashing amongst other things. -- Alex Bennée ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm 2020-08-12 9:49 ` Alex Bennée @ 2020-08-12 15:13 ` Richard Henderson 2020-08-12 17:13 ` Alex Bennée 0 siblings, 1 reply; 13+ messages in thread From: Richard Henderson @ 2020-08-12 15:13 UTC (permalink / raw) To: Alex Bennée; +Cc: mark.rutland, peter.maydell, qemu-devel On 8/12/20 2:49 AM, Alex Bennée wrote: > > Richard Henderson <richard.henderson@linaro.org> writes: > >> Without hardware acceleration, a cryptographically strong >> algorithm is too expensive for pauth_computepac. >> >> Even with hardware accel, we are not currently expecting >> to link the linux-user binaries to any crypto libraries, >> and doing so would generally make the --static build fail. >> >> So choose XXH64 as a reasonably quick and decent hash. >> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >> --- >> target/arm/pauth_helper.c | 75 ++++++++++++++++++++++++++++++++++++--- >> 1 file changed, 70 insertions(+), 5 deletions(-) >> >> diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c >> index 6dbab03768..f1a4389465 100644 >> --- a/target/arm/pauth_helper.c >> +++ b/target/arm/pauth_helper.c >> @@ -207,8 +207,8 @@ static uint64_t tweak_inv_shuffle(uint64_t i) >> return o; >> } >> >> -static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, >> - ARMPACKey key) >> +static uint64_t __attribute__((noinline)) >> +pauth_computepac_architected(uint64_t data, uint64_t modifier, ARMPACKey key) >> { >> static const uint64_t RC[5] = { >> 0x0000000000000000ull, >> @@ -272,6 +272,71 @@ static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, >> return workingval; >> } >> >> +/* >> + * The XXH64 algorithm from >> + * https://github.com/Cyan4973/xxHash/blob/v0.8.0/xxhash.h >> + */ >> +#define PRIME64_1 0x9E3779B185EBCA87ULL >> +#define PRIME64_2 0xC2B2AE3D27D4EB4FULL >> +#define PRIME64_3 0x165667B19E3779F9ULL >> +#define PRIME64_4 0x85EBCA77C2B2AE63ULL >> +#define PRIME64_5 0x27D4EB2F165667C5ULL >> + >> +static inline uint64_t XXH64_round(uint64_t acc, uint64_t input) >> +{ >> + return rol64(acc + input * PRIME64_2, 31) * PRIME64_1; >> +} >> + >> +static inline uint64_t XXH64_mergeround(uint64_t acc, uint64_t val) >> +{ >> + return (acc ^ XXH64_round(0, val)) * PRIME64_1 + PRIME64_4; >> +} >> + >> +static inline uint64_t XXH64_avalanche(uint64_t h64) >> +{ >> + h64 ^= h64 >> 33; >> + h64 *= PRIME64_2; >> + h64 ^= h64 >> 29; >> + h64 *= PRIME64_3; >> + /* h64 ^= h64 >> 32; -- does not affect high 64 for pauth */ >> + return h64; >> +} >> + >> +static uint64_t __attribute__((noinline)) >> +pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key) >> +{ >> + uint64_t v1 = 1 + PRIME64_1 + PRIME64_2; >> + uint64_t v2 = 1 + PRIME64_2; >> + uint64_t v3 = 1 + 0; >> + uint64_t v4 = 1 - PRIME64_1; >> + uint64_t h64; >> + >> + v1 = XXH64_round(v1, data); >> + v2 = XXH64_round(v2, modifier); >> + v3 = XXH64_round(v3, key.lo); >> + v4 = XXH64_round(v4, key.hi); >> + >> + h64 = rol64(v1, 1) + rol64(v2, 7) + rol64(v3, 12) + rol64(v4, 18); >> + h64 = XXH64_mergeround(h64, v1); >> + h64 = XXH64_mergeround(h64, v2); >> + h64 = XXH64_mergeround(h64, v3); >> + h64 = XXH64_mergeround(h64, v4); >> + >> + return XXH64_avalanche(h64); >> +} > > You might find it easier to #include "qemu/xxhash.h" which we use for tb > hashing amongst other things. First, that's the 32-bit version, XXH32. Second, we define xxhash7 there; we would need xxhash8 here. r~ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm 2020-08-12 15:13 ` Richard Henderson @ 2020-08-12 17:13 ` Alex Bennée 0 siblings, 0 replies; 13+ messages in thread From: Alex Bennée @ 2020-08-12 17:13 UTC (permalink / raw) To: Richard Henderson; +Cc: mark.rutland, peter.maydell, qemu-devel Richard Henderson <richard.henderson@linaro.org> writes: > On 8/12/20 2:49 AM, Alex Bennée wrote: >> >> Richard Henderson <richard.henderson@linaro.org> writes: >> >>> Without hardware acceleration, a cryptographically strong >>> algorithm is too expensive for pauth_computepac. >>> >>> Even with hardware accel, we are not currently expecting >>> to link the linux-user binaries to any crypto libraries, >>> and doing so would generally make the --static build fail. >>> >>> So choose XXH64 as a reasonably quick and decent hash. >>> >>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> >>> --- >>> target/arm/pauth_helper.c | 75 ++++++++++++++++++++++++++++++++++++--- >>> 1 file changed, 70 insertions(+), 5 deletions(-) >>> >>> diff --git a/target/arm/pauth_helper.c b/target/arm/pauth_helper.c >>> index 6dbab03768..f1a4389465 100644 >>> --- a/target/arm/pauth_helper.c >>> +++ b/target/arm/pauth_helper.c >>> @@ -207,8 +207,8 @@ static uint64_t tweak_inv_shuffle(uint64_t i) >>> return o; >>> } >>> >>> -static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, >>> - ARMPACKey key) >>> +static uint64_t __attribute__((noinline)) >>> +pauth_computepac_architected(uint64_t data, uint64_t modifier, ARMPACKey key) >>> { >>> static const uint64_t RC[5] = { >>> 0x0000000000000000ull, >>> @@ -272,6 +272,71 @@ static uint64_t pauth_computepac(uint64_t data, uint64_t modifier, >>> return workingval; >>> } >>> >>> +/* >>> + * The XXH64 algorithm from >>> + * https://github.com/Cyan4973/xxHash/blob/v0.8.0/xxhash.h >>> + */ >>> +#define PRIME64_1 0x9E3779B185EBCA87ULL >>> +#define PRIME64_2 0xC2B2AE3D27D4EB4FULL >>> +#define PRIME64_3 0x165667B19E3779F9ULL >>> +#define PRIME64_4 0x85EBCA77C2B2AE63ULL >>> +#define PRIME64_5 0x27D4EB2F165667C5ULL >>> + >>> +static inline uint64_t XXH64_round(uint64_t acc, uint64_t input) >>> +{ >>> + return rol64(acc + input * PRIME64_2, 31) * PRIME64_1; >>> +} >>> + >>> +static inline uint64_t XXH64_mergeround(uint64_t acc, uint64_t val) >>> +{ >>> + return (acc ^ XXH64_round(0, val)) * PRIME64_1 + PRIME64_4; >>> +} >>> + >>> +static inline uint64_t XXH64_avalanche(uint64_t h64) >>> +{ >>> + h64 ^= h64 >> 33; >>> + h64 *= PRIME64_2; >>> + h64 ^= h64 >> 29; >>> + h64 *= PRIME64_3; >>> + /* h64 ^= h64 >> 32; -- does not affect high 64 for pauth */ >>> + return h64; >>> +} >>> + >>> +static uint64_t __attribute__((noinline)) >>> +pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key) >>> +{ >>> + uint64_t v1 = 1 + PRIME64_1 + PRIME64_2; >>> + uint64_t v2 = 1 + PRIME64_2; >>> + uint64_t v3 = 1 + 0; >>> + uint64_t v4 = 1 - PRIME64_1; >>> + uint64_t h64; >>> + >>> + v1 = XXH64_round(v1, data); >>> + v2 = XXH64_round(v2, modifier); >>> + v3 = XXH64_round(v3, key.lo); >>> + v4 = XXH64_round(v4, key.hi); >>> + >>> + h64 = rol64(v1, 1) + rol64(v2, 7) + rol64(v3, 12) + rol64(v4, 18); >>> + h64 = XXH64_mergeround(h64, v1); >>> + h64 = XXH64_mergeround(h64, v2); >>> + h64 = XXH64_mergeround(h64, v3); >>> + h64 = XXH64_mergeround(h64, v4); >>> + >>> + return XXH64_avalanche(h64); >>> +} >> >> You might find it easier to #include "qemu/xxhash.h" which we use for tb >> hashing amongst other things. > > First, that's the 32-bit version, XXH32. Ahh I missed that detail. > Second, we define xxhash7 there; we would need xxhash8 here. We could at least put the code in the header with the others. > > > r~ -- Alex Bennée ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2020-08-13 11:19 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-08-12 6:53 [PATCH 0/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson 2020-08-12 6:53 ` [PATCH 1/2] target/arm: Add cpu property to control pauth Richard Henderson 2020-08-12 11:00 ` Andrew Jones 2020-08-12 15:10 ` Richard Henderson 2020-08-12 16:31 ` Andrew Jones 2020-08-13 6:03 ` Andrew Jones 2020-08-13 9:05 ` Mark Rutland 2020-08-13 9:49 ` Andrew Jones 2020-08-13 11:10 ` Mark Rutland 2020-08-12 6:53 ` [PATCH 2/2] target/arm: Implement an IMPDEF pauth algorithm Richard Henderson 2020-08-12 9:49 ` Alex Bennée 2020-08-12 15:13 ` Richard Henderson 2020-08-12 17:13 ` Alex Bennée
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.