* [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
@ 2018-06-09  6:26 Randy Li
  2018-06-09  6:26 ` [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8 Randy Li
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Randy Li @ 2018-06-09  6:26 UTC (permalink / raw)
  To: openembedded-core

I read the ARMv8 manual again. It looks like hardware floating point is
mandatory for Linux distributions and toolchain libraries. Some Cortex
processors can be configured without FPU/NEON hardware, but I don't
think they would be used with OpenEmbedded-Core.

So I can assume that NEON (SIMD) is always present, leaving only the
crc and crypto instructions as optional.
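
To illustrate the resulting tune matrix, here is a minimal Python sketch (not
part of the series). It assumes that "armv8a" and "simd" are always present and
that only "crc" and "crypto" vary; `tune_variants` is a hypothetical helper, and
the names it produces are meant to mirror the AVAILTUNES names in patch 1/4.

```python
from itertools import combinations

# Features assumed to be always present on ARMv8-A (assumption from the cover letter).
BASE_FEATURES = ["armv8a", "simd"]
# Features treated as optional in this series.
OPTIONAL_FEATURES = ["crc", "crypto"]


def tune_variants(base="armv8a"):
    """Return {tune_name: feature_list} for every subset of the optional features."""
    variants = {}
    for size in range(len(OPTIONAL_FEATURES) + 1):
        for extras in combinations(OPTIONAL_FEATURES, size):
            name = "-".join([base, *extras])
            variants[name] = BASE_FEATURES + list(extras)
    return variants


if __name__ == "__main__":
    for name, features in tune_variants().items():
        print(f"{name}: {' '.join(features)}")
```

Two optional features give four variants; each further optional feature would
double that number, which is worth keeping in mind for the package feed.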


Randy Li (4):
  arch-armv8a.inc: add tune include for armv8
  tune-cortexa35: add tunes for ARM Cortex-A35
  tune-cortexa32: add tunes for ARM Cortex-A32
  tune-cortexa72: add tunes for ARM Cortex-A72

 meta/conf/machine/include/arm/arch-armv8.inc  |  1 -
 meta/conf/machine/include/arm/arch-armv8a.inc | 22 ++++++++++++++++++++++
 meta/conf/machine/include/tune-cortexa32.inc  | 15 +++++++++++++++
 meta/conf/machine/include/tune-cortexa35.inc  | 15 +++++++++++++++
 meta/conf/machine/include/tune-cortexa72.inc  | 12 ++++++++++++
 5 files changed, 64 insertions(+), 1 deletion(-)
 delete mode 100644 meta/conf/machine/include/arm/arch-armv8.inc
 create mode 100644 meta/conf/machine/include/arm/arch-armv8a.inc
 create mode 100644 meta/conf/machine/include/tune-cortexa32.inc
 create mode 100644 meta/conf/machine/include/tune-cortexa35.inc
 create mode 100644 meta/conf/machine/include/tune-cortexa72.inc

-- 
2.14.3




* [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8
  2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
@ 2018-06-09  6:26 ` Randy Li
  2018-06-12  5:59   ` Nicolas Dechesne
  2018-06-09  6:26 ` [PATCH v2 2/4] tune-cortexa35: add tunes for ARM Cortex-A35 Randy Li
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Randy Li @ 2018-06-09  6:26 UTC (permalink / raw)
  To: openembedded-core

There are some additional instructions apart from bare armv8;
there are also armv8.1 and armv8.2.

Most processors support crc, except X-Gene 1.

Signed-off-by: Randy Li <ayaka@soulik.info>
---
 meta/conf/machine/include/arm/arch-armv8.inc  |  1 -
 meta/conf/machine/include/arm/arch-armv8a.inc | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 delete mode 100644 meta/conf/machine/include/arm/arch-armv8.inc
 create mode 100644 meta/conf/machine/include/arm/arch-armv8a.inc

diff --git a/meta/conf/machine/include/arm/arch-armv8.inc b/meta/conf/machine/include/arm/arch-armv8.inc
deleted file mode 100644
index 5e832fae6d..0000000000
--- a/meta/conf/machine/include/arm/arch-armv8.inc
+++ /dev/null
@@ -1 +0,0 @@
-require conf/machine/include/arm/arch-arm64.inc
diff --git a/meta/conf/machine/include/arm/arch-armv8a.inc b/meta/conf/machine/include/arm/arch-armv8a.inc
new file mode 100644
index 0000000000..b8b9e44c54
--- /dev/null
+++ b/meta/conf/machine/include/arm/arch-armv8a.inc
@@ -0,0 +1,22 @@
+DEFAULTTUNE ?= "armv8a-crc"
+
+TUNEVALID[armv8a] = "Enable instructions for ARMv8-a"
+TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', ' -march=armv8-a', '', d)}"
+MACHINEOVERRIDES =. "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', 'armv8a:', '' ,d)}"
+
+require conf/machine/include/arm/arch-arm64.inc
+
+# Little Endian base configs
+AVAILTUNES += "armv8a armv8a-crc armv8a-crc-crypto armv8a-crypto"
+ARMPKGARCH_tune-armv8a                    ?= "armv8a"
+ARMPKGARCH_tune-armv8a-crc                ?= "armv8a"
+ARMPKGARCH_tune-armv8a-crypto             ?= "armv8a"
+ARMPKGARCH_tune-armv8a-crc-crypto         ?= "armv8a"
+TUNE_FEATURES_tune-armv8a                  = "armv8a simd"
+TUNE_FEATURES_tune-armv8a-crc              = "${TUNE_FEATURES_tune-armv8a} crc"
+TUNE_FEATURES_tune-armv8a-crypto           = "${TUNE_FEATURES_tune-armv8a} crypto"
+TUNE_FEATURES_tune-armv8a-crc-crypto       = "${TUNE_FEATURES_tune-armv8a-crc} crypto"
+PACKAGE_EXTRA_ARCHS_tune-armv8a            = "aarch64 armv8a simd"
+PACKAGE_EXTRA_ARCHS_tune-armv8a-crc        = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crc"
+PACKAGE_EXTRA_ARCHS_tune-armv8a-crypto     = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crypto"
+PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc} crypto"
-- 
2.14.3




* [PATCH v2 2/4] tune-cortexa35: add tunes for ARM Cortex-A35
  2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
  2018-06-09  6:26 ` [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8 Randy Li
@ 2018-06-09  6:26 ` Randy Li
  2018-06-09  6:26 ` [PATCH v2 3/4] tune-cortexa32: add tunes for ARM Cortex-A32 Randy Li
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Randy Li @ 2018-06-09  6:26 UTC (permalink / raw)
  To: openembedded-core

https://developer.arm.com/products/processors/cortex-a/cortex-a35

Signed-off-by: Randy Li <ayaka@soulik.info>
---
 meta/conf/machine/include/tune-cortexa35.inc | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 meta/conf/machine/include/tune-cortexa35.inc

diff --git a/meta/conf/machine/include/tune-cortexa35.inc b/meta/conf/machine/include/tune-cortexa35.inc
new file mode 100644
index 0000000000..c5fc2e948e
--- /dev/null
+++ b/meta/conf/machine/include/tune-cortexa35.inc
@@ -0,0 +1,15 @@
+DEFAULTTUNE ?= "cortexa35"
+
+require conf/machine/include/arm/arch-armv8a.inc
+
+TUNEVALID[cortexa35] = "Enable Cortex-A35 specific processor optimizations"
+TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'cortexa35', ' -mcpu=cortex-a35', '', d)}"
+
+# Little Endian base configs
+AVAILTUNES += "cortexa35 cortexa35-crypto"
+ARMPKGARCH_tune-cortexa35             = "cortexa35"
+ARMPKGARCH_tune-cortexa35-crypto      = "cortexa35"
+TUNE_FEATURES_tune-cortexa35          = "${TUNE_FEATURES_tune-armv8a-crc} cortexa35"
+TUNE_FEATURES_tune-cortexa35-crypto   = "${TUNE_FEATURES_tune-armv8a-crc-crypto} cortexa35"
+PACKAGE_EXTRA_ARCHS_tune-cortexa35             = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc} cortexa35"
+PACKAGE_EXTRA_ARCHS_tune-cortexa35-crypto      = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto} cortexa35-crypto"
-- 
2.14.3




* [PATCH v2 3/4] tune-cortexa32: add tunes for ARM Cortex-A32
  2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
  2018-06-09  6:26 ` [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8 Randy Li
  2018-06-09  6:26 ` [PATCH v2 2/4] tune-cortexa35: add tunes for ARM Cortex-A35 Randy Li
@ 2018-06-09  6:26 ` Randy Li
  2018-06-09  6:26 ` [PATCH v2 4/4] tune-cortexa72: add tunes for ARM Cortex-A72 Randy Li
  2018-06-12  9:00 ` [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Koen Kooi
  4 siblings, 0 replies; 17+ messages in thread
From: Randy Li @ 2018-06-09  6:26 UTC (permalink / raw)
  To: openembedded-core

https://developer.arm.com/products/processors/cortex-a/cortex-a32

Signed-off-by: Randy Li <ayaka@soulik.info>
---
 meta/conf/machine/include/tune-cortexa32.inc | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 meta/conf/machine/include/tune-cortexa32.inc

diff --git a/meta/conf/machine/include/tune-cortexa32.inc b/meta/conf/machine/include/tune-cortexa32.inc
new file mode 100644
index 0000000000..4ecb9ef241
--- /dev/null
+++ b/meta/conf/machine/include/tune-cortexa32.inc
@@ -0,0 +1,15 @@
+DEFAULTTUNE ?= "cortexa32"
+
+require conf/machine/include/arm/arch-armv8a.inc
+
+TUNEVALID[cortexa32] = "Enable Cortex-A32 specific processor optimizations"
+TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'cortexa32', ' -mcpu=cortex-a32', '', d)}"
+
+# Little Endian base configs
+AVAILTUNES += "cortexa32 cortexa32-crypto"
+ARMPKGARCH_tune-cortexa32             = "cortexa32"
+ARMPKGARCH_tune-cortexa32-crypto      = "cortexa32"
+TUNE_FEATURES_tune-cortexa32          = "${TUNE_FEATURES_tune-armv8a-crc} cortexa32"
+TUNE_FEATURES_tune-cortexa32-crypto   = "${TUNE_FEATURES_tune-armv8a-crc-crypto} cortexa32"
+PACKAGE_EXTRA_ARCHS_tune-cortexa32             = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc} cortexa32"
+PACKAGE_EXTRA_ARCHS_tune-cortexa32-crypto      = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto} cortexa32-crypto"
-- 
2.14.3




* [PATCH v2 4/4] tune-cortexa72: add tunes for ARM Cortex-A72
  2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
                   ` (2 preceding siblings ...)
  2018-06-09  6:26 ` [PATCH v2 3/4] tune-cortexa32: add tunes for ARM Cortex-A32 Randy Li
@ 2018-06-09  6:26 ` Randy Li
  2018-06-12  9:00 ` [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Koen Kooi
  4 siblings, 0 replies; 17+ messages in thread
From: Randy Li @ 2018-06-09  6:26 UTC (permalink / raw)
  To: openembedded-core

It looks like the Cryptography engine is mandatory on this
platform.

https://developer.arm.com/products/processors/cortex-a/cortex-a72

Signed-off-by: ayaka <ayaka@soulik.info>
Signed-off-by: Randy Li <ayaka@soulik.info>
---
 meta/conf/machine/include/tune-cortexa72.inc | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 meta/conf/machine/include/tune-cortexa72.inc

diff --git a/meta/conf/machine/include/tune-cortexa72.inc b/meta/conf/machine/include/tune-cortexa72.inc
new file mode 100644
index 0000000000..8055492b8e
--- /dev/null
+++ b/meta/conf/machine/include/tune-cortexa72.inc
@@ -0,0 +1,12 @@
+DEFAULTTUNE ?= "cortexa72"
+
+require conf/machine/include/arm/arch-armv8a.inc
+
+TUNEVALID[cortexa72] = "Enable Cortex-A72 specific processor optimizations"
+TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'cortexa72', ' -mcpu=cortex-a72', '', d)}"
+
+# Little Endian base configs
+AVAILTUNES += "cortexa72"
+ARMPKGARCH_tune-cortexa72             = "cortexa72"
+TUNE_FEATURES_tune-cortexa72          = "${TUNE_FEATURES_tune-armv8a-crc-crypto} cortexa72"
+PACKAGE_EXTRA_ARCHS_tune-cortexa72    = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto} cortexa72"
-- 
2.14.3




* Re: [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8
  2018-06-09  6:26 ` [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8 Randy Li
@ 2018-06-12  5:59   ` Nicolas Dechesne
  2018-06-12 14:21     ` Mark Hatle
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolas Dechesne @ 2018-06-12  5:59 UTC (permalink / raw)
  To: Randy Li; +Cc: Patches and discussions about the oe-core layer

On Sat, Jun 9, 2018 at 8:26 AM, Randy Li <ayaka@soulik.info> wrote:
> There are some addtional instructions apart from bare armv8,
> also there is armv8.1, armv8.2.
>
> Most the processor would support crc, except X-gene 1.

The commit message doesn't really explain what is going on in this patch.

>
> Signed-off-by: Randy Li <ayaka@soulik.info>
> ---
>  meta/conf/machine/include/arm/arch-armv8.inc  |  1 -
>  meta/conf/machine/include/arm/arch-armv8a.inc | 22 ++++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  delete mode 100644 meta/conf/machine/include/arm/arch-armv8.inc
^^ this file is used by probably any armv8 machine out there, even
inside oe-core:

$ git grep "arch-armv8\.inc"
include/tune-thunderx.inc:require conf/machine/include/arm/arch-armv8.inc
qemuarm64.conf:require conf/machine/include/arm/arch-armv8.inc

So this change would break many things, and the file is also used in many BSP layers.
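
The same check can be scripted when more than one layer is involved. This is
only a minimal Python sketch: `find_dependents` is a hypothetical helper, the
first two entries reuse the file names and require lines from the grep output
above, and the third entry is made up for contrast.

```python
import re

# Matches BitBake "require <path>" and "include <path>" lines.
REQUIRE_RE = re.compile(r"^\s*(?:require|include)\s+(\S+)\s*$", re.MULTILINE)


def find_dependents(files, removed_path):
    """Return the sorted names of files that require/include removed_path.

    files maps a file name to its text content.
    """
    hits = []
    for name, text in files.items():
        if any(target == removed_path for target in REQUIRE_RE.findall(text)):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    removed = "conf/machine/include/arm/arch-armv8.inc"
    files = {
        "include/tune-thunderx.inc": f"require {removed}\n",
        "qemuarm64.conf": f"require {removed}\n",
        # Hypothetical file that would not be affected by the deletion.
        "other-machine.conf": "require conf/machine/include/arm/arch-arm64.inc\n",
    }
    for name in find_dependents(files, removed):
        print(name)
```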

>  create mode 100644 meta/conf/machine/include/arm/arch-armv8a.inc
>
> diff --git a/meta/conf/machine/include/arm/arch-armv8.inc b/meta/conf/machine/include/arm/arch-armv8.inc
> deleted file mode 100644
> index 5e832fae6d..0000000000
> --- a/meta/conf/machine/include/arm/arch-armv8.inc
> +++ /dev/null
> @@ -1 +0,0 @@
> -require conf/machine/include/arm/arch-arm64.inc
> diff --git a/meta/conf/machine/include/arm/arch-armv8a.inc b/meta/conf/machine/include/arm/arch-armv8a.inc
> new file mode 100644
> index 0000000000..b8b9e44c54
> --- /dev/null
> +++ b/meta/conf/machine/include/arm/arch-armv8a.inc
> @@ -0,0 +1,22 @@
> +DEFAULTTUNE ?= "armv8a-crc"

Even without taking into consideration that this changes the default
configuration for all arm64 machines, which should certainly be discussed
and agreed upon, the crc and crypto flags you are adding here are not
defined anywhere.

Please review how the armv7 variants were designed and introduced, and try
to mimic that for armv8 if you think it's needed.
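
To make the "not defined anywhere" point concrete, here is a minimal Python
sketch of the kind of consistency check involved: every word in a tune's
TUNE_FEATURES should have a matching TUNEVALID[...] entry. The helper name
`undefined_features` and the contents of `tunevalid` are assumptions made for
illustration; the real set depends on which includes are pulled in.

```python
def undefined_features(tune_features, tunevalid):
    """Return the features in tune_features that have no TUNEVALID entry.

    tune_features is a space-separated string, tunevalid a mapping of
    feature name -> description.
    """
    return [feature for feature in tune_features.split() if feature not in tunevalid]


if __name__ == "__main__":
    # Assumed set of documented features (illustrative only).
    tunevalid = {
        "aarch64": "Enable instructions for aarch64",
        "armv8a": "Enable instructions for ARMv8-a",
        "simd": "Enable SIMD instructions",
    }
    # Intended feature list for the armv8a-crc-crypto tune.
    print(undefined_features("armv8a simd crc crypto", tunevalid))
```

With these assumed inputs the check reports crc and crypto as undocumented.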

> +
> +TUNEVALID[armv8] = "Enable instructions for ARMv8-a"
> +TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', ' -march=armv8-a', '', d)}"
> +MACHINEOVERRIDES =. "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', 'armv8a:', '' ,d)}"
> +
> +require conf/machine/include/arm/arch-arm64.inc
> +
> +# Little Endian base configs
> +AVAILTUNES += "armv8a armv8a-crc armv8a-crc-crypto armv8a-crypto"
> +ARMPKGARCH_tune-armv8a                    ?= "armv8a"
> +ARMPKGARCH_tune-armv8a-crc                ?= "armv8a"
> +ARMPKGARCH_tune-armv8a-crypto             ?= "armv8a"
> +ARMPKGARCH_tune-armv8a-crc-crypto         ?= "armv8a"
> +TUNE_FEATURES_tune-armv8a                  = "armv8a simd"
> +TUNE_FEATURES_tune-armv8a-crc              = "${ARMPKGARCH_tune-armv8a} crc"
> +TUNE_FEATURES_tune-armv8a-crypto           = "${TUNE_FEATURES_tune-armv8a} crypto"
> +TUNE_FEATURES_tune-armv8a-crc-crypto       = "${TUNE_FEATURES_tune-armv8a-crc} crypto"
> +PACKAGE_EXTRA_ARCHS_tune-armv8a            = "aarch64 armv8a simd"
> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crc        = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crc"
> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crypto     = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crypto"
> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc} crypto"
> --
> 2.14.3
>

* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
                   ` (3 preceding siblings ...)
  2018-06-09  6:26 ` [PATCH v2 4/4] tune-cortexa72: add tunes for ARM Cortex-A72 Randy Li
@ 2018-06-12  9:00 ` Koen Kooi
  2018-06-12  9:30   ` Herve Jourdain
  4 siblings, 1 reply; 17+ messages in thread
From: Koen Kooi @ 2018-06-12  9:00 UTC (permalink / raw)
  To: Randy Li; +Cc: OE-core



> On 9 Jun 2018, at 08:26, Randy Li <ayaka@soulik.info> wrote:
> 
> I read the ARMv8 manual again. It looks like hardware floating point is
> mandatory for Linux distributions and toolchain libraries. Some Cortex
> processors can be configured without FPU/NEON hardware, but I don't
> think they would be used with OpenEmbedded-Core.
> 
> So I can assume that NEON (SIMD) is always present, leaving only the
> crc and crypto instructions as optional.
> 
> 
> Randy Li (4):
>  arch-armv8a.inc: add tune include for armv8
>  tune-cortexa35: add tunes for ARM Cortex-A35
>  tune-cortexa32: add tunes for ARM Cortex-A32
>  tune-cortexa72: add tunes for ARM Cortex-A72

Having been forced to deal with the mess that’s 32-bit arm tunes: let’s only add implementation-specific tunes *after* having seen conclusive, repeatable benchmark results. 90% of the 32-bit tune files are placebo effect and just explode the number of package archs in your distro feed. The goal of aarch64 was to stop being different for the sake of being different; let’s not make a mess because we are used to messes.

regards,

Koen


* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12  9:00 ` [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Koen Kooi
@ 2018-06-12  9:30   ` Herve Jourdain
  2018-06-12 14:32     ` Mark Hatle
  0 siblings, 1 reply; 17+ messages in thread
From: Herve Jourdain @ 2018-06-12  9:30 UTC (permalink / raw)
  To: 'Koen Kooi', 'Randy Li'; +Cc: 'OE-core'

Hi,

I believe I'm the "original author" of a patch attempt at tackling this problem more than a year ago, as referenced in this series.
And I understand why everyone, Khem being the first and not the only one, would like some "simpler" things for ARM.
But the problem is that ARM-based SoCs are very diverse, and ARM does have a number of optional IP blocks (such as crypto, but neon is another one, and there are others), defined for each architecture. Then ARM defines some "standard" SoCs (like cortex-A53, cortex-A57, ...) which may set some of those optional IPs as required for that SoC, and the rest still as optional.
And SoC vendors decide what optional IPs they will implement or not...

So when we're talking "cortex-A53", it's not necessarily the same cortex-A53 for all SoC vendors.

GCC does support all that complexity. So the main question is, do we want to be able to generate code that could take advantage of the optional IPs present on a SoC? Or do we prefer to settle for the least common denominator?
As someone who is close to the SoC, I definitely would prefer to be able to take advantage of the optional IPs present on an ARM SoC, and I'd rather have a system that can at least support that even if it's slightly more complex. This said, once it's done, most people won't look under the hood but just use it, so the complexity would end up being hidden - much like now with armv7.

I've personally followed up on my patches from last year, and I now have a slightly modified/simplified version of them, which I've used to build some production-ready environments using cortex-a53/armv8 tunes, that trigger the optimization for cortex-a53 + neon. And if the SoC I'm working with had the crypto extension, I would be very happy to build for it, by just switching the tune I use for my cortex-a53 to the armv8 tune supporting crypto.

So I believe now may be a good time to talk this over again, because we're basically building for cortex-a53 with cortexa7/armv7ve, and that is not optimal in my opinion (for example, some instructions that were native in armv7ve are simulated in armv8).

One simplification I did come up with was the handling of thumb: I don't think it needs to be an option anymore, since its support is mandatory in armv8 (though I think that was also the case in armv7). That simplifies things a bit, but it changes nothing fundamental; you still need to carry the support for the optional IPs around...
And in addition to what I proposed to support last year, we now indeed have to add armv8.1a, armv8.2a, armv8.3a and armv8.4a (so far...), each of which has its own specifics, which makes it unlikely that they can all be supported within a single file.

Thoughts? Can we talk this over, so we have a chance of getting good support for armv8-32 in oe, instead of everyone doing their own?

Cheers,
Herve


* Re: [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8
  2018-06-12  5:59   ` Nicolas Dechesne
@ 2018-06-12 14:21     ` Mark Hatle
  0 siblings, 0 replies; 17+ messages in thread
From: Mark Hatle @ 2018-06-12 14:21 UTC (permalink / raw)
  To: Nicolas Dechesne, Randy Li
  Cc: Patches and discussions about the oe-core layer

On 6/12/18 12:59 AM, Nicolas Dechesne wrote:
> On Sat, Jun 9, 2018 at 8:26 AM, Randy Li <ayaka@soulik.info> wrote:
>> There are some addtional instructions apart from bare armv8,
>> also there is armv8.1, armv8.2.
>>
>> Most the processor would support crc, except X-gene 1.
> 
> the commit message doesn't really explain what is going on in this patch..
> 
>>
>> Signed-off-by: Randy Li <ayaka@soulik.info>
>> ---
>>  meta/conf/machine/include/arm/arch-armv8.inc  |  1 -
>>  meta/conf/machine/include/arm/arch-armv8a.inc | 22 ++++++++++++++++++++++
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>  delete mode 100644 meta/conf/machine/include/arm/arch-armv8.inc
> ^^ this file is used by probably any armv8 machine out there, even
> inside oe-core:
> 
> $ git grep "arch-armv8\.inc"
> include/tune-thunderx.inc:require conf/machine/include/arm/arch-armv8.inc
> qemuarm64.conf:require conf/machine/include/arm/arch-armv8.inc
> 
> so this change would break many things. and it's used in many BSP layers.
> 
>>  create mode 100644 meta/conf/machine/include/arm/arch-armv8a.inc
>>
>> diff --git a/meta/conf/machine/include/arm/arch-armv8.inc b/meta/conf/machine/include/arm/arch-armv8.inc
>> deleted file mode 100644
>> index 5e832fae6d..0000000000
>> --- a/meta/conf/machine/include/arm/arch-armv8.inc
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> -require conf/machine/include/arm/arch-arm64.inc
>> diff --git a/meta/conf/machine/include/arm/arch-armv8a.inc b/meta/conf/machine/include/arm/arch-armv8a.inc
>> new file mode 100644
>> index 0000000000..b8b9e44c54
>> --- /dev/null
>> +++ b/meta/conf/machine/include/arm/arch-armv8a.inc
>> @@ -0,0 +1,22 @@
>> +DEFAULTTUNE ?= "armv8a-crc"
> 
> even without taking into consideration that this is changing all
> default configs for arm64.. which should for sure be discussed and
> agreed upon... the crc and crypto flags you are adding here, are not
> defined anywhere.
> 
> please review how the armv7 variants were designed/introduced, and try
> to mimic that for armv8 if you think it's needed.
> 
>> +
>> +TUNEVALID[armv8] = "Enable instructions for ARMv8-a"
>> +TUNE_CCARGS .= "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', ' -march=armv8-a', '', d)}"
>> +MACHINEOVERRIDES =. "${@bb.utils.contains('TUNE_FEATURES', 'armv8a', 'armv8a:', '' ,d)}"
>> +
>> +require conf/machine/include/arm/arch-arm64.inc
>> +
>> +# Little Endian base configs
>> +AVAILTUNES += "armv8a armv8a-crc armv8a-crc-crypto armv8a-crypto"
>> +ARMPKGARCH_tune-armv8a                    ?= "armv8a"
>> +ARMPKGARCH_tune-armv8a-crc                ?= "armv8a"
>> +ARMPKGARCH_tune-armv8a-crypto             ?= "armv8a"
>> +ARMPKGARCH_tune-armv8a-crc-crypto         ?= "armv8a"

To add to Nicolas's comment: do any of the variants mentioned above have their
additional instructions generated by the toolchain?  Or are they purely
assembly-level functionality?

Generally we define tunes as something that can be generated by the compiler.
For things that are assembly driven, a new tune is rarely required; instead
the developer of the component should either do run-time switching of assembly
functions, or it becomes board/distribution specific.  (There are always
exceptions to that, but that policy helps limit the tunes to things that matter.)
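
As a rough illustration of run-time switching, here is a minimal Python sketch.
It assumes the component can read the "Features" line that arm64 Linux kernels
expose in /proc/cpuinfo; the helper names and the two implementation labels are
hypothetical, and a real component would normally make this choice in C.

```python
def cpu_features(cpuinfo_text):
    """Return the flags from the first 'Features' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features":
            return set(value.split())
    return set()


def pick_crc32_impl(features):
    """Choose an implementation label based on the detected CPU flags."""
    return "crc32-hw" if "crc32" in features else "crc32-generic"


if __name__ == "__main__":
    # Illustrative sample; on a target one would read /proc/cpuinfo instead.
    sample = "processor\t: 0\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"
    print(pick_crc32_impl(cpu_features(sample)))
```

With this approach a single binary serves CPUs with and without the extension,
so no extra package architecture is needed.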

>> +TUNE_FEATURES_tune-armv8a                  = "armv8a simd"
>> +TUNE_FEATURES_tune-armv8a-crc              = "${ARMPKGARCH_tune-armv8a} crc"
>> +TUNE_FEATURES_tune-armv8a-crypto           = "${TUNE_FEATURES_tune-armv8a} crypto"
>> +TUNE_FEATURES_tune-armv8a-crc-crypto       = "${TUNE_FEATURES_tune-armv8a-crc} crypto"
>> +PACKAGE_EXTRA_ARCHS_tune-armv8a            = "aarch64 armv8a simd"
>> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crc        = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crc"
>> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crypto     = "${PACKAGE_EXTRA_ARCHS_tune-armv8a} crypto"
>> +PACKAGE_EXTRA_ARCHS_tune-armv8a-crc-crypto = "${PACKAGE_EXTRA_ARCHS_tune-armv8a-crc} crypto"
>> --
>> 2.14.3
>>

* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12  9:30   ` Herve Jourdain
@ 2018-06-12 14:32     ` Mark Hatle
  2018-06-12 15:49       ` Herve Jourdain
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Hatle @ 2018-06-12 14:32 UTC (permalink / raw)
  To: Herve Jourdain, 'Koen Kooi', 'Randy Li'; +Cc: 'OE-core'

On 6/12/18 4:30 AM, Herve Jourdain wrote:
> Hi,
> 
> I believe I'm the "original author" of some patch attempt at tackling this problem, more than a year ago, as referenced in this series.
> And I understand why everyone, Khem being the first and not the only one, would like some "simpler" things for ARM.
> But the problem is that ARM-based SoCs are very diverse, and ARM does have a number of optional IP blocks (such as crypto, but neon is another one, and there are others), defined for each architecture. Then ARM defines some "standard" SoCs (like cortex-A53, cortex-A57, ...) which may set some of those optional IPs as required for that SoC, and the rest still as optional.
> And SoC vendors decide what optional IPs they will implement or not...

Simplification is a goal in this, but as you said, not always reasonable with a
processor designed to be customized.

Typically true customization (vendor specific) doesn't belong in the oe-core
tune files, but stuff that is architecturally defined may.

> So when we're talking "cortex-A53", it's not necessarily the same cortex-A53 for all SoC vendors.
> 
> GCC does support all that complexity. So the main question is, do we want to be able to generate code that could take advantage of the optional IPs present on a SoC? Or do we prefer to settle for the least common denominator?

I think this is the key.  What combinations does GCC support (actually generate
code for?)   If GCC can't generate code for that combination, then I don't
believe it belongs as a tune in OE-Core, unless there is a compelling argument
that assembly level functions will be common enough to justify it.
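
For reference, GCC's AArch64 -march option accepts extensions appended with
"+", for example -march=armv8-a+crc+crypto. A minimal Python sketch of how a
tune's feature list could be turned into such a flag follows; `march_flag` is a
hypothetical helper, not existing OE-Core code, and it assumes that only crc
and crypto need to be spelled out because SIMD is on by default.

```python
# Optional extensions this sketch knows how to pass to the compiler.
OPTIONAL_EXTENSIONS = ("crc", "crypto")


def march_flag(tune_features, arch="armv8-a"):
    """Build a -march flag from a space-separated feature list."""
    features = tune_features.split()
    extensions = [ext for ext in OPTIONAL_EXTENSIONS if ext in features]
    return "-march=" + "+".join([arch, *extensions])


if __name__ == "__main__":
    print(march_flag("armv8a simd"))
    print(march_flag("armv8a simd crc crypto"))
```

A tune whose features map to a distinct flag here changes what the compiler may
emit; one that maps to the same flag as another tune does not.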

> As someone who is close to the SoC, I definitely would prefer to be able to take advantage of the optional IPs present on an ARM SoC, and I'd rather have a system that can at least support that even if it's slightly more complex. This said, once it's done, most people won't look under the hood but just use it, so the complexity would end up being hidden - much like now with armv7.

And this is why my GCC statement is being made.  Most developers will define a
tune, but will never go into the assembly realm.  They simply don't have the
knowledge or care to devote a bunch of time for a .5% performance improvement.
If GCC can add specific optimizations, then we've hit the 'trivial optimization'
phase, and a tune may be justified.  We just need to be careful of the variant
names -- once set they will last a VERY long time.

> I've personally followed up on my patches from last year, and I now have a slightly modified/simplified version of them, which I've used to build some production-ready environments using cortex-a53/armv8 tunes, that trigger the optimization for cortex-a53 + neon. And if the SoC I'm working with had the crypto extension, I would be very happy to build for it, by just switching the tune I use for my cortex-a53 to the armv8 tune supporting crypto.
> 
> So I believe now may be a good time to talk this over again, because we're basically building for cortex-a53 with cortexa7/armv7ve, and that is not the most optimal thing to do in my opinion (like, some instructions that were native in armv7ve are simulated in armv8).

I don't think anyone objects to armv8, but I was under the impression that
things like neon were now 'required' (i.e. were not supposed to be removed from
the instruction set).  So anything that is now standard would be part of the
definition of armv8, and if there are rare, customized versions without neon or
something else, then I think it's a silicon-vendor-specific tune that is needed.

In the end it comes down to what has ARM specified, what does GCC support, and
what is ACTUALLY being broadly implemented.

> One thing that I did come up as a simplification was the handling of thumb, I don't think it needs to be an option anymore, since its support is mandatory in armv8 (but I think it was also the case in armv7). That simplifies things a bit, but nothing fundamental, you still need to carry the support for the optional IPs around...

The only reason to continue with the existing 32-bit naming conventions (t,
neon, vfp, etc) is to show the compatibility matrix.  I don't know if this
actually justifies the extensions though.  (I do know I have customers who never
want to use thumb or always [as much as possible] want to use thumb based on
their own performance requirements and designs.. so thumb being switchable is
still a desired attribute -- at least in the armv7 designs I know of.)

> And in addition to what I proposed to support last year, we indeed now have to add armv8.1a, armv8.2a, armv8.3a, armv8.4a (so far...), which each have their own specificities/differences that make it unlikely to be supported within a single file.

IF the instruction scheduling, generated instructions, optimizations, etc. are
truly different, then we should call them armv81a, etc.  (I don't believe we
can use a '.' for various reasons.)  But if there is no difference in the
compiler behavior or the generated code, and it's just assembly-level
instruction additions, then I'm reluctant to add these tunes as they can give
a false impression.

> Thoughts? Can we talk this over, so we can have a chance to have a good support for armv8-32 in oe, instead of everyone doing its own?
> 
> Cheers,
> Herve
> 
> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of Koen Kooi
> Sent: mardi 12 juin 2018 11:01
> To: Randy Li <ayaka@soulik.info>
> Cc: OE-core <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
> 
> 
> 
>> Op 9 jun. 2018, om 08:26 heeft Randy Li <ayaka@soulik.info> het volgende geschreven:
>>
>> I read the ARMv8 manual again, it looks the hardware float is 
>> mandatory in Linux Distributions and toolchain libraries. Even some 
>> cortex processors can be configured without FPU/NEON hardware, but I 
>> don't think they would be used in openembeded core.
>>
>> So I can assume the NEON(SIMD) would exist all the time. Leaving only 
>> the crc and crypto instructions are optional here.
>>
>>
>> Randy Li (4):
>>  arch-armv8a.inc: add tune include for armv8
>>  tune-cortexa35: add tunes for ARM Cortex-A35
>>  tune-cortexa32: add tunes for ARM Cortex-A32
>>  tune-cortexa72: add tunes for ARM Cortex-A72
> 
> Having been forced to deal with the mess that’s 32-bit arm tunes: Let’s only add implementation-specific tunes *after* having seen conclusive, repeatable benchmark results. 90% of the 32-bit tune files are placebo effect and just explode the number of package archs in your distro feed. The goal of aarch64 was to stop being different for the sake of being different, let’s not make a mess because we are used to messes.
> 
> regards,
> 
> Koen
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core
> 




* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 14:32     ` Mark Hatle
@ 2018-06-12 15:49       ` Herve Jourdain
  2018-06-12 20:00         ` Mark Hatle
  0 siblings, 1 reply; 17+ messages in thread
From: Herve Jourdain @ 2018-06-12 15:49 UTC (permalink / raw)
  To: 'Mark Hatle', 'Koen Kooi', 'Randy Li'
  Cc: 'OE-core'

Hi,

So I agree with you about restricting to what gcc can support; that's actually my proposal (probably a subset of what gcc can support).
So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
Then, you can add the supported options with a "+" after the architecture.
Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
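
To make the "+" syntax concrete, here is a minimal sketch in Python of how an architecture and its options combine into one -march string. The table is only transcribed from the lists above and is not checked against any particular gcc release; the names ARCH_OPTIONS and march_flag are made up for this illustration.

```python
# Extension table transcribed from the lists in this message.
# Assumption: it is not verified against any specific gcc release.
_V82_OPTIONS = {"fp16", "fp16fml", "simd", "crypto", "dotprod", "nocrypto", "nofp"}

ARCH_OPTIONS = {
    "armv8-a": {"crc", "simd", "crypto", "nocrypto", "nofp"},
    "armv8.1-a": {"simd", "crypto", "nocrypto", "nofp"},
    "armv8.2-a": set(_V82_OPTIONS),
    "armv8.3-a": set(_V82_OPTIONS),
    "armv8.4-a": {"fp16", "simd", "crypto", "dotprod", "nocrypto", "nofp"},
}


def march_flag(arch, options=()):
    """Build a -march= string, rejecting options not listed for this arch."""
    allowed = ARCH_OPTIONS[arch]
    unknown = [opt for opt in options if opt not in allowed]
    if unknown:
        raise ValueError(f"{arch} does not list: {', '.join(unknown)}")
    # Each option is appended with a leading "+", in the order given.
    return "-march=" + arch + "".join("+" + opt for opt in options)


print(march_flag("armv8-a", ["crc", "crypto"]))
```

With this table, asking for "crc" on armv8.1-a raises an error, which is the kind of check a tune file would need if each architecture level carries its own option list.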

As you can see, the proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity and do not add to it.
Support for armv8.1-a, armv8.2-a, armv8.3-a and armv8.4-a will only add more options down the line.

Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.

Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.

I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).

One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
'+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
'+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’

That could simplify the tune settings, but would give less control than what we currently have.
As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
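
As a rough illustration of the cpu-plus-options form described above, the sketch below (Python) builds an -mcpu string and only accepts the option/cpu pairs listed in this message. The names CPU_OPTIONS and mcpu_flag are hypothetical, and the table is not checked against a real gcc.

```python
# Option -> cpus that accept it, transcribed from the two lists above.
# Assumption: not verified against any specific gcc release.
ALL_CPUS = {
    "cortex-a32", "cortex-a35", "cortex-a53", "cortex-a55",
    "cortex-a57", "cortex-a72", "cortex-a73", "cortex-a75",
}

CPU_OPTIONS = {
    "nofp": {"cortex-a32", "cortex-a35", "cortex-a53", "cortex-a55"},
    "crypto": set(ALL_CPUS),
}


def mcpu_flag(cpu, options=()):
    """Build a -mcpu= string from a cpu name and a list of options."""
    if cpu not in ALL_CPUS:
        raise ValueError(f"unknown cpu: {cpu}")
    for opt in options:
        # An option missing from the table is treated as unsupported.
        if cpu not in CPU_OPTIONS.get(opt, set()):
            raise ValueError(f"{cpu} does not list +{opt}")
    return "-mcpu=" + cpu + "".join("+" + opt for opt in options)


print(mcpu_flag("cortex-a53", ["crypto"]))
```

The trade-off is visible here: one string per cpu is simpler to carry around, but the only knobs left are the ones the cpu name accepts.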

Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.

Cheers,
Herve

-----Original Message-----
From: Mark Hatle [mailto:mark.hatle@windriver.com] 
Sent: Tuesday, 12 June 2018 16:32
To: Herve Jourdain <herve.jourdain@neuf.fr>; 'Koen Kooi' <koen@dominion.thruhere.net>; 'Randy Li' <ayaka@soulik.info>
Cc: 'OE-core' <openembedded-core@lists.openembedded.org>
Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors

On 6/12/18 4:30 AM, Herve Jourdain wrote:
> Hi,
> 
> I believe I'm the "original author" of some patch attempt at tackling this problem, more than a year ago, as referenced in this series.
> And I understand why everyone, Khem being the first and not the only one, would like some "simpler" things for ARM.
> But the problem is that ARM-based SoCs are very diverse, and ARM does have a number of optional IP blocks (such as crypto, but neon is another one, and there are others), defined for each architecture. Then ARM defines some "standard" SoCs (like cortex-A53, cortex-A57, ...) which may set some of those optional IPs as required for that SoC, and the rest still as optional.
> And SoC vendors decide what optional IPs they will implement or not...

Simplification is a goal in this, but as you said, not always reasonable with a processor designed to be customized.

Typically true customization (vendor specific) doesn't belong in the oe-core tune files, but stuff that is architecturally defined may.

> So when we're talking "cortex-A53", it's not necessarily the same cortex-A53 for all SoC vendors.
> 
> GCC does support all that complexity. So the main question is, do we want to be able to generate code that could take advantage of the optional IPs present on a SoC? Or do we prefer to settle for the least common denominator?

I think this is the key.  What combinations does GCC support (actually generate
code for?)   If GCC can't generate code for that combination, then I don't
believe it belongs as a tune in OE-Core, unless there is a compelling argument that assembly level functions will be common enough to justify it.

> As someone who is close to the SoC, I definitely would prefer to be able to take advantage of the optional IPs present on an ARM SoC, and I'd rather have a system that can at least support that even if it's slightly more complex. This said, once it's done, most people won't look under the hood but just use it, so the complexity would end up being hidden - much like now with armv7.

And this is why my GCC statement is being made.  Most developers will define a tune, but will never go into the assembly realm.  They simply don't have the knowledge or care to devote a bunch of time for a .5% performance improvement.
If GCC can add specific optimizations, then we've hit the 'trivial optimization'
phase, and a tune may be justified.  We just need to be careful of the variant names -- once set they will last a VERY long time.

> I've personally followed up on my patches from last year, and I now have a slightly modified/simplified version of them, which I've used to build some production-ready environments using cortex-a53/armv8 tunes, that trigger the optimization for cortex-a53 + neon. And if the SoC I'm working with had the crypto extension, I would be very happy to build for it, by just switching the tune I use for my cortex-a53 to the armv8 tune supporting crypto.
> 
> So I believe now may be a good time to talk this over again, because we're basically building for cortex-a53 with cortexa7/armv7ve, and that is not the most optimal thing to do in my opinion (like, some instructions that were native in armv7ve are simulated in armv8).

I don't think anyone objects to armv8, but I was under the impression that things like neon were now 'required', (i.e. were not supposed to be removed from the instruction set.)  So for anything that is now standard, they would be the definition of armv8.. and if there are rare, but customized version w/o neon or something else -- then I think it's a silicon vendor specific tune that is needed.

In the end it comes down to what has ARM specified, what does GCC support, and what is ACTUALLY being broadly implemented.

> One thing that I did come up as a simplification was the handling of thumb, I don't think it needs to be an option anymore, since its support is mandatory in armv8 (but I think it was also the case in armv7). That simplifies things a bit, but nothing fundamental, you still need to carry the support for the optional IPs around...

The only reason to continue with the existing 32-bit naming conventions (t, neon, vfp, etc) is to show the compatibility matrix.  I don't know if this actually justifies the extensions though.  (I do know I have customers who never want to use thumb or always [as much as possible] want to use thumb based on their own performance requirements and designs.. so thumb being switchable is still a desired attribute -- at least in the armv7 designs I know of.)

> And in addition to what I proposed to support last year, we indeed now have to add armv8.1a, armv8.2a, armv8.3a, armv8.4a (so far...), which each have their own specificities/differences that make it unlikely to be supported within a single file.

IF the instruction scheduling, generated instructions, optimizations, etc are truly different.. then we should call them armv81a, etc..  (I don't believe we
can use a '.' for various reasons..)   But if there is no difference in the
compiler behavior, or the generated code.. and it's just assembly level instruction additions -- then I'm reluctant to add these tunes as they can give a false impression.

> Thoughts? Can we talk this over, so we can have a chance to have a good support for armv8-32 in oe, instead of everyone doing its own?
> 
> Cheers,
> Herve
> 
> -----Original Message-----
> From: openembedded-core-bounces@lists.openembedded.org 
> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of 
> Koen Kooi
> Sent: Tuesday, 12 June 2018 11:01
> To: Randy Li <ayaka@soulik.info>
> Cc: OE-core <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some 
> cortex processors
> 
> 
> 
>> On 9 Jun 2018, at 08:26, Randy Li <ayaka@soulik.info> wrote:
>>
>> I read the ARMv8 manual again, it looks the hardware float is 
>> mandatory in Linux Distributions and toolchain libraries. Even some 
>> cortex processors can be configured without FPU/NEON hardware, but I 
>> don't think they would be used in openembeded core.
>>
>> So I can assume the NEON(SIMD) would exist all the time. Leaving only 
>> the crc and crypto instructions are optional here.
>>
>>
>> Randy Li (4):
>>  arch-armv8a.inc: add tune include for armv8
>>  tune-cortexa35: add tunes for ARM Cortex-A35
>>  tune-cortexa32: add tunes for ARM Cortex-A32
>>  tune-cortexa72: add tunes for ARM Cortex-A72
> 
> Having been forced to deal with the mess that’s 32-bit arm tunes: Let’s only add implementation-specific tunes *after* having seen conclusive, repeatable benchmark results. 90% of the 32-bit tune files are placebo effect and just explode the number of package archs in your distro feed. The goal of aarch64 was to stop being different for the sake of being different, let’s not make a mess because we are used to messes.
> 
> regards,
> 
> Koen
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core
> 




* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 15:49       ` Herve Jourdain
@ 2018-06-12 20:00         ` Mark Hatle
  2018-06-12 20:39           ` Andre McCurdy
  0 siblings, 1 reply; 17+ messages in thread
From: Mark Hatle @ 2018-06-12 20:00 UTC (permalink / raw)
  To: Herve Jourdain, 'Koen Kooi', 'Randy Li', Khem Raj
  Cc: 'OE-core'

On 6/12/18 10:49 AM, Herve Jourdain wrote:
> Hi,
> 
> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
> Then, you can add the supported options with a "+" after the architecture.
> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
> 
> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4a will only add more options down the line.

Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
necessarily define a tune that uses them -- if it's standard another layer
certainly could.)

> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
> 
> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
> 
> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).

That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
more reasonable to support all of this.. (maybe that is what needs to be done in
the future as well for other architectures.. focus on the 'gcc' behavior and
generate TUNE_FEATURES matching the compiler.)

I'd like Khem's opinion on how crazy of an idea that is.
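
A rough sketch of what "TUNE_FEATURES matching the compiler" could look like, written as standalone Python rather than real tune-file syntax. The feature names "crc" and "crypto", the mapping table and the function name are all hypothetical; an actual tune file would express this with BitBake variables.

```python
# Hypothetical mapping from TUNE_FEATURES entries to -march option suffixes.
# Entries not in the table (for example a base "aarch64" feature) add nothing.
FEATURE_TO_OPTION = {
    "crc": "+crc",
    "crypto": "+crypto",
}


def ccargs_from_tune_features(tune_features, base_arch="armv8-a"):
    """Derive a -march= string from a space-separated TUNE_FEATURES value."""
    features = tune_features.split()
    # Walk the table, not the input, so the suffix order is always the same
    # no matter how the features were ordered in TUNE_FEATURES.
    suffix = "".join(
        option for name, option in FEATURE_TO_OPTION.items() if name in features
    )
    return "-march=" + base_arch + suffix


print(ccargs_from_tune_features("aarch64 crypto crc"))
```

Keeping the suffix order fixed means two tunes listing the same features in a different order end up with the same compiler arguments.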

> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
> 
> That could simplify the tune settings, but would give less control than what we currently have.
> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.

In the past 'crypto' options have only been assembly.. if that's changed it has
definitely opened up a new facet in all of this work.

> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.

Yes, that might be needed now that thumb is theoretically always supposed to be
present.

--Mark

> Cheers,
> Herve
> 
> -----Original Message-----
> From: Mark Hatle [mailto:mark.hatle@windriver.com] 
> Sent: Tuesday, 12 June 2018 16:32
> To: Herve Jourdain <herve.jourdain@neuf.fr>; 'Koen Kooi' <koen@dominion.thruhere.net>; 'Randy Li' <ayaka@soulik.info>
> Cc: 'OE-core' <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
> 
> On 6/12/18 4:30 AM, Herve Jourdain wrote:
>> Hi,
>>
>> I believe I'm the "original author" of some patch attempt at tackling this problem, more than a year ago, as referenced in this series.
>> And I understand why everyone, Khem being the first and not the only one, would like some "simpler" things for ARM.
>> But the problem is that ARM-based SoCs are very diverse, and ARM does have a number of optional IP blocks (such as crypto, but neon is another one, and there are others), defined for each architecture. Then ARM defines some "standard" SoCs (like cortex-A53, cortex-A57, ...) which may set some of those optional IPs as required for that SoC, and the rest still as optional.
>> And SoC vendors decide what optional IPs they will implement or not...
> 
> Simplification is a goal in this, but as you said, not always reasonable with a processor designed to be customized.
> 
> Typically true customization (vendor specific) doesn't belong in the oe-core tune files, but stuff that is architecturally defined may.
> 
>> So when we're talking "cortex-A53", it's not necessarily the same cortex-A53 for all SoC vendors.
>>
>> GCC does support all that complexity. So the main question is, do we want to be able to generate code that could take advantage of the optional IPs present on a SoC? Or do we prefer to settle for the least common denominator?
> 
> I think this is the key.  What combinations does GCC support (actually generate
> code for?)   If GCC can't generate code for that combination, then I don't
> believe it belongs as a tune in OE-Core, unless there is a compelling argument that assembly level functions will be common enough to justify it.
> 
>> As someone who is close to the SoC, I definitely would prefer to be able to take advantage of the optional IPs present on an ARM SoC, and I'd rather have a system that can at least support that even if it's slightly more complex. This said, once it's done, most people won't look under the hood but just use it, so the complexity would end up being hidden - much like now with armv7.
> 
> And this is why my GCC statement is being made.  Most developers will define a tune, but will never go into the assembly realm.  They simply don't have the knowledge or care to devote a bunch of time for a .5% performance improvement.
> If GCC can add specific optimizations, then we've hit the 'trivial optimization'
> phase, and a tune may be justified.  We just need to be careful of the variant names -- once set they will last a VERY long time.
> 
>> I've personally followed up on my patches from last year, and I now have a slightly modified/simplified version of them, which I've used to build some production-ready environments using cortex-a53/armv8 tunes, that trigger the optimization for cortex-a53 + neon. And if the SoC I'm working with had the crypto extension, I would be very happy to build for it, by just switching the tune I use for my cortex-a53 to the armv8 tune supporting crypto.
>>
>> So I believe now may be a good time to talk this over again, because we're basically building for cortex-a53 with cortexa7/armv7ve, and that is not the most optimal thing to do in my opinion (like, some instructions that were native in armv7ve are simulated in armv8).
> 
> I don't think anyone objects to armv8, but I was under the impression that things like neon were now 'required', (i.e. were not supposed to be removed from the instruction set.)  So for anything that is now standard, they would be the definition of armv8.. and if there are rare, but customized version w/o neon or something else -- then I think it's a silicon vendor specific tune that is needed.
> 
> In the end it comes down to what has ARM specified, what does GCC support, and what is ACTUALLY being broadly implemented.
> 
>> One thing that I did come up as a simplification was the handling of thumb, I don't think it needs to be an option anymore, since its support is mandatory in armv8 (but I think it was also the case in armv7). That simplifies things a bit, but nothing fundamental, you still need to carry the support for the optional IPs around...
> 
> The only reason to continue with the existing 32-bit naming conventions (t, neon, vfp, etc) is to show the compatibility matrix.  I don't know if this actually justifies the extensions though.  (I do know I have customers who never want to use thumb or always [as much as possible] want to use thumb based on their own performance requirements and designs.. so thumb being switchable is still a desired attribute -- at least in the armv7 designs I know of.)
> 
>> And in addition to what I proposed to support last year, we indeed now have to add armv8.1a, armv8.2a, armv8.3a, armv8.4a (so far...), which each have their own specificities/differences that make it unlikely to be supported within a single file.
> 
> IF the instruction scheduling, generated instructions, optimizations, etc are truly different.. then we should call them armv81a, etc..  (I don't believe we
> can use a '.' for various reasons..)   But if there is no difference in the
> compiler behavior, or the generated code.. and it's just assembly level instruction additions -- then I'm reluctant to add these tunes as they can give a false impression.
> 
>> Thoughts? Can we talk this over, so we can have a chance to have a good support for armv8-32 in oe, instead of everyone doing its own?
>>
>> Cheers,
>> Herve
>>
>> -----Original Message-----
>> From: openembedded-core-bounces@lists.openembedded.org 
>> [mailto:openembedded-core-bounces@lists.openembedded.org] On Behalf Of 
>> Koen Kooi
>> Sent: Tuesday, 12 June 2018 11:01
>> To: Randy Li <ayaka@soulik.info>
>> Cc: OE-core <openembedded-core@lists.openembedded.org>
>> Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some 
>> cortex processors
>>
>>
>>
>>> On 9 Jun 2018, at 08:26, Randy Li <ayaka@soulik.info> wrote:
>>>
>>> I read the ARMv8 manual again, it looks the hardware float is 
>>> mandatory in Linux Distributions and toolchain libraries. Even some 
>>> cortex processors can be configured without FPU/NEON hardware, but I 
>>> don't think they would be used in openembeded core.
>>>
>>> So I can assume the NEON(SIMD) would exist all the time. Leaving only 
>>> the crc and crypto instructions are optional here.
>>>
>>>
>>> Randy Li (4):
>>>  arch-armv8a.inc: add tune include for armv8
>>>  tune-cortexa35: add tunes for ARM Cortex-A35
>>>  tune-cortexa32: add tunes for ARM Cortex-A32
>>>  tune-cortexa72: add tunes for ARM Cortex-A72
>>
>> Having been forced to deal with the mess that’s 32-bit arm tunes: Let’s only add implementation-specific tunes *after* having seen conclusive, repeatable benchmark results. 90% of the 32-bit tune files are placebo effect and just explode the number of package archs in your distro feed. The goal of aarch64 was to stop being different for the sake of being different, let’s not make a mess because we are used to messes.
>>
>> regards,
>>
>> Koen
>> --
>> _______________________________________________
>> Openembedded-core mailing list
>> Openembedded-core@lists.openembedded.org
>> http://lists.openembedded.org/mailman/listinfo/openembedded-core
>>
> 




* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 20:00         ` Mark Hatle
@ 2018-06-12 20:39           ` Andre McCurdy
  2018-06-12 21:43             ` Mark Hatle
  2018-06-12 23:32             ` Herve Jourdain
  0 siblings, 2 replies; 17+ messages in thread
From: Andre McCurdy @ 2018-06-12 20:39 UTC (permalink / raw)
  To: Mark Hatle; +Cc: OE-core, Koen Kooi

On Tue, Jun 12, 2018 at 1:00 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
> On 6/12/18 10:49 AM, Herve Jourdain wrote:
>> Hi,
>>
>> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
>> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
>> Then, you can add the supported options with a "+" after the architecture.
>> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
>> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
>> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>
>> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
>> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4a will only add more options down the line.
>
> Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
> necessarily define a tune that uses them -- if it's standard another layer
> certainly could.)
>
>> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
>>
>> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
>>
>> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).
>
> That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
> more reasonable to support all of this.. (maybe that is what needs to be done in
> the future as well for other architectures.. focus on the 'gcc' behavior and
> generate TUNE_FEATURES matching the compiler.)
>
> I'd like Khem's opinion on how crazy of an idea that is.
>
>> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
>> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
>> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
>>
>> That could simplify the tune settings, but would give less control than what we currently have.
>> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
>
> In the past 'crypto' options have only been assembly.. if that's changed it has
> definitely opened up a new facet in all of this work.
>
>> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
>> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.
>
> Yes, that might be needed now that thumb is theoretically always supposed to be
> present.

It's not _always_ present - it's missing for armv4 CPUs such as StrongARM.

However the option has been unnecessarily propagated into tuning files
for higher architecture levels where support for Thumb _is_ always
present.



* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 20:39           ` Andre McCurdy
@ 2018-06-12 21:43             ` Mark Hatle
  2018-06-12 21:48               ` Andre McCurdy
  2018-06-12 23:32             ` Herve Jourdain
  1 sibling, 1 reply; 17+ messages in thread
From: Mark Hatle @ 2018-06-12 21:43 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: OE-core, Koen Kooi

On 6/12/18 3:39 PM, Andre McCurdy wrote:
> On Tue, Jun 12, 2018 at 1:00 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
>> On 6/12/18 10:49 AM, Herve Jourdain wrote:
>>> Hi,
>>>
>>> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
>>> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
>>> Then, you can add the supported options with a "+" after the architecture.
>>> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
>>> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
>>> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>>
>>> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
>>> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4a will only add more options down the line.
>>
>> Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
>> necessarily define a tune that uses them -- if it's standard another layer
>> certainly could.)
>>
>>> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
>>>
>>> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
>>>
>>> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).
>>
>> That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
>> more reasonable to support all of this.. (maybe that is what needs to be done in
>> the future as well for other architectures.. focus on the 'gcc' behavior and
>> generate TUNE_FEATURES matching the compiler.)
>>
>> I'd like Khem's opinion on how crazy of an idea that is.
>>
>>> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
>>> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
>>> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
>>>
>>> That could simplify the tune settings, but would give less control than what we currently have.
>>> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
>>
>> In the past 'crypto' options have only been assembly.. if that's changed it has
>> definitely opened up a new facet in all of this work.
>>
>>> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
>>> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.
>>
>> Yes, that might be needed now that thumb is theoretically always supposed to be
>> present.
> 
> It's not _always_ present - it's missing for armv4 CPUs such as StrongARM.

Always present on -modern- ARM processors.. ARMv7 (Cortex) and newer AFAIK.  I'm
not referring to older cores.

> However the option has been unnecessarily propagated into tuning files
> for higher architecture levels where support for Thumb _is_ always
> present.
> 




* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 21:43             ` Mark Hatle
@ 2018-06-12 21:48               ` Andre McCurdy
  0 siblings, 0 replies; 17+ messages in thread
From: Andre McCurdy @ 2018-06-12 21:48 UTC (permalink / raw)
  To: Mark Hatle; +Cc: OE-core, Koen Kooi

On Tue, Jun 12, 2018 at 2:43 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
> On 6/12/18 3:39 PM, Andre McCurdy wrote:
>> On Tue, Jun 12, 2018 at 1:00 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
>>> On 6/12/18 10:49 AM, Herve Jourdain wrote:
>>>> Hi,
>>>>
>>>> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
>>>> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
>>>> Then, you can add the supported options with a "+" after the architecture.
>>>> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
>>>> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
>>>> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>>> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>>>
>>>> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
>>>> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a will only add more options down the line.
>>>
>>> Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
>>> necessarily define a tune that uses them -- if it's standard another layer
>>> certainly could.)
>>>
>>>> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
>>>>
>>>> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
>>>>
>>>> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).
>>>
>>> That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
>>> more reasonable to support all of this.. (maybe that is what needs to be done in
>>> the future as well for other architectures.. focus on the 'gcc' behavior and
>>> generate TUNE_FEATURES matching the compiler.)
>>>
>>> I'd like Khem's opinion on how crazy of an idea that is.
>>>
>>>> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
>>>> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
>>>> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
>>>>
>>>> That could simplify the tune settings, but would give less control than what we currently have.
>>>> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
>>>
>>> In the past 'crypto' options have only been assembly.. if that's changed it has
>>> definitely opened up a new facet in all of this work.
>>>
>>>> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
>>>> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.
>>>
>>> Yes, that might be needed now that thumb is theoretically always supposed to be
>>> present.
>>
>> It's not _always_ present - it's missing for armv4 CPUs such as StrongARM.
>
> Always present on -modern- ARM processors.. ARMv7 (Cortex) and newer AFAIK.  I'm
> not referring to older cores.

OK. Thanks for clarifying.

>> However the option has been unnecessarily propagated into tuning files
>> for higher architecture levels where support for Thumb _is_ always
>> present.
>>
>



* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 20:39           ` Andre McCurdy
  2018-06-12 21:43             ` Mark Hatle
@ 2018-06-12 23:32             ` Herve Jourdain
  2018-06-13  0:07               ` Andre McCurdy
  1 sibling, 1 reply; 17+ messages in thread
From: Herve Jourdain @ 2018-06-12 23:32 UTC (permalink / raw)
  To: Andre McCurdy; +Cc: Koen Kooi, OE-core

Hi Andre,

I believe I did say always present on armv8 and armv7, I did not mean before that.
Having separate tunes for thumb support was necessary on previous architectures where it was optional, but it persisted for architectures which made thumb mandatory.
I’m not even advocating removing the tune option for previous architectures that would normally not require it, but I believe we should get rid of it for new ARM architectures.

Cheers,
Herve

> On 13 Jun 2018, at 04:39, Andre McCurdy <armccurdy@gmail.com> wrote:
> 
>> On Tue, Jun 12, 2018 at 1:00 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
>>> On 6/12/18 10:49 AM, Herve Jourdain wrote:
>>> Hi,
>>> 
>>> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
>>> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
>>> Then, you can add the supported options with a "+" after the architecture.
>>> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
>>> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
>>> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>> 
>>> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
>>> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a will only add more options down the line.
>> 
>> Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
>> necessarily define a tune that uses them -- if it's standard another layer
>> certainly could.)
>> 
>>> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
>>> 
>>> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
>>> 
>>> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).
>> 
>> That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
>> more reasonable to support all of this.. (maybe that is what needs to be done in
>> the future as well for other architectures.. focus on the 'gcc' behavior and
>> generate TUNE_FEATURES matching the compiler.)
>> 
>> I'd like Khem's opinion on how crazy of an idea that is.
>> 
>>> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
>>> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
>>> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
>>> 
>>> That could simplify the tune settings, but would give less control than what we currently have.
>>> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
>> 
>> In the past 'crypto' options have only been assembly.. if that's changed it has
>> definitely opened up a new facet in all of this work.
>> 
>>> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
>>> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.
>> 
>> Yes, that might be needed now that thumb is theoretically always supposed to be
>> present.
> 
> It's not _always_ present - it's missing for armv4 CPUs such as StrongARM.
> 
> However the option has been unnecessarily propagated into tuning files
> for higher architecture levels where support for Thumb _is_ always
> present.
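
To illustrate the "+" extension syntax quoted above, here is a minimal Python sketch that composes a -march value from a base architecture and a list of extensions. The function name march_flag and the ALLOWED table are hypothetical; the table only copies the armv8-a and armv8.1-a option lists given in the quoted message and is not checked against any particular gcc release.

```python
# Hypothetical sketch: build a -march value using the "+ext" syntax
# described in the quoted message. The option lists are copied from the
# thread (armv8-a and armv8.1-a only), not from the gcc documentation.
ALLOWED = {
    "armv8-a": {"crc", "simd", "crypto", "nocrypto", "nofp"},
    "armv8.1-a": {"simd", "crypto", "nocrypto", "nofp"},
}


def march_flag(arch, extensions=()):
    """Return '-march=<arch>+<ext>...' or raise ValueError on an unlisted option."""
    allowed = ALLOWED.get(arch)
    if allowed is None:
        raise ValueError(f"unknown architecture: {arch}")
    for ext in extensions:
        if ext not in allowed:
            raise ValueError(f"{ext!r} is not listed for {arch}")
    # Join the base architecture and each extension with '+'.
    return "-march=" + "+".join([arch, *extensions])


if __name__ == "__main__":
    print(march_flag("armv8-a", ["crc", "crypto"]))
```

A tune file would presumably derive the extension list from TUNE_FEATURES, but that mapping is not shown in this thread and is left out here.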




* Re: [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
  2018-06-12 23:32             ` Herve Jourdain
@ 2018-06-13  0:07               ` Andre McCurdy
  0 siblings, 0 replies; 17+ messages in thread
From: Andre McCurdy @ 2018-06-13  0:07 UTC (permalink / raw)
  To: Herve Jourdain; +Cc: Koen Kooi, OE-core

On Tue, Jun 12, 2018 at 4:32 PM, Herve Jourdain <herve.jourdain@neuf.fr> wrote:
> Hi Andre,
>
> I believe I did say always present on armv8 and armv7, I did not mean before that.

Right. The point of considering older architecture levels was that IF
we were to drop support for armv4 (I'm not necessarily suggesting that
we do) then we could simplify the tuning files quite a lot, since then
every supported ARM core would support Thumb.
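
A minimal Python sketch of that reasoning, for illustration only. The HAS_THUMB table and the function name are hypothetical and list just the architecture levels mentioned in this thread; they are not taken from the OE-core tune files.

```python
# Hypothetical model: does the set of supported architecture levels still
# require separate thumb / no-thumb tune variants?
# Only the levels discussed in this thread are listed.
HAS_THUMB = {
    "armv4": False,   # e.g. StrongARM, no Thumb
    "armv7a": True,   # Thumb always present
    "armv8a": True,   # Thumb always present
}


def needs_thumb_variants(supported_archs):
    """True if at least one supported architecture lacks Thumb."""
    return any(not HAS_THUMB[arch] for arch in supported_archs)


if __name__ == "__main__":
    print(needs_thumb_variants(["armv4", "armv7a", "armv8a"]))
    print(needs_thumb_variants(["armv7a", "armv8a"]))
```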

> Having separate tunes for thumb support was necessary on previous architectures where it was optional, but it persisted for architectures which made thumb mandatory.
> I’m not even advocating removing the tune option for previous architectures that would normally not require it, but I believe we should get rid of it for new ARM architectures.

We all seem to be strongly agreeing with each other :-)

BTW, I don't know how much history you're aware of from when this
topic has been discussed previously on the list (it's come up a few
times...). There are people who've created distros etc which do
actually rely on being able to define (for example) an armv7a machine
without Thumb support, even though such a thing doesn't actually
exist. So, as much as we might all agree that removing the option to
disable Thumb support for armv7a etc might be "the right thing to do",
in practice there are going to be people who object to it.

> Cheers,
> Herve
>
>> On 13 Jun 2018, at 04:39, Andre McCurdy <armccurdy@gmail.com> wrote:
>>
>>> On Tue, Jun 12, 2018 at 1:00 PM, Mark Hatle <mark.hatle@windriver.com> wrote:
>>>> On 6/12/18 10:49 AM, Herve Jourdain wrote:
>>>> Hi,
>>>>
>>>> So I agree with you about restricting to what gcc can support, that's actually my proposal (actually, probably a subset of what gcc can support).
>>>> So for armv8, gcc supports, as architectures: armv8-a, armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a.
>>>> Then, you can add the supported options with a "+" after the architecture.
>>>> Options supported for armv8-a are: '+crc', '+simd', '+crypto', '+nocrypto', '+nofp'
>>>> Options supported for armv8.1-a are: '+simd', '+crypto', '+nocrypto', '+nofp'
>>>> Options supported for armv8.2-a and armv8.3-a are: '+fp16', '+fp16fml', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>>> Options supported for armv8.4-a are: '+fp16', '+simd', '+crypto', '+dotprod', '+nocrypto', '+nofp'
>>>>
>>>> As you can see, proposals for armv8-a, whether my previous one, the new one here, or even the one I have updated and used in production, just capture the existing complexity, and not add to it.
>>>> and support for armv8.1-a, armv8.2-a, armv8.3-a, armv8.4-a will only add more options down the line.
>>>
>>> Sounds a lot like the above would be TUNE_FEATURES to me..  (even if we don't
>>> necessarily define a tune that uses them -- if it's standard another layer
>>> certainly could.)
>>>
>>>> Regarding fpu, gcc supports the following for armv8: fp-armv8, neon-fp-armv8, and crypto-neon-fp-armv8.
>>>>
>>>> Regarding cpu, I believe that the armv8 supported ones are: ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’.
>>>>
>>>> I personally would like to keep tuning for a specific CPU as much as possible (again I'm working closely with various ARM-based SoCs, so my opinion might be tainted).
>>>
>>> That's a lot of options, but if we focus on TUNE_FEATURES, I think it's a bit
>>> more reasonable to support all of this.. (maybe that is what needs to be done in
>>> the future as well for other architectures.. focus on the 'gcc' behavior and
>>> generate TUNE_FEATURES matching the compiler.)
>>>
>>> I'd like Khem's opinion on how crazy of an idea that is.
>>>
>>>> One thing that could be done to simplify things would be to just use the cpu, and add the options to it. Gcc supports adding options to the cpu.
>>>> '+nofp' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’ and ‘cortex-a55’
>>>> '+crypto' for ‘cortex-a32’, ‘cortex-a35’, ‘cortex-a53’, ‘cortex-a55’, ‘cortex-a57’, ‘cortex-a72’, ‘cortex-a73’, ‘cortex-a75’
>>>>
>>>> That could simplify the tune settings, but would give less control than what we currently have.
>>>> As you might have guessed, I do put a specific emphasis on the crypto option, and on the neon option, which are the most interesting for armv8 in my opinion.
>>>
>>> In the past 'crypto' options have only been assembly.. if that's changed it has
>>> definitely opened up a new facet in all of this work.
>>>
>>>> Regarding thumb, always adding it to the tune without creating specific variants with or without thumb makes sense, since the tune is normally about the SoC capabilities, and armv7 and armv8 both support it.
>>>> You can always select whether you want thumb or not by setting ARM_INSTRUCTION_SET appropriately at the distro level.
>>>
>>> Yes, that might be needed now that thumb is theoretically always supposed to be
>>> present.
>>
>> It's not _always_ present - it's missing for armv4 CPUs such as StrongARM.
>>
>> However the option has been unnecessarily propagated into tuning files
>> for higher architecture levels where support for Thumb _is_ always
>> present.
>



Thread overview: 17+ messages
2018-06-09  6:26 [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Randy Li
2018-06-09  6:26 ` [PATCH v2 1/4] arch-armv8a.inc: add tune include for armv8 Randy Li
2018-06-12  5:59   ` Nicolas Dechesne
2018-06-12 14:21     ` Mark Hatle
2018-06-09  6:26 ` [PATCH v2 2/4] tune-cortexa35: add tunes for ARM Cortex-A35 Randy Li
2018-06-09  6:26 ` [PATCH v2 3/4] tune-cortexa32: add tunes for ARM Cortex-A32 Randy Li
2018-06-09  6:26 ` [PATCH v2 4/4] tune-cortexa72: add tunes for ARM Cortex-A72 Randy Li
2018-06-12  9:00 ` [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors Koen Kooi
2018-06-12  9:30   ` Herve Jourdain
2018-06-12 14:32     ` Mark Hatle
2018-06-12 15:49       ` Herve Jourdain
2018-06-12 20:00         ` Mark Hatle
2018-06-12 20:39           ` Andre McCurdy
2018-06-12 21:43             ` Mark Hatle
2018-06-12 21:48               ` Andre McCurdy
2018-06-12 23:32             ` Herve Jourdain
2018-06-13  0:07               ` Andre McCurdy