linux-kernel.vger.kernel.org archive mirror
* [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
@ 2017-12-09 15:28 Dongjiu Geng
  2017-12-11 11:59 ` Dave P Martin
  0 siblings, 1 reply; 7+ messages in thread
From: Dongjiu Geng @ 2017-12-09 15:28 UTC (permalink / raw)
  To: catalin.marinas, will.deacon, corbet, mark.rutland,
	suzuki.poulose, Dave.Martin, robin.murphy, gregkh,
	arvind.yadav.cs, linux-arm-kernel, linux-doc, linux-kernel,
	linuxarm
  Cc: huangshaoyu, guohanjun, zhanghaibin7, zhihui.gao, gengdongjiu

The ARMv8.4 extensions add new floating-point multiplication
variant instructions to the AArch64 SIMD instruction set.
Expose them to userspace via a HWCAP bit and MRS
emulation.

Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
My platform supports this feature, so I need to add it.
---
 Documentation/arm64/cpu-feature-registers.txt | 4 +++-
 arch/arm64/include/asm/sysreg.h               | 1 +
 arch/arm64/include/uapi/asm/hwcap.h           | 1 +
 arch/arm64/kernel/cpufeature.c                | 2 ++
 arch/arm64/kernel/cpuinfo.c                   | 1 +
 5 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
index bd9b3fa..a70090b 100644
--- a/Documentation/arm64/cpu-feature-registers.txt
+++ b/Documentation/arm64/cpu-feature-registers.txt
@@ -110,7 +110,9 @@ infrastructure:
      x--------------------------------------------------x
      | Name                         |  bits   | visible |
      |--------------------------------------------------|
-     | RES0                         | [63-48] |    n    |
+     | RES0                         | [63-52] |    n    |
+     |--------------------------------------------------|
+     | FHM                          | [51-48] |    y    |
      |--------------------------------------------------|
      | DP                           | [47-44] |    y    |
      |--------------------------------------------------|
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 08cc885..1818077 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -419,6 +419,7 @@
 #define SCTLR_EL1_CP15BEN	(1 << 5)
 
 /* id_aa64isar0 */
+#define ID_AA64ISAR0_FHM_SHIFT		48
 #define ID_AA64ISAR0_DP_SHIFT		44
 #define ID_AA64ISAR0_SM4_SHIFT		40
 #define ID_AA64ISAR0_SM3_SHIFT		36
diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
index cda76fa..f018c3d 100644
--- a/arch/arm64/include/uapi/asm/hwcap.h
+++ b/arch/arm64/include/uapi/asm/hwcap.h
@@ -43,5 +43,6 @@
 #define HWCAP_ASIMDDP		(1 << 20)
 #define HWCAP_SHA512		(1 << 21)
 #define HWCAP_SVE		(1 << 22)
+#define HWCAP_ASIMDFHM		(1 << 23)
 
 #endif /* _UAPI__ASM_HWCAP_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c5ba009..bc7e707 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -123,6 +123,7 @@ static int __init register_cpu_hwcaps_dumper(void)
  * sync with the documentation of the CPU feature register ABI.
  */
 static const struct arm64_ftr_bits ftr_id_aa64isar0[] = {
+	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_FHM_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_DP_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM4_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM3_SHIFT, 4, 0),
@@ -991,6 +992,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
 	HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM3),
 	HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM4),
 	HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDDP),
+	HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDFHM),
 	HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_FP),
 	HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, HWCAP_FPHP),
 	HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_ASIMD),
diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
index 1e25545..7f94623 100644
--- a/arch/arm64/kernel/cpuinfo.c
+++ b/arch/arm64/kernel/cpuinfo.c
@@ -76,6 +76,7 @@
 	"asimddp",
 	"sha512",
 	"sve",
+	"asimdfhm",
 	NULL
 };
 
-- 
1.9.1
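
For illustration, a userspace program could test the new hwcap roughly
as follows. This is a minimal sketch, not part of the patch: it assumes
a Linux libc that provides getauxval() in <sys/auxv.h>, it defines
HWCAP_ASIMDFHM locally with the value proposed above (bit 23) in case
the installed headers predate it, and the helper names hwcap_has() and
cpu_has_asimdfhm() are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */

/* Value proposed by this patch; defined here only if the headers lack it. */
#ifndef HWCAP_ASIMDFHM
#define HWCAP_ASIMDFHM	(1UL << 23)
#endif

/* Hypothetical helper: test one capability bit in an AT_HWCAP word. */
static inline bool hwcap_has(unsigned long hwcaps, unsigned long bit)
{
	/* The mask is expected to have exactly one bit set. */
	assert(bit != 0 && (bit & (bit - 1)) == 0);
	return (hwcaps & bit) != 0;
}

/* Hypothetical helper: only meaningful when running on an arm64 kernel. */
static inline bool cpu_has_asimdfhm(void)
{
	return hwcap_has(getauxval(AT_HWCAP), HWCAP_ASIMDFHM);
}
```

Keeping the bit test separate from the getauxval() call means the logic
can be exercised on any host, while cpu_has_asimdfhm() gives a
meaningful answer only on arm64.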


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-09 15:28 [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant Dongjiu Geng
@ 2017-12-11 11:59 ` Dave P Martin
  2017-12-11 12:47   ` gengdongjiu
  0 siblings, 1 reply; 7+ messages in thread
From: Dave P Martin @ 2017-12-11 11:59 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: Catalin Marinas, Will Deacon, corbet, Mark Rutland,
	Suzuki Poulose, Robin Murphy, gregkh, arvind.yadav.cs,
	linux-arm-kernel, linux-doc, linux-kernel, linuxarm, huangshaoyu,
	guohanjun, zhanghaibin7, zhihui.gao

On Sat, Dec 09, 2017 at 03:28:42PM +0000, Dongjiu Geng wrote:
> ARM v8.4 extensions include support for new floating point
> multiplication variant instructions to the AArch64 SIMD

Do we have any human-readable description of what the new instructions
do?

Since the v8.4 spec itself only describes these as "New Floating
Point Multiplication Variant", I wonder what "FHM" actually stands
for.

Maybe something like "widening half-precision floating-point multiply
accumulate" is acceptable wording consistent with the existing
architecture, but I just made that up, so it's not official ;)

> instructions set. Let the userspace know about it via a
> HWCAP bit and MRS emulation.
>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> ---
> My platform supports this feature, so I need to add it.
> ---
>  Documentation/arm64/cpu-feature-registers.txt | 4 +++-
>  arch/arm64/include/asm/sysreg.h               | 1 +
>  arch/arm64/include/uapi/asm/hwcap.h           | 1 +
>  arch/arm64/kernel/cpufeature.c                | 2 ++
>  arch/arm64/kernel/cpuinfo.c                   | 1 +
>  5 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
> index bd9b3fa..a70090b 100644
> --- a/Documentation/arm64/cpu-feature-registers.txt
> +++ b/Documentation/arm64/cpu-feature-registers.txt
> @@ -110,7 +110,9 @@ infrastructure:
>       x--------------------------------------------------x
>       | Name                         |  bits   | visible |
>       |--------------------------------------------------|
> -     | RES0                         | [63-48] |    n    |
> +     | RES0                         | [63-52] |    n    |
> +     |--------------------------------------------------|
> +     | FHM                          | [51-48] |    y    |

You also need to update Documentation/arm64/elf_hwcaps.txt.

Otherwise, looks OK.

Cheers
---Dave

>       |--------------------------------------------------|
>       | DP                           | [47-44] |    y    |
>       |--------------------------------------------------|
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 08cc885..1818077 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -419,6 +419,7 @@
>  #define SCTLR_EL1_CP15BEN    (1 << 5)
>
>  /* id_aa64isar0 */
> +#define ID_AA64ISAR0_FHM_SHIFT               48
>  #define ID_AA64ISAR0_DP_SHIFT                44
>  #define ID_AA64ISAR0_SM4_SHIFT               40
>  #define ID_AA64ISAR0_SM3_SHIFT               36
> diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
> index cda76fa..f018c3d 100644
> --- a/arch/arm64/include/uapi/asm/hwcap.h
> +++ b/arch/arm64/include/uapi/asm/hwcap.h
> @@ -43,5 +43,6 @@
>  #define HWCAP_ASIMDDP                (1 << 20)
>  #define HWCAP_SHA512         (1 << 21)
>  #define HWCAP_SVE            (1 << 22)
> +#define HWCAP_ASIMDFHM               (1 << 23)
>
>  #endif /* _UAPI__ASM_HWCAP_H */
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index c5ba009..bc7e707 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -123,6 +123,7 @@ static int __init register_cpu_hwcaps_dumper(void)
>   * sync with the documentation of the CPU feature register ABI.
>   */
>  static const struct arm64_ftr_bits ftr_id_aa64isar0[] = {
> +     ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_FHM_SHIFT, 4, 0),
>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_DP_SHIFT, 4, 0),
>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM4_SHIFT, 4, 0),
>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM3_SHIFT, 4, 0),
> @@ -991,6 +992,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM3),
>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM4),
>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDDP),
> +     HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDFHM),
>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_FP),
>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, HWCAP_FPHP),
>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_ASIMD),
> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
> index 1e25545..7f94623 100644
> --- a/arch/arm64/kernel/cpuinfo.c
> +++ b/arch/arm64/kernel/cpuinfo.c
> @@ -76,6 +76,7 @@
>       "asimddp",
>       "sha512",
>       "sve",
> +     "asimdfhm",
>       NULL
>  };
>
> --
> 1.9.1
>


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-11 11:59 ` Dave P Martin
@ 2017-12-11 12:47   ` gengdongjiu
  2017-12-11 13:29     ` Dave Martin
  0 siblings, 1 reply; 7+ messages in thread
From: gengdongjiu @ 2017-12-11 12:47 UTC (permalink / raw)
  To: Dave P Martin
  Cc: Catalin Marinas, Will Deacon, corbet, Mark Rutland,
	Suzuki Poulose, Robin Murphy, gregkh, arvind.yadav.cs,
	linux-arm-kernel, linux-doc, linux-kernel, linuxarm, huangshaoyu,
	guohanjun, zhanghaibin7, zhihui.gao


On 2017/12/11 19:59, Dave P Martin wrote:
> On Sat, Dec 09, 2017 at 03:28:42PM +0000, Dongjiu Geng wrote:
>> ARM v8.4 extensions include support for new floating point
>> multiplication variant instructions to the AArch64 SIMD
> 
> Do we have any human-readable description of what the new instructions
> do?
> 
> Since the v8.4 spec itself only describes these as "New Floating
> Point Multiplication Variant", I wonder what "FHM" actually stands
> for.
Thanks for pointing that out.
In fact, this feature adds only two instructions:
FP16 * FP16 + FP32
FP16 * FP16 - FP32

The spec names this field ID_AA64ISAR0_EL1.FHM. I do not know why it is called
"FHM"; I think "FMLXL" may be a better name, since it can stand for the FMLAL/FMLSL instructions.

> 
> Maybe something like "widening half-precision floating-point multiply
> accumulate" is acceptable wording consistent with the existing
> architecture, but I just made that up, so it's not official ;)
How about something like "multiplying each FP16 element of one
vector by the corresponding FP16 element of a second vector, and
adding the product to, or subtracting it from, the corresponding FP32
element in a third vector, without intermediate rounding"?
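
As a rough illustration of that wording, the per-lane arithmetic can be
modelled in portable C. This is only a sketch under stated assumptions:
float is IEEE 754 binary32 and is evaluated in single precision, FP16
values are passed as raw binary16 bit patterns, and FPCR-controlled
behaviour (flush-to-zero, NaN handling, rounding mode) is not modelled.
The names half_to_float(), fmlal_lane() and fmlsl_lane() are invented
for this example. The subtract form is modelled as accumulator minus
product, which is how the FMLSL description reads as far as I can tell.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a binary16 bit pattern into a float; the conversion is exact. */
static inline float half_to_float(uint16_t h)
{
	uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
	uint32_t exp = (h >> 10) & 0x1fu;
	uint32_t man = h & 0x3ffu;
	uint32_t bits;
	float f;

	assert(sizeof(float) == sizeof(uint32_t));
	if (exp == 0x1f) {		/* infinity or NaN */
		bits = sign | 0x7f800000u | (man << 13);
	} else if (exp != 0) {		/* normal: rebias from 15 to 127 */
		bits = sign | ((exp + 112u) << 23) | (man << 13);
	} else if (man == 0) {		/* signed zero */
		bits = sign;
	} else {			/* subnormal: normalise first */
		uint32_t shift = 0;

		while (!(man & 0x400u)) {
			man <<= 1;
			shift++;
		}
		bits = sign | ((113u - shift) << 23) | ((man & 0x3ffu) << 13);
	}
	memcpy(&f, &bits, sizeof(f));
	return f;
}

/* One lane of the add form: acc + a * b. */
static inline float fmlal_lane(float acc, uint16_t a, uint16_t b)
{
	/* Two 11-bit significands give at most 22 bits: exact in binary32. */
	float product = half_to_float(a) * half_to_float(b);

	return acc + product;	/* the only rounding step */
}

/* One lane of the subtract form: acc - a * b. */
static inline float fmlsl_lane(float acc, uint16_t a, uint16_t b)
{
	float product = half_to_float(a) * half_to_float(b);

	return acc - product;	/* the only rounding step */
}
```

Because the product of two FP16 values always fits in FP32 exactly, the
model needs no fmaf() call: the single rounding happens in the final
add or subtract.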

> 
>> instructions set. Let the userspace know about it via a
>> HWCAP bit and MRS emulation.
>>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> ---
>> My platform supports this feature, so I need to add it.
>> ---
>>  Documentation/arm64/cpu-feature-registers.txt | 4 +++-
>>  arch/arm64/include/asm/sysreg.h               | 1 +
>>  arch/arm64/include/uapi/asm/hwcap.h           | 1 +
>>  arch/arm64/kernel/cpufeature.c                | 2 ++
>>  arch/arm64/kernel/cpuinfo.c                   | 1 +
>>  5 files changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
>> index bd9b3fa..a70090b 100644
>> --- a/Documentation/arm64/cpu-feature-registers.txt
>> +++ b/Documentation/arm64/cpu-feature-registers.txt
>> @@ -110,7 +110,9 @@ infrastructure:
>>       x--------------------------------------------------x
>>       | Name                         |  bits   | visible |
>>       |--------------------------------------------------|
>> -     | RES0                         | [63-48] |    n    |
>> +     | RES0                         | [63-52] |    n    |
>> +     |--------------------------------------------------|
>> +     | FHM                          | [51-48] |    y    |
> 
> You also need to update Documentation/arm64/elf_hwcaps.txt.
I will update it. Thanks for pointing that out.

> 
> Otherwise, looks OK.
I appreciate your review.

> 
> Cheers
> ---Dave
> 
>>       |--------------------------------------------------|
>>       | DP                           | [47-44] |    y    |
>>       |--------------------------------------------------|
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 08cc885..1818077 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -419,6 +419,7 @@
>>  #define SCTLR_EL1_CP15BEN    (1 << 5)
>>
>>  /* id_aa64isar0 */
>> +#define ID_AA64ISAR0_FHM_SHIFT               48
>>  #define ID_AA64ISAR0_DP_SHIFT                44
>>  #define ID_AA64ISAR0_SM4_SHIFT               40
>>  #define ID_AA64ISAR0_SM3_SHIFT               36
>> diff --git a/arch/arm64/include/uapi/asm/hwcap.h b/arch/arm64/include/uapi/asm/hwcap.h
>> index cda76fa..f018c3d 100644
>> --- a/arch/arm64/include/uapi/asm/hwcap.h
>> +++ b/arch/arm64/include/uapi/asm/hwcap.h
>> @@ -43,5 +43,6 @@
>>  #define HWCAP_ASIMDDP                (1 << 20)
>>  #define HWCAP_SHA512         (1 << 21)
>>  #define HWCAP_SVE            (1 << 22)
>> +#define HWCAP_ASIMDFHM               (1 << 23)
>>
>>  #endif /* _UAPI__ASM_HWCAP_H */
>> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
>> index c5ba009..bc7e707 100644
>> --- a/arch/arm64/kernel/cpufeature.c
>> +++ b/arch/arm64/kernel/cpufeature.c
>> @@ -123,6 +123,7 @@ static int __init register_cpu_hwcaps_dumper(void)
>>   * sync with the documentation of the CPU feature register ABI.
>>   */
>>  static const struct arm64_ftr_bits ftr_id_aa64isar0[] = {
>> +     ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_FHM_SHIFT, 4, 0),
>>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_DP_SHIFT, 4, 0),
>>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM4_SHIFT, 4, 0),
>>       ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64ISAR0_SM3_SHIFT, 4, 0),
>> @@ -991,6 +992,7 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
>>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM3),
>>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_SM4),
>>       HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDDP),
>> +     HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, HWCAP_ASIMDFHM),
>>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_FP),
>>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, HWCAP_FPHP),
>>       HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, HWCAP_ASIMD),
>> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c
>> index 1e25545..7f94623 100644
>> --- a/arch/arm64/kernel/cpuinfo.c
>> +++ b/arch/arm64/kernel/cpuinfo.c
>> @@ -76,6 +76,7 @@
>>       "asimddp",
>>       "sha512",
>>       "sve",
>> +     "asimdfhm",
>>       NULL
>>  };
>>
>> --
>> 1.9.1
>>


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-11 12:47   ` gengdongjiu
@ 2017-12-11 13:29     ` Dave Martin
  2017-12-11 18:58       ` Suzuki K Poulose
  2017-12-12  1:44       ` gengdongjiu
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Martin @ 2017-12-11 13:29 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Mark Rutland, guohanjun, linux-doc, Suzuki Poulose,
	Catalin Marinas, corbet, Will Deacon, linux-kernel, linuxarm,
	zhihui.gao, huangshaoyu, gregkh, arvind.yadav.cs, Robin Murphy,
	linux-arm-kernel, zhanghaibin7

On Mon, Dec 11, 2017 at 08:47:00PM +0800, gengdongjiu wrote:
> 
> On 2017/12/11 19:59, Dave P Martin wrote:
> > On Sat, Dec 09, 2017 at 03:28:42PM +0000, Dongjiu Geng wrote:
> >> ARM v8.4 extensions include support for new floating point
> >> multiplication variant instructions to the AArch64 SIMD
> > 
> > Do we have any human-readable description of what the new instructions
> > do?
> > 
> > Since the v8.4 spec itself only describes these as "New Floating
> > Point Multiplication Variant", I wonder what "FHM" actually stands
> > for.
> Thanks for the point out.
> In fact, this feature only adds two instructions:
> FP16 * FP16 + FP32
> FP16 * FP16 - FP32
> 
> The spec call this bit to ID_AA64ISAR0_EL1.FHM, I do not know why it
> will call "FHM", I  think call it "FMLXL" may be better, which can
> stand for FMLAL/FMLSL instructions.

Although "FHM" is cryptic, I think it makes sense to keep this as "FHM"
to match the ISAR0 field name -- we've tended to follow this policy
for other extension names unless there's a much better or more obvious
name available.

For "FMLXL", new instructions might be added in the future that match
the same pattern, and then "FMLXL" could become ambiguous.  So maybe
this is not the best choice.
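
For reference, the ISAR0 field under discussion can be decoded as
sketched below. The shift values are the ones in the patch (FHM at bit
48, DP at bit 44); id_field() and read_id_aa64isar0() are hypothetical
helper names. The MRS read is compiled only for arm64 Linux and relies
on the kernel's MRS emulation, so a real program should check
HWCAP_CPUID before issuing it.

```c
#include <assert.h>
#include <stdint.h>

#define ID_AA64ISAR0_FHM_SHIFT	48	/* from the patch */
#define ID_AA64ISAR0_DP_SHIFT	44

/* Hypothetical helper: extract a 4-bit unsigned ID register field. */
static inline unsigned int id_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);	/* the field must fit in 64 bits */
	return (unsigned int)((reg >> shift) & 0xf);
}

#if defined(__aarch64__) && defined(__linux__)
/* Trapped and emulated by the kernel; check HWCAP_CPUID first. */
static inline uint64_t read_id_aa64isar0(void)
{
	uint64_t val;

	__asm__ volatile("mrs %0, ID_AA64ISAR0_EL1" : "=r"(val));
	return val;
}
#endif
```

With the HWCAP_CAP() entry in the patch, a field value of 1 or more in
bits [51:48] is what sets the hwcap.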

> > Maybe something like "widening half-precision floating-point multiply
> > accumulate" is acceptable wording consistent with the existing
> > architecture, but I just made that up, so it's not official ;)
> 
> how about something like "performing a multiplication of each FP16
> element of one vector with the corresponding FP16 element of a second
> vector, and to add or subtract this without an intermediate rounding
> to the corresponding FP32 element in a third vector."?

We could have that, I guess.

> > 
> >> instructions set. Let the userspace know about it via a
> >> HWCAP bit and MRS emulation.
> >>
> >> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> >> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> >> ---
> >> My platform supports this feature, so I need to add it.
> >> ---
> >>  Documentation/arm64/cpu-feature-registers.txt | 4 +++-
> >>  arch/arm64/include/asm/sysreg.h               | 1 +
> >>  arch/arm64/include/uapi/asm/hwcap.h           | 1 +
> >>  arch/arm64/kernel/cpufeature.c                | 2 ++
> >>  arch/arm64/kernel/cpuinfo.c                   | 1 +
> >>  5 files changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/arm64/cpu-feature-registers.txt b/Documentation/arm64/cpu-feature-registers.txt
> >> index bd9b3fa..a70090b 100644
> >> --- a/Documentation/arm64/cpu-feature-registers.txt
> >> +++ b/Documentation/arm64/cpu-feature-registers.txt
> >> @@ -110,7 +110,9 @@ infrastructure:
> >>       x--------------------------------------------------x
> >>       | Name                         |  bits   | visible |
> >>       |--------------------------------------------------|
> >> -     | RES0                         | [63-48] |    n    |
> >> +     | RES0                         | [63-52] |    n    |
> >> +     |--------------------------------------------------|
> >> +     | FHM                          | [51-48] |    y    |
> > 
> > You also need to update Documentation/arm64/elf_hwcaps.txt.
> I will update it, thanks for the point out
> 
> > 
> > Otherwise, looks OK.
> Appreciate for your review.

[...]

Cheers
---Dave


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-11 13:29     ` Dave Martin
@ 2017-12-11 18:58       ` Suzuki K Poulose
  2017-12-12  2:07         ` gengdongjiu
  2017-12-12  1:44       ` gengdongjiu
  1 sibling, 1 reply; 7+ messages in thread
From: Suzuki K Poulose @ 2017-12-11 18:58 UTC (permalink / raw)
  To: Dave Martin, gengdongjiu
  Cc: Mark Rutland, guohanjun, linux-doc, Catalin Marinas, corbet,
	Will Deacon, linux-kernel, linuxarm, zhihui.gao, huangshaoyu,
	gregkh, arvind.yadav.cs, Robin Murphy, linux-arm-kernel,
	zhanghaibin7, nd

Hi gengdongjiu

Sorry for the late response. I have a similar patch adding support
for "FHM", which I was about to post this week.

On 11/12/17 13:29, Dave Martin wrote:
> On Mon, Dec 11, 2017 at 08:47:00PM +0800, gengdongjiu wrote:
>>
>> On 2017/12/11 19:59, Dave P Martin wrote:
>>> On Sat, Dec 09, 2017 at 03:28:42PM +0000, Dongjiu Geng wrote:
>>>> ARM v8.4 extensions include support for new floating point
>>>> multiplication variant instructions to the AArch64 SIMD
>>>
>>> Do we have any human-readable description of what the new instructions
>>> do?
>>>
>>> Since the v8.4 spec itself only describes these as "New Floating
>>> Point Multiplication Variant", I wonder what "FHM" actually stands
>>> for.
>> Thanks for the point out.
>> In fact, this feature only adds two instructions:
>> FP16 * FP16 + FP32
>> FP16 * FP16 - FP32
>>
>> The spec call this bit to ID_AA64ISAR0_EL1.FHM, I do not know why it
>> will call "FHM", I  think call it "FMLXL" may be better, which can
>> stand for FMLAL/FMLSL instructions.
> 
> Although "FHM" is cryptic, I think it makes sense to keep this as "FHM"
> to match the ISAR0 field name -- we've tended to follow this policy
> for other extension names unless there's a much better or more obvious
> name available.
> 
> For "FMLXL", new instructions might be added in the future that match
> the same pattern, and then "FMLXL" could become ambiguous.  So maybe
> this is not the best choice.

I think FHM stands for "FP Half precision Multiplication
instructions". I vote for keeping the feature bit in sync with the
register bit definition, i.e. FHM.

However, my version of the patch names the HWCAP bit "asimdfml", 
following the compiler name for the feature option "fp16fml", which
is not perfect either. I think FHM is the safe option here.
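
The choice of string matters because scripts and programs match on it
in /proc/cpuinfo. A minimal sketch of such a check follows, assuming
the usual space-separated "Features" line; features_has() is a
hypothetical helper, and the sample lines in the usage note are
illustrative rather than taken from real hardware. Whole-token matching
is needed so that "asimd" does not match "asimdfhm" or "asimddp".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is 'name' a whole token of a cpuinfo Features line? */
static inline bool features_has(const char *line, const char *name)
{
	size_t n;
	const char *p;

	assert(line != NULL && name != NULL);
	n = strlen(name);
	if (n == 0)
		return false;

	for (p = strstr(line, name); p != NULL; p = strstr(p + 1, name)) {
		/* Accept the match only if it is delimited on both sides. */
		bool start_ok = (p == line) || p[-1] == ' ' ||
				p[-1] == '\t' || p[-1] == ':';
		bool end_ok = p[n] == '\0' || p[n] == ' ' ||
			      p[n] == '\t' || p[n] == '\n';

		if (start_ok && end_ok)
			return true;
	}
	return false;
}
```

For example, features_has("Features\t: fp asimd sve asimdfhm\n",
"asimdfhm") is true, while looking up "asimd" in a line that lists only
"asimddp" is false.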

> 
>>> Maybe something like "widening half-precision floating-point multiply
>>> accumulate" is acceptable wording consistent with the existing
>>> architecture, but I just made that up, so it's not official ;)
>>
>> how about something like "performing a multiplication of each FP16
>> element of one vector with the corresponding FP16 element of a second
>> vector, and to add or subtract this without an intermediate rounding
>> to the corresponding FP32 element in a third vector."?
> 
> We could have that, I guess.
> 

I agree, and that matches the feature description.


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-11 13:29     ` Dave Martin
  2017-12-11 18:58       ` Suzuki K Poulose
@ 2017-12-12  1:44       ` gengdongjiu
  1 sibling, 0 replies; 7+ messages in thread
From: gengdongjiu @ 2017-12-12  1:44 UTC (permalink / raw)
  To: Dave Martin
  Cc: Mark Rutland, guohanjun, linux-doc, Suzuki Poulose,
	Catalin Marinas, corbet, Will Deacon, linux-kernel, linuxarm,
	zhihui.gao, huangshaoyu, gregkh, arvind.yadav.cs, Robin Murphy,
	linux-arm-kernel, zhanghaibin7


On 2017/12/11 21:29, Dave Martin wrote:
>> Thanks for the point out.
>> In fact, this feature only adds two instructions:
>> FP16 * FP16 + FP32
>> FP16 * FP16 - FP32
>>
>> The spec call this bit to ID_AA64ISAR0_EL1.FHM, I do not know why it
>> will call "FHM", I  think call it "FMLXL" may be better, which can
>> stand for FMLAL/FMLSL instructions.
> Although "FHM" is cryptic, I think it makes sense to keep this as "FHM"
> to match the ISAR0 field name -- we've tended to follow this policy
> for other extension names unless there's a much better or more obvious
> name available
I agree; I also think "FHM" is better.

> 
> For "FMLXL", new instructions might be added in the future that match
> the same pattern, and then "FMLXL" could become ambiguous.  So maybe
> this is not the best choice.
Ok.

> 
>>> Maybe something like "widening half-precision floating-point multiply
>>> accumulate" is acceptable wording consistent with the existing
>>> architecture, but I just made that up, so it's not official ;)
>> how about something like "performing a multiplication of each FP16
>> element of one vector with the corresponding FP16 element of a second
>> vector, and to add or subtract this without an intermediate rounding
>> to the corresponding FP32 element in a third vector."?
> We could have that, I guess.
Ok, thanks!

> 
>>>> instructions set. Let the userspace know about it via a
>>>> HWCAP bit and MRS emulation.


* Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
  2017-12-11 18:58       ` Suzuki K Poulose
@ 2017-12-12  2:07         ` gengdongjiu
  0 siblings, 0 replies; 7+ messages in thread
From: gengdongjiu @ 2017-12-12  2:07 UTC (permalink / raw)
  To: Suzuki K Poulose, Dave Martin
  Cc: Mark Rutland, guohanjun, linux-doc, Catalin Marinas, corbet,
	Will Deacon, linux-kernel, linuxarm, zhihui.gao, huangshaoyu,
	gregkh, arvind.yadav.cs, Robin Murphy, linux-arm-kernel,
	zhanghaibin7, nd

On 2017/12/12 2:58, Suzuki K Poulose wrote:
> Hi gengdongjiu
> 
> Sorry for the late response. I have a similar patch to add the support for "FHM", which I was about to post it this week.
Suzuki, no problem.
Maybe you need not post yours, to avoid a duplicate review. Thanks!

> 
> On 11/12/17 13:29, Dave Martin wrote:
>> On Mon, Dec 11, 2017 at 08:47:00PM +0800, gengdongjiu wrote:
>>>
>>> On 2017/12/11 19:59, Dave P Martin wrote:
>>>> On Sat, Dec 09, 2017 at 03:28:42PM +0000, Dongjiu Geng wrote:
>>>>> ARM v8.4 extensions include support for new floating point
>>>>> multiplication variant instructions to the AArch64 SIMD
>>>>
>>>> Do we have any human-readable description of what the new instructions
>>>> do?
>>>>
>>>> Since the v8.4 spec itself only describes these as "New Floating
>>>> Point Multiplication Variant", I wonder what "FHM" actually stands
>>>> for.
>>> Thanks for the point out.
>>> In fact, this feature only adds two instructions:
>>> FP16 * FP16 + FP32
>>> FP16 * FP16 - FP32
>>>
>>> The spec call this bit to ID_AA64ISAR0_EL1.FHM, I do not know why it
>>> will call "FHM", I  think call it "FMLXL" may be better, which can
>>> stand for FMLAL/FMLSL instructions.
>>
>> Although "FHM" is cryptic, I think it makes sense to keep this as "FHM"
>> to match the ISAR0 field name -- we've tended to follow this policy
>> for other extension names unless there's a much better or more obvious
>> name available.
>>
>> For "FMLXL", new instructions might be added in the future that match
>> the same pattern, and then "FMLXL" could become ambiguous.  So maybe
>> this is not the best choice.
> 
> I think the FHM stands for "FP Half precision Multiplication instructions". I vote for keeping the feature bit in sync with the register bit definition. i.e, FHM.
I agree with you.

> 
> However, my version of the patch names the HWCAP bit "asimdfml", following the compiler name for the feature option "fp16fml", which
> is not perfect either. I think FHM is the safe option here.
yes, "FHM" is safe here.

> 
>>
>>>> Maybe something like "widening half-precision floating-point multiply
>>>> accumulate" is acceptable wording consistent with the existing
>>>> architecture, but I just made that up, so it's not official ;)
>>>
>>> how about something like "performing a multiplication of each FP16
>>> element of one vector with the corresponding FP16 element of a second
>>> vector, and to add or subtract this without an intermediate rounding
>>> to the corresponding FP32 element in a third vector."?
>>
>> We could have that, I guess.
>>
> 
> I agree, and that matches the feature description.
Ok, thanks!

> 
> 


Thread overview: 7+ messages
2017-12-09 15:28 [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant Dongjiu Geng
2017-12-11 11:59 ` Dave P Martin
2017-12-11 12:47   ` gengdongjiu
2017-12-11 13:29     ` Dave Martin
2017-12-11 18:58       ` Suzuki K Poulose
2017-12-12  2:07         ` gengdongjiu
2017-12-12  1:44       ` gengdongjiu
