linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API
@ 2024-03-29  7:18 Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Huacai Chen,
	Ingo Molnar, Jonathan Corbet, Masahiro Yamada, Nathan Chancellor,
	Nicolas Schier, Russell King, Thomas Gleixner, Will Deacon,
	linux-doc, linux-kbuild

This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v3: https://lore.kernel.org/linux-kernel/20240327200157.1097089-1-samuel.holland@sifive.com/
v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holland@sifive.com/
v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holland@sifive.com/
v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holland@sifive.com/

Changes in v4:
 - Add missed CFLAGS changes for recov_neon_inner.c
   (fixes arm build failures)
 - Fix x86 include guard issue (fixes x86 build failures)

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Add documentation explaining the built-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement
 - Remove file name from header comment
 - Clean up arch/arm64/lib/Makefile, like for arch/arm
 - Remove RISC-V architecture-specific preprocessor check
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers
 - Declare test_fpu() in a header

Michael Ellerman (1):
  drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (14):
  arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  x86/fpu: Fix asm/fpu/types.h include guard
  x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  riscv: Add support for kernel-mode FPU
  drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  selftests/fpu: Move FP code to a separate translation unit
  selftests/fpu: Allow building on other architectures

 Documentation/core-api/floating-point.rst     | 78 +++++++++++++++++++
 Documentation/core-api/index.rst              |  1 +
 Makefile                                      |  5 ++
 arch/Kconfig                                  |  6 ++
 arch/arm/Kconfig                              |  1 +
 arch/arm/Makefile                             |  7 ++
 arch/arm/include/asm/fpu.h                    | 15 ++++
 arch/arm/lib/Makefile                         |  3 +-
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/Makefile                           |  9 ++-
 arch/arm64/include/asm/fpu.h                  | 15 ++++
 arch/arm64/lib/Makefile                       |  6 +-
 arch/loongarch/Kconfig                        |  1 +
 arch/loongarch/Makefile                       |  5 +-
 arch/loongarch/include/asm/fpu.h              |  1 +
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/Makefile                         |  5 +-
 arch/powerpc/include/asm/fpu.h                | 28 +++++++
 arch/riscv/Kconfig                            |  1 +
 arch/riscv/Makefile                           |  3 +
 arch/riscv/include/asm/fpu.h                  | 16 ++++
 arch/riscv/kernel/Makefile                    |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c           | 28 +++++++
 arch/x86/Kconfig                              |  1 +
 arch/x86/Makefile                             | 20 +++++
 arch/x86/include/asm/fpu.h                    | 13 ++++
 arch/x86/include/asm/fpu/types.h              |  6 +-
 drivers/gpu/drm/amd/display/Kconfig           |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 35 +--------
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 +--------
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 +--------
 include/linux/fpu.h                           | 12 +++
 lib/Kconfig.debug                             |  2 +-
 lib/Makefile                                  | 26 +------
 lib/raid6/Makefile                            | 33 +++-----
 lib/test_fpu.h                                |  8 ++
 lib/{test_fpu.c => test_fpu_glue.c}           | 37 ++-------
 lib/test_fpu_impl.c                           | 37 +++++++++
 38 files changed, 348 insertions(+), 193 deletions(-)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 arch/arm/include/asm/fpu.h
 create mode 100644 arch/arm64/include/asm/fpu.h
 create mode 100644 arch/powerpc/include/asm/fpu.h
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
 create mode 100644 arch/x86/include/asm/fpu.h
 create mode 100644 include/linux/fpu.h
 create mode 100644 lib/test_fpu.h
 rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
 create mode 100644 lib/test_fpu_impl.c

-- 
2.44.0


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Borislav Petkov, Catalin Marinas, Dave Hansen, Huacai Chen,
	Ingo Molnar, Jonathan Corbet, Masahiro Yamada, Nathan Chancellor,
	Nicolas Schier, Russell King, Thomas Gleixner, Will Deacon,
	linux-doc, linux-kbuild

Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures,
and FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space
FPU support is available. Architectures selecting this option must
implement what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and
provide the appropriate CFLAGS for compiling floating-point C code.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Add documentation explaining the built-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement

 Documentation/core-api/floating-point.rst | 78 +++++++++++++++++++++++
 Documentation/core-api/index.rst          |  1 +
 Makefile                                  |  5 ++
 arch/Kconfig                              |  6 ++
 include/linux/fpu.h                       | 12 ++++
 5 files changed, 102 insertions(+)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 include/linux/fpu.h

diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst
new file mode 100644
index 000000000000..a8d0d4b05052
--- /dev/null
+++ b/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+    CFLAGS_foo.o += $(CC_FLAGS_FPU)
+    CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+    CC_FLAGS_FPU := -mhard-float
+
+or::
+
+    CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+        This function reports if floating-point code can be used on this CPU or
+        platform. The value returned by this function is not expected to change
+        at runtime, so it only needs to be called once, not before every
+        critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+                void kernel_fpu_end( void )
+
+        These functions create a floating-point critical section. It is only
+        valid to call ``kernel_fpu_begin()`` after a previous call to
+        ``kernel_fpu_available()`` returned ``true``. These functions are only
+        guaranteed to be callable from (preemptible or non-preemptible) process
+        context.
+
+        Preemption may be disabled inside critical sections, so their size
+        should be minimized. They are *not* required to be reentrant. If the
+        caller expects to nest critical sections, it must implement its own
+        reference counting.
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index 7a3a08d81f11..974beccd671f 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -48,6 +48,7 @@ Library functionality that is used throughout the kernel.
    errseq
    wrappers/atomic_t
    wrappers/atomic_bitops
+   floating-point
 
 Low level entry and exit
 ========================
diff --git a/Makefile b/Makefile
index 763b6792d3d5..710f65e4249d 100644
--- a/Makefile
+++ b/Makefile
@@ -964,6 +964,11 @@ KBUILD_CFLAGS	+= $(CC_FLAGS_CFI)
 export CC_FLAGS_CFI
 endif
 
+# Architectures can define flags to add/remove for floating-point support
+CC_FLAGS_FPU	+= -D_LINUX_FPU_COMPILATION_UNIT
+export CC_FLAGS_FPU
+export CC_FLAGS_NO_FPU
+
 ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0)
 # Set the minimal function alignment. Use the newer GCC option
 # -fmin-function-alignment if it is available, or fall back to -falign-funtions.
diff --git a/arch/Kconfig b/arch/Kconfig
index 9f066785bb71..8e34b3acf73d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1569,6 +1569,12 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG
 	  address translations. Page table walkers that clear the accessed bit
 	  may use this capability to reduce their search space.
 
+config ARCH_HAS_KERNEL_FPU_SUPPORT
+	bool
+	help
+	  Architectures that select this option can run floating-point code in
+	  the kernel, as described in Documentation/core-api/floating-point.rst.
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/fpu.h b/include/linux/fpu.h
new file mode 100644
index 000000000000..2fb63e22913b
--- /dev/null
+++ b/include/linux/fpu.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_FPU_H
+#define _LINUX_FPU_H
+
+#ifdef _LINUX_FPU_COMPILATION_UNIT
+#error FP code must be compiled separately. See Documentation/core-api/floating-point.rst.
+#endif
+
+#include <asm/fpu.h>
+
+#endif
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Russell King

ARM provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm/Kconfig           |  1 +
 arch/arm/Makefile          |  7 +++++++
 arch/arm/include/asm/fpu.h | 15 +++++++++++++++
 3 files changed, 23 insertions(+)
 create mode 100644 arch/arm/include/asm/fpu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b14aed3a17ab..b1751c2cab87 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,6 +15,7 @@ config ARM
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d82908b1b1bb..71afdd98ddf2 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -130,6 +130,13 @@ endif
 # Accept old syntax despite ".syntax unified"
 AFLAGS_NOWARN	:=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)
 
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU	:= -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU	+= -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_FPU	+= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
+
 ifeq ($(CONFIG_THUMB2_KERNEL),y)
 CFLAGS_ISA	:=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN)
 AFLAGS_ISA	:=$(CFLAGS_ISA) -Wa$(comma)-mthumb
diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available()	cpu_has_neon()
+#define kernel_fpu_begin()	kernel_neon_begin()
+#define kernel_fpu_end()	kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Russell King

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/arm/lib/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-  NEON_FLAGS			:= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
+  CFLAGS_xor-neon.o		+= $(CC_FLAGS_FPU)
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (2 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Catalin Marinas, Will Deacon

arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm64/Kconfig           |  1 +
 arch/arm64/Makefile          |  9 ++++++++-
 arch/arm64/include/asm/fpu.h | 15 +++++++++++++++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/fpu.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..67f0d3b5b7df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -30,6 +30,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 0e075d3c546b..3e863e5b0169 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y)
 $(warning Detected assembler with broken .inst; disassembly will be unreliable)
 endif
 
-KBUILD_CFLAGS	+= -mgeneral-regs-only	\
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU	:= -ffreestanding
+# Enable <arm_neon.h>
+CC_FLAGS_FPU	+= -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_NO_FPU	:= -mgeneral-regs-only
+
+KBUILD_CFLAGS	+= $(CC_FLAGS_NO_FPU) \
 		   $(compat_vdso) $(cc_has_k_constraint)
 KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
 KBUILD_AFLAGS	+= $(compat_vdso)
diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
new file mode 100644
index 000000000000..2ae50bdce59b
--- /dev/null
+++ b/arch/arm64/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include <asm/neon.h>
+
+#define kernel_fpu_available()	cpu_has_neon()
+#define kernel_fpu_begin()	kernel_neon_begin()
+#define kernel_fpu_end()	kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (3 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 06/15] lib/raid6: " Samuel Holland
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Catalin Marinas, Will Deacon

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 arch/arm64/lib/Makefile | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y		:= clear_user.o delay.o copy_from_user.o		\
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
 obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
-CFLAGS_REMOVE_xor-neon.o	+= -mgeneral-regs-only
-CFLAGS_xor-neon.o		+= -ffreestanding
-# Enable <arm_neon.h>
-CFLAGS_xor-neon.o		+= -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o		+= $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o	+= $(CC_FLAGS_NO_FPU)
 endif
 
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 06/15] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (4 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Catalin Marinas, Russell King, Will Deacon

Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v4:
 - Add missed CFLAGS changes for recov_neon_inner.c
   (fixes arm build failures)

 lib/raid6/Makefile | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 385a94aa0b99..0e88bfe6445b 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
 endif
 endif
 
-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable <arm_neon.h>
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
 quiet_cmd_unroll = UNROLL  $@
       cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@
 
@@ -75,10 +56,16 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
 $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_recov_neon_inner.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_recov_neon_inner.o += $(CC_FLAGS_NO_FPU)
 targets += neon1.c neon2.c neon4.c neon8.c
 $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (5 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 06/15] lib/raid6: " Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 08/15] powerpc: " Samuel Holland
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	WANG Xuerui, Huacai Chen

LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v3)

Changes in v3:
 - Rebase on v6.9-rc1

 arch/loongarch/Kconfig           | 1 +
 arch/loongarch/Makefile          | 5 ++++-
 arch/loongarch/include/asm/fpu.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index a5f300ec6f28..2266c6c41c38 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -18,6 +18,7 @@ config LOONGARCH
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index df6caf79537a..efb5440a43ec 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -26,6 +26,9 @@ endif
 32bit-emul		= elf32loongarch
 64bit-emul		= elf64loongarch
 
+CC_FLAGS_FPU		:= -mfpu=64
+CC_FLAGS_NO_FPU		:= -msoft-float
+
 ifdef CONFIG_UNWINDER_ORC
 orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h
 orc_hash_sh := $(srctree)/scripts/orc_hash.sh
@@ -59,7 +62,7 @@ ld-emul			= $(64bit-emul)
 cflags-y		+= -mabi=lp64s
 endif
 
-cflags-y			+= -pipe -msoft-float
+cflags-y			+= -pipe $(CC_FLAGS_NO_FPU)
 LDFLAGS_vmlinux			+= -static -n -nostdlib
 
 # When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@
 
 struct sigcontext;
 
+#define kernel_fpu_available() cpu_has_fpu
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 08/15] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (6 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard Samuel Holland
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Michael Ellerman

PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/powerpc/Kconfig           |  1 +
 arch/powerpc/Makefile          |  5 ++++-
 arch/powerpc/include/asm/fpu.h | 28 ++++++++++++++++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..c42a57b6839d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_HUGEPD			if HUGETLB_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT	if PPC_FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN	if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..93d89f055b70 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option, $(MULTIPLEWORD))
 
 CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)
 
+CC_FLAGS_FPU		:= $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU		:= $(call cc-option,-msoft-float)
+
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
 
 KBUILD_CPPFLAGS	+= -I $(srctree)/arch/powerpc $(asinstr)
 KBUILD_AFLAGS	+= $(AFLAGS-y)
-KBUILD_CFLAGS	+= $(call cc-option,-msoft-float)
+KBUILD_CFLAGS	+= $(CC_FLAGS_NO_FPU)
 KBUILD_CFLAGS	+= $(CFLAGS-y)
 CPP		= $(CC) -E $(KBUILD_CFLAGS)
 
diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index 000000000000..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include <linux/preempt.h>
+
+#include <asm/cpu_has_feature.h>
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available()	(!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+	disable_kernel_fp();
+	preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (7 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 08/15] powerpc: " Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29 17:30   ` Dave Hansen
  2024-03-29  7:18 ` [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner

The include guard should match the filename, or it will conflict with
the newly-added asm/fpu.h.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

Changes in v4:
 - New patch for v4

 arch/x86/include/asm/fpu/types.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ace9aa3b78a3..eb17f31b06d2 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -2,8 +2,8 @@
 /*
  * FPU data structures:
  */
-#ifndef _ASM_X86_FPU_H
-#define _ASM_X86_FPU_H
+#ifndef _ASM_X86_FPU_TYPES_H
+#define _ASM_X86_FPU_TYPES_H
 
 #include <asm/page_types.h>
 
@@ -596,4 +596,4 @@ struct fpu_state_config {
 /* FPU state configuration information */
 extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
 
-#endif /* _ASM_X86_FPU_H */
+#endif /* _ASM_X86_FPU_TYPES_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (8 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29 17:28   ` Dave Hansen
  2024-03-29  7:18 ` [PATCH v4 11/15] riscv: Add support for kernel-mode FPU Samuel Holland
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Borislav Petkov, Dave Hansen, Ingo Molnar, Thomas Gleixner

x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 arch/x86/Kconfig           |  1 +
 arch/x86/Makefile          | 20 ++++++++++++++++++++
 arch/x86/include/asm/fpu.h | 13 +++++++++++++
 3 files changed, 34 insertions(+)
 create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..7c9d032ee675 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,7 @@ config X86
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
+	select ARCH_HAS_KERNEL_FPU_SUPPORT
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 662d9d4033e6..5a5f5999c505 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json
 KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2
 
+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+#  -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
 ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #
 # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index 000000000000..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include <asm/fpu/api.h>
+
+#define kernel_fpu_available()	true
+
+#endif /* ! _ASM_X86_FPU_H */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 11/15] riscv: Add support for kernel-mode FPU
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (9 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Palmer Dabbelt

This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Support is limited to riscv64 because riscv32 requires runtime (libgcc)
assistance to convert between doubles and 64-bit integers.

Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v3)

Changes in v3:
 - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Remove RISC-V architecture-specific preprocessor check

 arch/riscv/Kconfig                  |  1 +
 arch/riscv/Makefile                 |  3 +++
 arch/riscv/include/asm/fpu.h        | 16 ++++++++++++++++
 arch/riscv/kernel/Makefile          |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c | 28 ++++++++++++++++++++++++++++
 5 files changed, 49 insertions(+)
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..3bcd0d250810 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MMIOWB
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 252d63942f34..76ff4033c854 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i
 
 KBUILD_AFLAGS += -march=$(riscv-march-y)
 
+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU  := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
 KBUILD_CFLAGS += -mno-save-restore
 KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
 
diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index 000000000000..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include <asm/switch_to.h>
+
+#define kernel_fpu_available()	has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 81d94a8ee10f..5b243d46f4b1 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)	+= unaligned_access_speed.o
 obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)	+= copy-unaligned.o
 
 obj-$(CONFIG_FPU)		+= fpu.o
+obj-$(CONFIG_FPU)		+= kernel_mode_fpu.o
 obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
 obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index 000000000000..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include <linux/export.h>
+#include <linux/preempt.h>
+
+#include <asm/csr.h>
+#include <asm/fpu.h>
+#include <asm/processor.h>
+#include <asm/switch_to.h>
+
+void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	fstate_save(current, task_pt_regs(current));
+	csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+	csr_clear(CSR_SSTATUS, SR_FS);
+	fstate_restore(current, task_pt_regs(current));
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (10 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 11/15] riscv: Add support for kernel-mode FPU Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-04-30 17:28   ` Harry Wentland
  2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Michael Ellerman,
	Alex Deucher, Samuel Holland

From: Michael Ellerman <mpe@ellerman.id.au>

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++----------
 drivers/gpu/drm/amd/display/dc/dml/Makefile    |  2 +-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			enable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			enable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			disable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			disable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index c4a5efd2dda5..59d3972341d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (11 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-04-10 22:21   ` Thiago Jung Bauermann
                     ` (2 more replies)
  2024-03-29  7:18 ` [PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit Samuel Holland
                   ` (2 subsequent siblings)
  15 siblings, 3 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland,
	Alex Deucher

Now that all previously-supported architectures select
ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
of the existing list of architectures. It can also take advantage of the
common kernel-mode FPU API and method of adjusting CFLAGS.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers

 drivers/gpu/drm/amd/display/Kconfig           |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 27 ++------------
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 ++-----------------
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 ++-----------------
 4 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 901d1961b739..5fcd4f778dc3 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 0de16796466b..e46f8ce41d87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -26,16 +26,7 @@
 
 #include "dc_trace.h"
 
-#if defined(CONFIG_X86)
-#include <asm/fpu/api.h>
-#elif defined(CONFIG_PPC64)
-#include <asm/switch_to.h>
-#include <asm/cputable.h>
-#elif defined(CONFIG_ARM64)
-#include <asm/neon.h>
-#elif defined(CONFIG_LOONGARCH)
-#include <asm/fpu.h>
-#endif
+#include <linux/fpu.h>
 
 /**
  * DOC: DC FPU manipulation overview
@@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
 	WARN_ON_ONCE(!in_task());
 	preempt_disable();
 	depth = __this_cpu_inc_return(fpu_recursion_depth);
-
 	if (depth == 1) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
+		BUG_ON(!kernel_fpu_available());
 		kernel_fpu_begin();
-#elif defined(CONFIG_PPC64)
-		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-			enable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-		kernel_neon_begin();
-#endif
 	}
 
 	TRACE_DCN_FPU(true, function_name, line, depth);
@@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)
 
 	depth = __this_cpu_dec_return(fpu_recursion_depth);
 	if (depth == 0) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
-#elif defined(CONFIG_PPC64)
-		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-			disable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-		kernel_neon_end();
-#endif
 	} else {
 		WARN_ON_ONCE(depth < 0);
 	}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 59d3972341d2..a94b6d546cd1 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -25,40 +25,8 @@
 # It provides the general basic services required by other DAL
 # subcomponents.
 
-ifdef CONFIG_X86
-dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml_ccflags := $(dml_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml_ccflags := -mfpu=64
-dml_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifneq ($(call gcc-min-version, 70100),y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml_ccflags += -mpreferred-stack-boundary=4
-else
-dml_ccflags += -msse2
-endif
-endif
+dml_ccflags := $(CC_FLAGS_FPU)
+dml_rcflags := $(CC_FLAGS_NO_FPU)
 
 ifneq ($(CONFIG_FRAME_WARN),0)
 ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 7b51364084b5..4f6c804a26ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -24,40 +24,8 @@
 #
 # Makefile for dml2.
 
-ifdef CONFIG_X86
-dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml2_ccflags := $(dml2_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml2_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml2_ccflags := -mfpu=64
-dml2_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifeq ($(call cc-ifversion, -lt, 0701, y), y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml2_ccflags += -mpreferred-stack-boundary=4
-else
-dml2_ccflags += -msse2
-endif
-endif
+dml2_ccflags := $(CC_FLAGS_FPU)
+dml2_rcflags := $(CC_FLAGS_NO_FPU)
 
 ifneq ($(CONFIG_FRAME_WARN),0)
 ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (12 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-03-29  7:18 ` [PATCH v4 15/15] selftests/fpu: Allow building on other architectures Samuel Holland
  2024-04-03 12:51 ` [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Christian König
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland

This ensures no compiler-generated floating-point code can appear
outside kernel_fpu_{begin,end}() sections, and some architectures
enforce this separation.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v2)

Changes in v2:
 - Declare test_fpu() in a header

 lib/Makefile                        |  3 ++-
 lib/test_fpu.h                      |  8 +++++++
 lib/{test_fpu.c => test_fpu_glue.c} | 32 +------------------------
 lib/test_fpu_impl.c                 | 37 +++++++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 32 deletions(-)
 create mode 100644 lib/test_fpu.h
 rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
 create mode 100644 lib/test_fpu_impl.c

diff --git a/lib/Makefile b/lib/Makefile
index ffc6b2341b45..fcb35bf50979 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st
 endif
 
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
-CFLAGS_test_fpu.o += $(FPU_CFLAGS)
+test_fpu-y := test_fpu_glue.o test_fpu_impl.o
+CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
 
 # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
 # so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu.h b/lib/test_fpu.h
new file mode 100644
index 000000000000..4459807084bc
--- /dev/null
+++ b/lib/test_fpu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _LIB_TEST_FPU_H
+#define _LIB_TEST_FPU_H
+
+int test_fpu(void);
+
+#endif
diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c
similarity index 71%
rename from lib/test_fpu.c
rename to lib/test_fpu_glue.c
index e82db19fed84..85963d7be826 100644
--- a/lib/test_fpu.c
+++ b/lib/test_fpu_glue.c
@@ -19,37 +19,7 @@
 #include <linux/debugfs.h>
 #include <asm/fpu/api.h>
 
-static int test_fpu(void)
-{
-	/*
-	 * This sequence of operations tests that rounding mode is
-	 * to nearest and that denormal numbers are supported.
-	 * Volatile variables are used to avoid compiler optimizing
-	 * the calculations away.
-	 */
-	volatile double a, b, c, d, e, f, g;
-
-	a = 4.0;
-	b = 1e-15;
-	c = 1e-310;
-
-	/* Sets precision flag */
-	d = a + b;
-
-	/* Result depends on rounding mode */
-	e = a + b / 2;
-
-	/* Denormal and very large values */
-	f = b / c;
-
-	/* Depends on denormal support */
-	g = a + c * f;
-
-	if (d > a && e > a && g > a)
-		return 0;
-	else
-		return -EINVAL;
-}
+#include "test_fpu.h"
 
 static int test_fpu_get(void *data, u64 *val)
 {
diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c
new file mode 100644
index 000000000000..777894dbbe86
--- /dev/null
+++ b/lib/test_fpu_impl.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include <linux/errno.h>
+
+#include "test_fpu.h"
+
+int test_fpu(void)
+{
+	/*
+	 * This sequence of operations tests that rounding mode is
+	 * to nearest and that denormal numbers are supported.
+	 * Volatile variables are used to avoid compiler optimizing
+	 * the calculations away.
+	 */
+	volatile double a, b, c, d, e, f, g;
+
+	a = 4.0;
+	b = 1e-15;
+	c = 1e-310;
+
+	/* Sets precision flag */
+	d = a + b;
+
+	/* Result depends on rounding mode */
+	e = a + b / 2;
+
+	/* Denormal and very large values */
+	f = b / c;
+
+	/* Depends on denormal support */
+	g = a + c * f;
+
+	if (d > a && e > a && g > a)
+		return 0;
+	else
+		return -EINVAL;
+}
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 15/15] selftests/fpu: Allow building on other architectures
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (13 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit Samuel Holland
@ 2024-03-29  7:18 ` Samuel Holland
  2024-04-03 12:51 ` [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Christian König
  15 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29  7:18 UTC (permalink / raw)
  To: Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Samuel Holland

Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile
and run floating-point code, this test is no longer x86-specific.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
---

(no changes since v1)

 lib/Kconfig.debug   |  2 +-
 lib/Makefile        | 25 ++-----------------------
 lib/test_fpu_glue.c |  5 ++++-
 3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c63a5fbf1f1c..f93e778e0405 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES
 
 config TEST_FPU
 	tristate "Test floating point operations in kernel space"
-	depends on X86 && !KCOV_INSTRUMENT_ALL
+	depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
 	help
 	  Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
 	  which will trigger a sequence of floating point operations. This is used
diff --git a/lib/Makefile b/lib/Makefile
index fcb35bf50979..e44ad11f77b5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
 obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
 
-#
-# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
-# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
-# get appended last to CFLAGS and thus override those previous compiler options.
-#
-FPU_CFLAGS := -msse -msse2
-ifdef CONFIG_CC_IS_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
-#
-# The "-msse" in the first argument is there so that the
-# -mpreferred-stack-boundary=3 build error:
-#
-#  -mpreferred-stack-boundary=3 is not between 4 and 12
-#
-# can be triggered. Otherwise gcc doesn't complain.
-FPU_CFLAGS += -mhard-float
-FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
-endif
-
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
 test_fpu-y := test_fpu_glue.o test_fpu_impl.o
-CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
+CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)
 
 # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module,
 # so we can't just use obj-$(CONFIG_KUNIT).
diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c
index 85963d7be826..eef282a2715f 100644
--- a/lib/test_fpu_glue.c
+++ b/lib/test_fpu_glue.c
@@ -17,7 +17,7 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/debugfs.h>
-#include <asm/fpu/api.h>
+#include <linux/fpu.h>
 
 #include "test_fpu.h"
 
@@ -38,6 +38,9 @@ static struct dentry *selftest_dir;
 
 static int __init test_fpu_init(void)
 {
+	if (!kernel_fpu_available())
+		return -EINVAL;
+
 	selftest_dir = debugfs_create_dir("selftest_helpers", NULL);
 	if (!selftest_dir)
 		return -ENOMEM;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 ` [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-03-29 17:28   ` Dave Hansen
  2024-03-29 18:02     ` Samuel Holland
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2024-03-29 17:28 UTC (permalink / raw)
  To: Samuel Holland, Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Borislav Petkov,
	Dave Hansen, Ingo Molnar, Thomas Gleixner

On 3/29/24 00:18, Samuel Holland wrote:
> +#
> +# CFLAGS for compiling floating point code inside the kernel.
> +#
> +CC_FLAGS_FPU := -msse -msse2
> +ifdef CONFIG_CC_IS_GCC
> +# Stack alignment mismatch, proceed with caution.
> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> +# (8B stack alignment).
> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
> +#
> +# The "-msse" in the first argument is there so that the
> +# -mpreferred-stack-boundary=3 build error:
> +#
> +#  -mpreferred-stack-boundary=3 is not between 4 and 12
> +#
> +# can be triggered. Otherwise gcc doesn't complain.
> +CC_FLAGS_FPU += -mhard-float
> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
> +endif

I was expecting to see this (now duplicate) hunk come _out_ of
lib/Makefile somewhere in the series.

Did I miss that, or is there something keeping the duplicate there?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard
  2024-03-29  7:18 ` [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard Samuel Holland
@ 2024-03-29 17:30   ` Dave Hansen
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Hansen @ 2024-03-29 17:30 UTC (permalink / raw)
  To: Samuel Holland, Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Borislav Petkov,
	Dave Hansen, Ingo Molnar, Thomas Gleixner

On 3/29/24 00:18, Samuel Holland wrote:
> The include guard should match the filename, or it will conflict with
> the newly-added asm/fpu.h.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29 17:28   ` Dave Hansen
@ 2024-03-29 18:02     ` Samuel Holland
  0 siblings, 0 replies; 36+ messages in thread
From: Samuel Holland @ 2024-03-29 18:02 UTC (permalink / raw)
  To: Dave Hansen, Andrew Morton, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Borislav Petkov,
	Dave Hansen, Ingo Molnar, Thomas Gleixner, linux-arm-kernel

On 2024-03-29 12:28 PM, Dave Hansen wrote:
> On 3/29/24 00:18, Samuel Holland wrote:
>> +#
>> +# CFLAGS for compiling floating point code inside the kernel.
>> +#
>> +CC_FLAGS_FPU := -msse -msse2
>> +ifdef CONFIG_CC_IS_GCC
>> +# Stack alignment mismatch, proceed with caution.
>> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
>> +# (8B stack alignment).
>> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
>> +#
>> +# The "-msse" in the first argument is there so that the
>> +# -mpreferred-stack-boundary=3 build error:
>> +#
>> +#  -mpreferred-stack-boundary=3 is not between 4 and 12
>> +#
>> +# can be triggered. Otherwise gcc doesn't complain.
>> +CC_FLAGS_FPU += -mhard-float
>> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
>> +endif
> 
> I was expecting to see this (now duplicate) hunk come _out_ of
> lib/Makefile somewhere in the series.
> 
> Did I miss that, or is there something keeping the duplicate there?

This hunk is removed in patch 15/15, after the conversion of lib/test_fpu.c:

https://lore.kernel.org/linux-kernel/20240329072441.591471-16-samuel.holland@sifive.com/

Regards,
Samuel


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API
  2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
                   ` (14 preceding siblings ...)
  2024-03-29  7:18 ` [PATCH v4 15/15] selftests/fpu: Allow building on other architectures Samuel Holland
@ 2024-04-03 12:51 ` Christian König
  15 siblings, 0 replies; 36+ messages in thread
From: Christian König @ 2024-04-03 12:51 UTC (permalink / raw)
  To: Samuel Holland, Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Borislav Petkov,
	Catalin Marinas, Dave Hansen, Huacai Chen, Ingo Molnar,
	Jonathan Corbet, Masahiro Yamada, Nathan Chancellor,
	Nicolas Schier, Russell King, Thomas Gleixner, Will Deacon,
	linux-doc, linux-kbuild, Wentland, Harry, Rodrigo Siqueira

I only skimmed over the platform patches and spend only a few minutes on 
the amdgpu stuff.

 From what I've seen this series seems to make perfect sense to me, I 
just can't fully judge everything.

So feel free to add Acked-by: Christian König <christian.koenig@amd.com> 
but I strongly suggest that Harry and Rodrigo take a look as well.

Regards,
Christian.

Am 29.03.24 um 08:18 schrieb Samuel Holland:
> This series unifies the kernel-mode FPU API across several architectures
> by wrapping the existing functions (where needed) in consistently-named
> functions placed in a consistent header location, with mostly the same
> semantics: they can be called from preemptible or non-preemptible task
> context, and are not assumed to be reentrant. Architectures are also
> expected to provide CFLAGS adjustments for compiling FPU-dependent code.
> For the moment, SIMD/vector units are out of scope for this common API.
>
> This allows us to remove the ifdeffery and duplicated Makefile logic at
> each FPU user. It then implements the common API on RISC-V, and converts
> a couple of users to the new API: the AMDGPU DRM driver, and the FPU
> self test.
>
> The underlying goal of this series is to allow using newer AMD GPUs
> (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
> GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
> FPU support.
>
> Previous versions:
> v3: https://lore.kernel.org/linux-kernel/20240327200157.1097089-1-samuel.holland@sifive.com/
> v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holland@sifive.com/
> v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holland@sifive.com/
> v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holland@sifive.com/
>
> Changes in v4:
>   - Add missed CFLAGS changes for recov_neon_inner.c
>     (fixes arm build failures)
>   - Fix x86 include guard issue (fixes x86 build failures)
>
> Changes in v3:
>   - Rebase on v6.9-rc1
>   - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT
>
> Changes in v2:
>   - Add documentation explaining the built-time and runtime APIs
>   - Add a linux/fpu.h header for generic isolation enforcement
>   - Remove file name from header comment
>   - Clean up arch/arm64/lib/Makefile, like for arch/arm
>   - Remove RISC-V architecture-specific preprocessor check
>   - Split altivec removal to a separate patch
>   - Use linux/fpu.h instead of asm/fpu.h in consumers
>   - Declare test_fpu() in a header
>
> Michael Ellerman (1):
>    drm/amd/display: Only use hard-float, not altivec on powerpc
>
> Samuel Holland (14):
>    arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
>    ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>    ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
>    arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>    arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
>    lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
>    LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>    powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>    x86/fpu: Fix asm/fpu/types.h include guard
>    x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>    riscv: Add support for kernel-mode FPU
>    drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
>    selftests/fpu: Move FP code to a separate translation unit
>    selftests/fpu: Allow building on other architectures
>
>   Documentation/core-api/floating-point.rst     | 78 +++++++++++++++++++
>   Documentation/core-api/index.rst              |  1 +
>   Makefile                                      |  5 ++
>   arch/Kconfig                                  |  6 ++
>   arch/arm/Kconfig                              |  1 +
>   arch/arm/Makefile                             |  7 ++
>   arch/arm/include/asm/fpu.h                    | 15 ++++
>   arch/arm/lib/Makefile                         |  3 +-
>   arch/arm64/Kconfig                            |  1 +
>   arch/arm64/Makefile                           |  9 ++-
>   arch/arm64/include/asm/fpu.h                  | 15 ++++
>   arch/arm64/lib/Makefile                       |  6 +-
>   arch/loongarch/Kconfig                        |  1 +
>   arch/loongarch/Makefile                       |  5 +-
>   arch/loongarch/include/asm/fpu.h              |  1 +
>   arch/powerpc/Kconfig                          |  1 +
>   arch/powerpc/Makefile                         |  5 +-
>   arch/powerpc/include/asm/fpu.h                | 28 +++++++
>   arch/riscv/Kconfig                            |  1 +
>   arch/riscv/Makefile                           |  3 +
>   arch/riscv/include/asm/fpu.h                  | 16 ++++
>   arch/riscv/kernel/Makefile                    |  1 +
>   arch/riscv/kernel/kernel_mode_fpu.c           | 28 +++++++
>   arch/x86/Kconfig                              |  1 +
>   arch/x86/Makefile                             | 20 +++++
>   arch/x86/include/asm/fpu.h                    | 13 ++++
>   arch/x86/include/asm/fpu/types.h              |  6 +-
>   drivers/gpu/drm/amd/display/Kconfig           |  2 +-
>   .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 35 +--------
>   drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 +--------
>   drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 +--------
>   include/linux/fpu.h                           | 12 +++
>   lib/Kconfig.debug                             |  2 +-
>   lib/Makefile                                  | 26 +------
>   lib/raid6/Makefile                            | 33 +++-----
>   lib/test_fpu.h                                |  8 ++
>   lib/{test_fpu.c => test_fpu_glue.c}           | 37 ++-------
>   lib/test_fpu_impl.c                           | 37 +++++++++
>   38 files changed, 348 insertions(+), 193 deletions(-)
>   create mode 100644 Documentation/core-api/floating-point.rst
>   create mode 100644 arch/arm/include/asm/fpu.h
>   create mode 100644 arch/arm64/include/asm/fpu.h
>   create mode 100644 arch/powerpc/include/asm/fpu.h
>   create mode 100644 arch/riscv/include/asm/fpu.h
>   create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
>   create mode 100644 arch/x86/include/asm/fpu.h
>   create mode 100644 include/linux/fpu.h
>   create mode 100644 lib/test_fpu.h
>   rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
>   create mode 100644 lib/test_fpu_impl.c
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
@ 2024-04-10 22:21   ` Thiago Jung Bauermann
  2024-04-10 22:47     ` Samuel Holland
  2024-04-30 17:29   ` Harry Wentland
  2024-05-24  4:50   ` Guenter Roeck
  2 siblings, 1 reply; 36+ messages in thread
From: Thiago Jung Bauermann @ 2024-04-10 22:21 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher


Hello,

Samuel Holland <samuel.holland@sifive.com> writes:

> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>

Unfortunately this patch causes build failures on arm with allyesconfig
and allmodconfig. Tested with next-20240410.

Error with allyesconfig:

$ make -j 8 \
    O=$HOME/.cache/builds/linux-cross-arm \
    ARCH=arm \
    CROSS_COMPILE=arm-linux-gnueabihf-
make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
    ⋮
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.o: in function `dcn20_populate_dml_pipes_from_context':
dcn20_fpu.c:(.text+0x20f4): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x210c): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x2124): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x213c): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o: in function `pipe_ctx_to_e2e_pipe_params':
dcn_calcs.c:(.text+0x390): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o:dcn_calcs.c:(.text+0x3a4): more undefined references to `__aeabi_l2d' follow
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.o: in function `optimize_configuration':
dml2_wrapper.c:(.text+0xcbc): undefined reference to `__aeabi_d2ulz'
arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.o: in function `populate_dml_plane_cfg_from_plane_state':
dml2_translation_helper.c:(.text+0x9e4): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa20): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa58): undefined reference to `__aeabi_l2d'
arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa90): undefined reference to `__aeabi_l2d'
make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[2]: *** [/home/bauermann/src/linux/Makefile:1165: vmlinux] Error 2
make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
make: *** [Makefile:240: __sub-make] Error 2

The error with allmodconfig is slightly different:

$ make -j 8 \
    O=$HOME/.cache/builds/linux-cross-arm \
    ARCH=arm \
    CROSS_COMPILE=arm-linux-gnueabihf-
make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
    ⋮
ERROR: modpost: "__aeabi_d2ulz" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: modpost: "__aeabi_l2d" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.modpost:145: Module.symvers] Error 1
make[2]: *** [/home/bauermann/src/linux/Makefile:1876: modpost] Error 2
make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
make: *** [Makefile:240: __sub-make] Error 2

--
Thiago

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-10 22:21   ` Thiago Jung Bauermann
@ 2024-04-10 22:47     ` Samuel Holland
  2024-04-11  1:02       ` Thiago Jung Bauermann
  0 siblings, 1 reply; 36+ messages in thread
From: Samuel Holland @ 2024-04-10 22:47 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher

Hi Thiago,

On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
> Samuel Holland <samuel.holland@sifive.com> writes:
> 
>> Now that all previously-supported architectures select
>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>> of the existing list of architectures. It can also take advantage of the
>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>
>> Acked-by: Alex Deucher <alexander.deucher@amd.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> 
> Unfortunately this patch causes build failures on arm with allyesconfig
> and allmodconfig. Tested with next-20240410.
> 
> Error with allyesconfig:
> 
> $ make -j 8 \
>     O=$HOME/.cache/builds/linux-cross-arm \
>     ARCH=arm \
>     CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
>     ⋮
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.o: in function `dcn20_populate_dml_pipes_from_context':
> dcn20_fpu.c:(.text+0x20f4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x210c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x2124): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x213c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o: in function `pipe_ctx_to_e2e_pipe_params':
> dcn_calcs.c:(.text+0x390): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o:dcn_calcs.c:(.text+0x3a4): more undefined references to `__aeabi_l2d' follow
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.o: in function `optimize_configuration':
> dml2_wrapper.c:(.text+0xcbc): undefined reference to `__aeabi_d2ulz'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.o: in function `populate_dml_plane_cfg_from_plane_state':
> dml2_translation_helper.c:(.text+0x9e4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa20): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa58): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa90): undefined reference to `__aeabi_l2d'
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1165: vmlinux] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2
> 
> The error with allmodconfig is slightly different:
> 
> $ make -j 8 \
>     O=$HOME/.cache/builds/linux-cross-arm \
>     ARCH=arm \
>     CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
>     ⋮
> ERROR: modpost: "__aeabi_d2ulz" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost: "__aeabi_l2d" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.modpost:145: Module.symvers] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1876: modpost] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2

In both cases, the issue is that the toolchain requires runtime support to
convert between `unsigned long long` and `double`, even when hardware FP is
enabled. There was some past discussion about GCC inlining some of these
conversions[1], but that did not get implemented.

The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
32-bit arm until we can provide these runtime library functions.

Regards,
Samuel

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-10 22:47     ` Samuel Holland
@ 2024-04-11  1:02       ` Thiago Jung Bauermann
  2024-04-11  1:11         ` Samuel Holland
  0 siblings, 1 reply; 36+ messages in thread
From: Thiago Jung Bauermann @ 2024-04-11  1:02 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher


Hello Samuel,

Thank you for the quick reply!

Samuel Holland <samuel.holland@sifive.com> writes:
> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>
>> Unfortunately this patch causes build failures on arm with allyesconfig
>> and allmodconfig. Tested with next-20240410.

<snip>

> In both cases, the issue is that the toolchain requires runtime support to
> convert between `unsigned long long` and `double`, even when hardware FP is
> enabled. There was some past discussion about GCC inlining some of these
> conversions[1], but that did not get implemented.

Thank you for the explanation and the bugzilla reference. I added a
comment there mentioning that the problem came up again with this patch
series.

> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> 32-bit arm until we can provide these runtime library functions.

Does this mean that patch 2 in this series:

[PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT

will be dropped?

--
Thiago

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-11  1:02       ` Thiago Jung Bauermann
@ 2024-04-11  1:11         ` Samuel Holland
  2024-04-11  1:27           ` Thiago Jung Bauermann
  2024-04-11  7:15           ` Ard Biesheuvel
  0 siblings, 2 replies; 36+ messages in thread
From: Samuel Holland @ 2024-04-11  1:11 UTC (permalink / raw)
  To: Thiago Jung Bauermann
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher

Hi Thiago,

On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> Samuel Holland <samuel.holland@sifive.com> writes:
>> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>>
>>> Unfortunately this patch causes build failures on arm with allyesconfig
>>> and allmodconfig. Tested with next-20240410.
> 
> <snip>
> 
>> In both cases, the issue is that the toolchain requires runtime support to
>> convert between `unsigned long long` and `double`, even when hardware FP is
>> enabled. There was some past discussion about GCC inlining some of these
>> conversions[1], but that did not get implemented.
> 
> Thank you for the explanation and the bugzilla reference. I added a
> comment there mentioning that the problem came up again with this patch
> series.
> 
>> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>> 32-bit arm until we can provide these runtime library functions.
> 
> Does this mean that patch 2 in this series:
> 
> [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> 
> will be dropped?

No, because later patches in the series (3, 6) depend on the definition of
CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
find a GPL-2 compatible implementation of the runtime library functions.

Regards,
Samuel


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-11  1:11         ` Samuel Holland
@ 2024-04-11  1:27           ` Thiago Jung Bauermann
  2024-04-11  7:15           ` Ard Biesheuvel
  1 sibling, 0 replies; 36+ messages in thread
From: Thiago Jung Bauermann @ 2024-04-11  1:27 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher


Hello Samuel,

Samuel Holland <samuel.holland@sifive.com> writes:

> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
>> Samuel Holland <samuel.holland@sifive.com> writes:
>>> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>>>
>>>> Unfortunately this patch causes build failures on arm with allyesconfig
>>>> and allmodconfig. Tested with next-20240410.
>>
>> <snip>
>>
>>> In both cases, the issue is that the toolchain requires runtime support to
>>> convert between `unsigned long long` and `double`, even when hardware FP is
>>> enabled. There was some past discussion about GCC inlining some of these
>>> conversions[1], but that did not get implemented.
>>
>> Thank you for the explanation and the bugzilla reference. I added a
>> comment there mentioning that the problem came up again with this patch
>> series.
>>
>>> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>>> 32-bit arm until we can provide these runtime library functions.
>>
>> Does this mean that patch 2 in this series:
>>
>> [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>>
>> will be dropped?
>
> No, because later patches in the series (3, 6) depend on the definition of
> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> find a GPL-2 compatible implementation of the runtime library functions.

Ok, thank you for clarifying.

Andrew Pinski just responded on the GCC bugzilla and if I understood his
point correctly, it seems to be a matter of changing function names to
what GCC (or actually the arm EABI) expects...

--
Thiago

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-11  1:11         ` Samuel Holland
  2024-04-11  1:27           ` Thiago Jung Bauermann
@ 2024-04-11  7:15           ` Ard Biesheuvel
  2024-04-11  7:31             ` Arnd Bergmann
  1 sibling, 1 reply; 36+ messages in thread
From: Ard Biesheuvel @ 2024-04-11  7:15 UTC (permalink / raw)
  To: Samuel Holland, Arnd Bergmann
  Cc: Thiago Jung Bauermann, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

(cc Arnd)

On Thu, 11 Apr 2024 at 03:11, Samuel Holland <samuel.holland@sifive.com> wrote:
>
> Hi Thiago,
>
> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> > Samuel Holland <samuel.holland@sifive.com> writes:
> >> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
> >>>
> >>> Unfortunately this patch causes build failures on arm with allyesconfig
> >>> and allmodconfig. Tested with next-20240410.
> >
> > <snip>
> >
> >> In both cases, the issue is that the toolchain requires runtime support to
> >> convert between `unsigned long long` and `double`, even when hardware FP is
> >> enabled. There was some past discussion about GCC inlining some of these
> >> conversions[1], but that did not get implemented.
> >
> > Thank you for the explanation and the bugzilla reference. I added a
> > comment there mentioning that the problem came up again with this patch
> > series.
> >
> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> >> 32-bit arm until we can provide these runtime library functions.
> >
> > Does this mean that patch 2 in this series:
> >
> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> >
> > will be dropped?
>
> No, because later patches in the series (3, 6) depend on the definition of
> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> find a GPL-2 compatible implementation of the runtime library functions.
>

Is there really a point to doing that? Do 32-bit ARM systems even have
enough address space to the map the BARs of the AMD GPUs that need
this support?

Given that this was not enabled before, I don't think the upshot of
this series should be that we enable support for something on 32-bit
ARM that may cause headaches down the road without any benefit.

So I'd prefer a fixup patch that opts ARM out of this over adding
support code for 64-bit conversions.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-11  7:15           ` Ard Biesheuvel
@ 2024-04-11  7:31             ` Arnd Bergmann
  2024-04-12  1:54               ` Dave Airlie
  0 siblings, 1 reply; 36+ messages in thread
From: Arnd Bergmann @ 2024-04-11  7:31 UTC (permalink / raw)
  To: Ard Biesheuvel, Samuel Holland
  Cc: Thiago Jung Bauermann, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, Linux-Arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On Thu, Apr 11, 2024, at 09:15, Ard Biesheuvel wrote:
> On Thu, 11 Apr 2024 at 03:11, Samuel Holland <samuel.holland@sifive.com> wrote:
>> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
>> > Samuel Holland <samuel.holland@sifive.com> writes:
>>
>> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
>> >> 32-bit arm until we can provide these runtime library functions.
>> >
>> > Does this mean that patch 2 in this series:
>> >
>> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>> >
>> > will be dropped?
>>
>> No, because later patches in the series (3, 6) depend on the definition of
>> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
>> find a GPL-2 compatible implementation of the runtime library functions.
>>
>
> Is there really a point to doing that? Do 32-bit ARM systems even have
> enough address space to the map the BARs of the AMD GPUs that need
> this support?
>
> Given that this was not enabled before, I don't think the upshot of
> this series should be that we enable support for something on 32-bit
> ARM that may cause headaches down the road without any benefit.
>
> So I'd prefer a fixup patch that opts ARM out of this over adding
> support code for 64-bit conversions.

I have not found any dts file for a 32-bit platform with support
for a 64-bit prefetchable BAR, and there are very few that even
have a pcie slot (as opposed on on-board devices) you could
plug a card into.

That said, I also don't think we should encourage the use of
floating-point code in random device drivers. There is really
no excuse for the amdgpu driver to use floating point math
here, and we should get AMD to fix their driver instead.

     Arnd

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-04-11  7:31             ` Arnd Bergmann
@ 2024-04-12  1:54               ` Dave Airlie
  0 siblings, 0 replies; 36+ messages in thread
From: Dave Airlie @ 2024-04-12  1:54 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ard Biesheuvel, Samuel Holland, Thiago Jung Bauermann,
	Andrew Morton, linux-arm-kernel, x86, linux-kernel, Linux-Arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher

On Thu, 11 Apr 2024 at 17:32, Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Apr 11, 2024, at 09:15, Ard Biesheuvel wrote:
> > On Thu, 11 Apr 2024 at 03:11, Samuel Holland <samuel.holland@sifive.com> wrote:
> >> On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> >> > Samuel Holland <samuel.holland@sifive.com> writes:
> >>
> >> >> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT` for
> >> >> 32-bit arm until we can provide these runtime library functions.
> >> >
> >> > Does this mean that patch 2 in this series:
> >> >
> >> > [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
> >> >
> >> > will be dropped?
> >>
> >> No, because later patches in the series (3, 6) depend on the definition of
> >> CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
> >> find a GPL-2 compatible implementation of the runtime library functions.
> >>
> >
> > Is there really a point to doing that? Do 32-bit ARM systems even have
> > enough address space to the map the BARs of the AMD GPUs that need
> > this support?
> >
> > Given that this was not enabled before, I don't think the upshot of
> > this series should be that we enable support for something on 32-bit
> > ARM that may cause headaches down the road without any benefit.
> >
> > So I'd prefer a fixup patch that opts ARM out of this over adding
> > support code for 64-bit conversions.
>
> I have not found any dts file for a 32-bit platform with support
> for a 64-bit prefetchable BAR, and there are very few that even
> have a pcie slot (as opposed on on-board devices) you could
> plug a card into.
>
> That said, I also don't think we should encourage the use of
> floating-point code in random device drivers. There is really
> no excuse for the amdgpu driver to use floating point math
> here, and we should get AMD to fix their driver instead.

That would be nice, but it won't happen, there are many reasons for
that code to exist like it does, unless someone can write an automated
converter to fixed point and validate it produces the same results for
a long series of input values, it isn't really something that will get
"fixed".

AMD's hardware team produces the calculations, and will only look into
hardware problems in that area if the driver is using the calculations
they produce and validate.

If you've looked at the calculation complexity you'd understand this
isn't a trivial use of float-point for no reason.

Dave.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc
  2024-03-29  7:18 ` [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland
@ 2024-04-30 17:28   ` Harry Wentland
  0 siblings, 0 replies; 36+ messages in thread
From: Harry Wentland @ 2024-04-30 17:28 UTC (permalink / raw)
  To: Samuel Holland, Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Michael Ellerman,
	Alex Deucher

On 2024-03-29 03:18, Samuel Holland wrote:
> From: Michael Ellerman <mpe@ellerman.id.au>
> 
> The compiler flags enable altivec, but that is not required; hard-float
> is sufficient for the code to build and function.
> 
> Drop altivec from the compiler flags and adjust the enable/disable code
> to only enable FPU use.
> 
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>

Acked-by: Harry Wentland <harry.wentland@amd.com>

Harry

> ---
> 
> (no changes since v2)
> 
> Changes in v2:
>  - New patch for v2
> 
>  drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++----------
>  drivers/gpu/drm/amd/display/dc/dml/Makefile    |  2 +-
>  drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
>  3 files changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 4ae4720535a5..0de16796466b 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
>  #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_begin();
>  #elif defined(CONFIG_PPC64)
> -		if (cpu_has_feature(CPU_FTR_VSX_COMP))
> -			enable_kernel_vsx();
> -		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
> -			enable_kernel_altivec();
> -		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> +		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
>  			enable_kernel_fp();
>  #elif defined(CONFIG_ARM64)
>  		kernel_neon_begin();
> @@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
>  #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_end();
>  #elif defined(CONFIG_PPC64)
> -		if (cpu_has_feature(CPU_FTR_VSX_COMP))
> -			disable_kernel_vsx();
> -		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
> -			disable_kernel_altivec();
> -		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> +		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
>  			disable_kernel_fp();
>  #elif defined(CONFIG_ARM64)
>  		kernel_neon_end();
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index c4a5efd2dda5..59d3972341d2 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
>  endif
>  
>  ifdef CONFIG_PPC64
> -dml_ccflags := -mhard-float -maltivec
> +dml_ccflags := -mhard-float
>  endif
>  
>  ifdef CONFIG_ARM64
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index acff3449b8d7..7b51364084b5 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
>  endif
>  
>  ifdef CONFIG_PPC64
> -dml2_ccflags := -mhard-float -maltivec
> +dml2_ccflags := -mhard-float
>  endif
>  
>  ifdef CONFIG_ARM64


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
  2024-04-10 22:21   ` Thiago Jung Bauermann
@ 2024-04-30 17:29   ` Harry Wentland
  2024-05-24  4:50   ` Guenter Roeck
  2 siblings, 0 replies; 36+ messages in thread
From: Harry Wentland @ 2024-04-30 17:29 UTC (permalink / raw)
  To: Samuel Holland, Andrew Morton, linux-arm-kernel, x86
  Cc: linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher



On 2024-03-29 03:18, Samuel Holland wrote:
> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
> 
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Really nice set of changes.

Acked-by: Harry Wentland <harry.wentland@amd.com>

Harry

> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> ---
> 
> (no changes since v2)
> 
> Changes in v2:
>  - Split altivec removal to a separate patch
>  - Use linux/fpu.h instead of asm/fpu.h in consumers
> 
>  drivers/gpu/drm/amd/display/Kconfig           |  2 +-
>  .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 27 ++------------
>  drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 ++-----------------
>  drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 ++-----------------
>  4 files changed, 7 insertions(+), 94 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
> index 901d1961b739..5fcd4f778dc3 100644
> --- a/drivers/gpu/drm/amd/display/Kconfig
> +++ b/drivers/gpu/drm/amd/display/Kconfig
> @@ -8,7 +8,7 @@ config DRM_AMD_DC
>  	depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
>  	select SND_HDA_COMPONENT if SND_HDA_CORE
>  	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
> -	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
> +	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
>  	help
>  	  Choose this option if you want to use the new display engine
>  	  support for AMDGPU. This adds required support for Vega and
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> index 0de16796466b..e46f8ce41d87 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
> @@ -26,16 +26,7 @@
>  
>  #include "dc_trace.h"
>  
> -#if defined(CONFIG_X86)
> -#include <asm/fpu/api.h>
> -#elif defined(CONFIG_PPC64)
> -#include <asm/switch_to.h>
> -#include <asm/cputable.h>
> -#elif defined(CONFIG_ARM64)
> -#include <asm/neon.h>
> -#elif defined(CONFIG_LOONGARCH)
> -#include <asm/fpu.h>
> -#endif
> +#include <linux/fpu.h>
>  
>  /**
>   * DOC: DC FPU manipulation overview
> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
>  	WARN_ON_ONCE(!in_task());
>  	preempt_disable();
>  	depth = __this_cpu_inc_return(fpu_recursion_depth);
> -
>  	if (depth == 1) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
> +		BUG_ON(!kernel_fpu_available());
>  		kernel_fpu_begin();
> -#elif defined(CONFIG_PPC64)
> -		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> -			enable_kernel_fp();
> -#elif defined(CONFIG_ARM64)
> -		kernel_neon_begin();
> -#endif
>  	}
>  
>  	TRACE_DCN_FPU(true, function_name, line, depth);
> @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)
>  
>  	depth = __this_cpu_dec_return(fpu_recursion_depth);
>  	if (depth == 0) {
> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
>  		kernel_fpu_end();
> -#elif defined(CONFIG_PPC64)
> -		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
> -			disable_kernel_fp();
> -#elif defined(CONFIG_ARM64)
> -		kernel_neon_end();
> -#endif
>  	} else {
>  		WARN_ON_ONCE(depth < 0);
>  	}
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> index 59d3972341d2..a94b6d546cd1 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
> @@ -25,40 +25,8 @@
>  # It provides the general basic services required by other DAL
>  # subcomponents.
>  
> -ifdef CONFIG_X86
> -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
> -dml_ccflags := $(dml_ccflags-y) -msse
> -endif
> -
> -ifdef CONFIG_PPC64
> -dml_ccflags := -mhard-float
> -endif
> -
> -ifdef CONFIG_ARM64
> -dml_rcflags := -mgeneral-regs-only
> -endif
> -
> -ifdef CONFIG_LOONGARCH
> -dml_ccflags := -mfpu=64
> -dml_rcflags := -msoft-float
> -endif
> -
> -ifdef CONFIG_CC_IS_GCC
> -ifneq ($(call gcc-min-version, 70100),y)
> -IS_OLD_GCC = 1
> -endif
> -endif
> -
> -ifdef CONFIG_X86
> -ifdef IS_OLD_GCC
> -# Stack alignment mismatch, proceed with caution.
> -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> -# (8B stack alignment).
> -dml_ccflags += -mpreferred-stack-boundary=4
> -else
> -dml_ccflags += -msse2
> -endif
> -endif
> +dml_ccflags := $(CC_FLAGS_FPU)
> +dml_rcflags := $(CC_FLAGS_NO_FPU)
>  
>  ifneq ($(CONFIG_FRAME_WARN),0)
>  ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index 7b51364084b5..4f6c804a26ad 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -24,40 +24,8 @@
>  #
>  # Makefile for dml2.
>  
> -ifdef CONFIG_X86
> -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
> -dml2_ccflags := $(dml2_ccflags-y) -msse
> -endif
> -
> -ifdef CONFIG_PPC64
> -dml2_ccflags := -mhard-float
> -endif
> -
> -ifdef CONFIG_ARM64
> -dml2_rcflags := -mgeneral-regs-only
> -endif
> -
> -ifdef CONFIG_LOONGARCH
> -dml2_ccflags := -mfpu=64
> -dml2_rcflags := -msoft-float
> -endif
> -
> -ifdef CONFIG_CC_IS_GCC
> -ifeq ($(call cc-ifversion, -lt, 0701, y), y)
> -IS_OLD_GCC = 1
> -endif
> -endif
> -
> -ifdef CONFIG_X86
> -ifdef IS_OLD_GCC
> -# Stack alignment mismatch, proceed with caution.
> -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
> -# (8B stack alignment).
> -dml2_ccflags += -mpreferred-stack-boundary=4
> -else
> -dml2_ccflags += -msse2
> -endif
> -endif
> +dml2_ccflags := $(CC_FLAGS_FPU)
> +dml2_rcflags := $(CC_FLAGS_NO_FPU)
>  
>  ifneq ($(CONFIG_FRAME_WARN),0)
>  ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
  2024-04-10 22:21   ` Thiago Jung Bauermann
  2024-04-30 17:29   ` Harry Wentland
@ 2024-05-24  4:50   ` Guenter Roeck
  2024-05-24 13:17     ` Alex Deucher
  2 siblings, 1 reply; 36+ messages in thread
From: Guenter Roeck @ 2024-05-24  4:50 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Andrew Morton, linux-arm-kernel, x86, linux-kernel, linux-arch,
	linuxppc-dev, linux-riscv, Christoph Hellwig, loongarch, amd-gfx,
	Alex Deucher

Hi,

On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
> Now that all previously-supported architectures select
> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> of the existing list of architectures. It can also take advantage of the
> common kernel-mode FPU API and method of adjusting CFLAGS.
> 
> Acked-by: Alex Deucher <alexander.deucher@amd.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>

With this patch in the mainline kernel, I get the following build error
when trying to build powerpc:ppc32_allmodconfig.

powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o

[ repeated many times ]

The problem is no longer seen after reverting this patch.

Guenter

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-05-24  4:50   ` Guenter Roeck
@ 2024-05-24 13:17     ` Alex Deucher
  2024-05-24 13:18       ` Alex Deucher
  0 siblings, 1 reply; 36+ messages in thread
From: Alex Deucher @ 2024-05-24 13:17 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Samuel Holland, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On Fri, May 24, 2024 at 5:16 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Hi,
>
> On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
> > Now that all previously-supported architectures select
> > ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> > of the existing list of architectures. It can also take advantage of the
> > common kernel-mode FPU API and method of adjusting CFLAGS.
> >
> > Acked-by: Alex Deucher <alexander.deucher@amd.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>
> With this patch in the mainline kernel, I get the following build error
> when trying to build powerpc:ppc32_allmodconfig.
>
> powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
> powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
>
> [ repeated many times ]
>
> The problem is no longer seen after reverting this patch.

Should be fixed with this patch:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/5f56be33f33dd1d50b9433f842c879a20dc00f5b
Will pull it into my -fixes branch.

Alex

>
> Guenter

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-05-24 13:17     ` Alex Deucher
@ 2024-05-24 13:18       ` Alex Deucher
  2024-05-24 13:43         ` Guenter Roeck
  2024-05-24 18:23         ` Guenter Roeck
  0 siblings, 2 replies; 36+ messages in thread
From: Alex Deucher @ 2024-05-24 13:18 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Samuel Holland, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On Fri, May 24, 2024 at 9:17 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Fri, May 24, 2024 at 5:16 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > Hi,
> >
> > On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
> > > Now that all previously-supported architectures select
> > > ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
> > > of the existing list of architectures. It can also take advantage of the
> > > common kernel-mode FPU API and method of adjusting CFLAGS.
> > >
> > > Acked-by: Alex Deucher <alexander.deucher@amd.com>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
> >
> > With this patch in the mainline kernel, I get the following build error
> > when trying to build powerpc:ppc32_allmodconfig.
> >
> > powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
> > powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
> >
> > [ repeated many times ]
> >
> > The problem is no longer seen after reverting this patch.
>
> Should be fixed with this patch:
> https://gitlab.freedesktop.org/agd5f/linux/-/commit/5f56be33f33dd1d50b9433f842c879a20dc00f5b
> Will pull it into my -fixes branch.
>

Nevermind, this is something else.

Alex


> Alex
>
> >
> > Guenter

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-05-24 13:18       ` Alex Deucher
@ 2024-05-24 13:43         ` Guenter Roeck
  2024-05-24 13:53           ` Guenter Roeck
  2024-05-24 18:23         ` Guenter Roeck
  1 sibling, 1 reply; 36+ messages in thread
From: Guenter Roeck @ 2024-05-24 13:43 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Samuel Holland, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On 5/24/24 06:18, Alex Deucher wrote:
> On Fri, May 24, 2024 at 9:17 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>> On Fri, May 24, 2024 at 5:16 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
>>>> Now that all previously-supported architectures select
>>>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>>>> of the existing list of architectures. It can also take advantage of the
>>>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>>>
>>>> Acked-by: Alex Deucher <alexander.deucher@amd.com>
>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>>>
>>> With this patch in the mainline kernel, I get the following build error
>>> when trying to build powerpc:ppc32_allmodconfig.
>>>
>>> powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
>>> powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
>>>
>>> [ repeated many times ]
>>>
>>> The problem is no longer seen after reverting this patch.
>>
>> Should be fixed with this patch:
>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/5f56be33f33dd1d50b9433f842c879a20dc00f5b
>> Will pull it into my -fixes branch.
>>
> 
> Nevermind, this is something else.
> 

Yes, CONFIG_DRM_AMD_DC_FP is enabled in that configuration.

Guenter


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-05-24 13:43         ` Guenter Roeck
@ 2024-05-24 13:53           ` Guenter Roeck
  0 siblings, 0 replies; 36+ messages in thread
From: Guenter Roeck @ 2024-05-24 13:53 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Samuel Holland, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On 5/24/24 06:43, Guenter Roeck wrote:
> On 5/24/24 06:18, Alex Deucher wrote:
>> On Fri, May 24, 2024 at 9:17 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>>
>>> On Fri, May 24, 2024 at 5:16 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
>>>>> Now that all previously-supported architectures select
>>>>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>>>>> of the existing list of architectures. It can also take advantage of the
>>>>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>>>>
>>>>> Acked-by: Alex Deucher <alexander.deucher@amd.com>
>>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>>>>
>>>> With this patch in the mainline kernel, I get the following build error
>>>> when trying to build powerpc:ppc32_allmodconfig.
>>>>
>>>> powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
>>>> powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
>>>>
>>>> [ repeated many times ]
>>>>
>>>> The problem is no longer seen after reverting this patch.
>>>
>>> Should be fixed with this patch:
>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/5f56be33f33dd1d50b9433f842c879a20dc00f5b
>>> Will pull it into my -fixes branch.
>>>
>>
>> Nevermind, this is something else.
>>
> 
> Yes, CONFIG_DRM_AMD_DC_FP is enabled in that configuration.
> 

Also, the above patch does not apply upstream; the endif is already in
the correct place in the upstream kernel (though the commit introducing
it adds a blank line at the end of the file)

Guenter


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  2024-05-24 13:18       ` Alex Deucher
  2024-05-24 13:43         ` Guenter Roeck
@ 2024-05-24 18:23         ` Guenter Roeck
  1 sibling, 0 replies; 36+ messages in thread
From: Guenter Roeck @ 2024-05-24 18:23 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Samuel Holland, Andrew Morton, linux-arm-kernel, x86,
	linux-kernel, linux-arch, linuxppc-dev, linux-riscv,
	Christoph Hellwig, loongarch, amd-gfx, Alex Deucher

On 5/24/24 06:18, Alex Deucher wrote:
> On Fri, May 24, 2024 at 9:17 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>>
>> On Fri, May 24, 2024 at 5:16 AM Guenter Roeck <linux@roeck-us.net> wrote:
>>>
>>> Hi,
>>>
>>> On Fri, Mar 29, 2024 at 12:18:28AM -0700, Samuel Holland wrote:
>>>> Now that all previously-supported architectures select
>>>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>>>> of the existing list of architectures. It can also take advantage of the
>>>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>>>
>>>> Acked-by: Alex Deucher <alexander.deucher@amd.com>
>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>> Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
>>>
>>> With this patch in the mainline kernel, I get the following build error
>>> when trying to build powerpc:ppc32_allmodconfig.
>>>
>>> powerpc64-linux-ld: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o uses hard float, drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_helpers.o uses soft float
>>> powerpc64-linux-ld: failed to merge target specific data of file drivers/gpu/drm/amd/amdgpu/../display/dc/dml/display_mode_lib.o
>>>
>>> [ repeated many times ]
>>>
>>> The problem is no longer seen after reverting this patch.
>>
>> Should be fixed with this patch:
>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/5f56be33f33dd1d50b9433f842c879a20dc00f5b
>> Will pull it into my -fixes branch.
>>
> 
> Nevermind, this is something else.
> 

mdgpu_dm_helpers.o is built with -msoft-float, display_mode_lib.o is built
with -mhard-float.

-msoft-float is from

arch/powerpc/Makefile:KBUILD_CFLAGS     += $(CC_FLAGS_NO_FPU)

-mhard-float is from

drivers/gpu/drm/amd/display/dc/dml/Makefile:dml_ccflags := $(CC_FLAGS_FPU)
drivers/gpu/drm/amd/display/dc/dml/Makefile:dml_rcflags := $(CC_FLAGS_NO_FPU)
drivers/gpu/drm/amd/display/dc/dml2/Makefile:dml2_ccflags := $(CC_FLAGS_FPU)
drivers/gpu/drm/amd/display/dc/dml2/Makefile:dml2_rcflags := $(CC_FLAGS_NO_FPU)

Unfortunately I have no idea know how to resolve this.

Guenter


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-05-24 18:23 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-29  7:18 [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Samuel Holland
2024-03-29  7:18 ` [PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-03-29  7:18 ` [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-03-29  7:18 ` [PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
2024-03-29  7:18 ` [PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-03-29  7:18 ` [PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS Samuel Holland
2024-03-29  7:18 ` [PATCH v4 06/15] lib/raid6: " Samuel Holland
2024-03-29  7:18 ` [PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-03-29  7:18 ` [PATCH v4 08/15] powerpc: " Samuel Holland
2024-03-29  7:18 ` [PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard Samuel Holland
2024-03-29 17:30   ` Dave Hansen
2024-03-29  7:18 ` [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-03-29 17:28   ` Dave Hansen
2024-03-29 18:02     ` Samuel Holland
2024-03-29  7:18 ` [PATCH v4 11/15] riscv: Add support for kernel-mode FPU Samuel Holland
2024-03-29  7:18 ` [PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland
2024-04-30 17:28   ` Harry Wentland
2024-03-29  7:18 ` [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT Samuel Holland
2024-04-10 22:21   ` Thiago Jung Bauermann
2024-04-10 22:47     ` Samuel Holland
2024-04-11  1:02       ` Thiago Jung Bauermann
2024-04-11  1:11         ` Samuel Holland
2024-04-11  1:27           ` Thiago Jung Bauermann
2024-04-11  7:15           ` Ard Biesheuvel
2024-04-11  7:31             ` Arnd Bergmann
2024-04-12  1:54               ` Dave Airlie
2024-04-30 17:29   ` Harry Wentland
2024-05-24  4:50   ` Guenter Roeck
2024-05-24 13:17     ` Alex Deucher
2024-05-24 13:18       ` Alex Deucher
2024-05-24 13:43         ` Guenter Roeck
2024-05-24 13:53           ` Guenter Roeck
2024-05-24 18:23         ` Guenter Roeck
2024-03-29  7:18 ` [PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit Samuel Holland
2024-03-29  7:18 ` [PATCH v4 15/15] selftests/fpu: Allow building on other architectures Samuel Holland
2024-04-03 12:51 ` [PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).