* [PATCH v3 0/5] nds32 FPU port
@ 2018-11-01  7:16 Vincent Chen
  2018-11-01  7:16 ` [PATCH v3 1/5] nds32: " Vincent Chen
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Vincent Chen @ 2018-11-01  7:16 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc

  This patch set contains the basic components for supporting the nds32 FPU,
such as exception handlers and context switching for FPU registers. By
default, the lazy FPU scheme is supported and the user can configure it
via CONFIG_LAZY_FPU. In addition, a floating point emulator is required
to handle all arithmetic on denormalized numbers because the nds32 FPU
does not support them.
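
  For readers unfamiliar with the lazy scheme, the following user-space
model sketches the idea. It is an illustration only and not code from this
series: struct task, hw_fpu, last_owner, switch_to() and fpu_disabled_trap()
are hypothetical names, and the four-entry register file is an assumption
made to keep the example short. The point is that a context switch copies
nothing, and registers are saved or loaded only when a different task
actually touches the FPU.

```c
#include <stddef.h>

#define NR_FP_REGS 4	/* tiny register file, assumed for illustration */

struct fpu_state {
	unsigned long long regs[NR_FP_REGS];
};

/* Hypothetical task: a saved FPU copy plus an "FPU enabled" bit. */
struct task {
	struct fpu_state fpu;
	int fpu_enabled;
};

static struct fpu_state hw_fpu;	/* stands in for the physical registers */
static struct task *last_owner;	/* task whose values live in hw_fpu */
static int nr_saves, nr_loads;	/* counters to observe the behaviour */

/* Context switch: no FPU copy, the next task only loses FPU access. */
static void switch_to(struct task *next)
{
	next->fpu_enabled = 0;
}

/* Models the trap taken when a task uses the FPU while it is disabled. */
static void fpu_disabled_trap(struct task *cur)
{
	cur->fpu_enabled = 1;
	if (last_owner == cur)
		return;			/* registers still hold cur's values */
	if (last_owner != NULL) {
		last_owner->fpu = hw_fpu;	/* save the previous owner now */
		nr_saves++;
	}
	hw_fpu = cur->fpu;		/* load the new owner's state */
	nr_loads++;
	last_owner = cur;
}
```

  In this model, a task that is switched out and back in without any other
task using the FPU pays for neither a save nor a load, which is the saving
the lazy scheme aims for.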

  As mentioned above, the nds32 FPU does not support denormalized numbers.
This means that denormalized operands and results are not permitted. If an
instruction contains denormalized operands, the nds32 FPU raises a
denormalized input exception to ask the kernel to handle the
instruction. If the result of the instruction is a denormalized number,
the nds32 FPU normally treats it as an underflow case and rounds the result
to an appropriate value based on the current rounding mode. This leaves
a precision gap for tiny numbers. To reduce this gap, the kernel
enables the underflow trap by default to direct all underflow cases to
the floating point emulator. With the floating point emulator, the
correct denormalized number can be derived in the kernel and returned to
the user program. The feature can be configured by
CONFIG_SUPPORT_DENORMAL_ARITHMETIC. If the precision requirement for
tiny numbers is not critical, the user may disable this feature to keep
performance.
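
  The precision gap can be seen with a small host-side sketch. This is
not part of the patch set: f32_bits(), f32_is_denormal() and
f32_flush_denormal() are hypothetical helpers, and flushing to a signed
zero is only a simplified stand-in for what hardware without denormal
support may produce (the real result depends on the rounding mode). The
sketch assumes IEEE 754 binary32 for float.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw binary32 bit pattern of x. */
static uint32_t f32_bits(float x)
{
	uint32_t u;

	memcpy(&u, &x, sizeof(u));
	return u;
}

/* Denormalized: exponent field is all zeros and the fraction is non-zero. */
static int f32_is_denormal(float x)
{
	uint32_t u = f32_bits(x);

	return (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
}

/*
 * Simplified model of an FPU without denormal support: a denormalized
 * value is replaced by a zero of the same sign, other values pass through.
 */
static float f32_flush_denormal(float x)
{
	uint32_t sign = f32_bits(x) & 0x80000000u;
	float zero;

	if (!f32_is_denormal(x))
		return x;
	memcpy(&zero, &sign, sizeof(zero));
	return zero;
}
```

  For example, multiplying the smallest normal float (2^-126) by 0.25
gives 2^-128, which is representable only as a denormalized number. With
the emulator the program sees that value, while in the simplified model
above it would see zero.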

  The implementation of the floating point emulator is based on soft-fp,
which is located in the include/math-emu folder. However, soft-fp is too
outdated to pass the current compiler checks. The needed modifications
for soft-fp are included in this patch set.

Changes in v3:
 - A kernel with FPU support enabled can still run on a CPU without an FPU
 - Rename CONFIG_UNLAZY_FPU to CONFIG_LAZY_FPU
 - Rename _switch() to _switch_fpu()
 - Store FPU context when kernel suspends
 - Modify the comments in code and patch

Changes in v2:
 - Remove the initialization of the floating point registers before entering
   the signal handler.

Vincent Chen (5):
  nds32: nds32 FPU port
  nds32: Support FP emulation
  nds32: support denormalized result through FP emulator
  math-emu/op-2.h: Use statement expressions to prevent negative
    constant shift
  math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning

 arch/nds32/Kconfig                       |    1 +
 arch/nds32/Kconfig.cpu                   |   34 +++
 arch/nds32/Makefile                      |   11 +
 arch/nds32/include/asm/bitfield.h        |   15 ++
 arch/nds32/include/asm/elf.h             |   11 +
 arch/nds32/include/asm/fpu.h             |  126 +++++++++++
 arch/nds32/include/asm/fpuemu.h          |   32 +++
 arch/nds32/include/asm/nds32_fpu_inst.h  |  109 +++++++++
 arch/nds32/include/asm/processor.h       |    7 +
 arch/nds32/include/asm/sfp-machine.h     |  158 +++++++++++++
 arch/nds32/include/asm/syscalls.h        |    1 +
 arch/nds32/include/uapi/asm/auxvec.h     |    7 +
 arch/nds32/include/uapi/asm/sigcontext.h |   14 ++
 arch/nds32/include/uapi/asm/udftrap.h    |   13 +
 arch/nds32/include/uapi/asm/unistd.h     |    2 +
 arch/nds32/kernel/Makefile               |   10 +
 arch/nds32/kernel/ex-entry.S             |   24 ++-
 arch/nds32/kernel/ex-exit.S              |   13 +-
 arch/nds32/kernel/ex-scall.S             |    8 +-
 arch/nds32/kernel/fpu.c                  |  269 ++++++++++++++++++++++
 arch/nds32/kernel/process.c              |   64 +++++-
 arch/nds32/kernel/setup.c                |   12 +-
 arch/nds32/kernel/signal.c               |   62 +++++-
 arch/nds32/kernel/sleep.S                |    2 +
 arch/nds32/kernel/sys_nds32.c            |   32 +++
 arch/nds32/kernel/traps.c                |   16 ++
 arch/nds32/math-emu/Makefile             |    7 +
 arch/nds32/math-emu/faddd.c              |   24 ++
 arch/nds32/math-emu/fadds.c              |   24 ++
 arch/nds32/math-emu/fcmpd.c              |   24 ++
 arch/nds32/math-emu/fcmps.c              |   24 ++
 arch/nds32/math-emu/fd2s.c               |   22 ++
 arch/nds32/math-emu/fdivd.c              |   27 +++
 arch/nds32/math-emu/fdivs.c              |   26 +++
 arch/nds32/math-emu/fmuld.c              |   23 ++
 arch/nds32/math-emu/fmuls.c              |   23 ++
 arch/nds32/math-emu/fnegd.c              |   21 ++
 arch/nds32/math-emu/fnegs.c              |   21 ++
 arch/nds32/math-emu/fpuemu.c             |  357 ++++++++++++++++++++++++++++++
 arch/nds32/math-emu/fs2d.c               |   23 ++
 arch/nds32/math-emu/fsqrtd.c             |   21 ++
 arch/nds32/math-emu/fsqrts.c             |   21 ++
 arch/nds32/math-emu/fsubd.c              |   27 +++
 arch/nds32/math-emu/fsubs.c              |   27 +++
 include/math-emu/op-2.h                  |   97 ++++-----
 include/math-emu/soft-fp.h               |    2 +-
 46 files changed, 1827 insertions(+), 67 deletions(-)
 create mode 100644 arch/nds32/include/asm/fpu.h
 create mode 100644 arch/nds32/include/asm/fpuemu.h
 create mode 100644 arch/nds32/include/asm/nds32_fpu_inst.h
 create mode 100644 arch/nds32/include/asm/sfp-machine.h
 create mode 100644 arch/nds32/include/uapi/asm/udftrap.h
 create mode 100644 arch/nds32/kernel/fpu.c
 create mode 100644 arch/nds32/math-emu/Makefile
 create mode 100644 arch/nds32/math-emu/faddd.c
 create mode 100644 arch/nds32/math-emu/fadds.c
 create mode 100644 arch/nds32/math-emu/fcmpd.c
 create mode 100644 arch/nds32/math-emu/fcmps.c
 create mode 100644 arch/nds32/math-emu/fd2s.c
 create mode 100644 arch/nds32/math-emu/fdivd.c
 create mode 100644 arch/nds32/math-emu/fdivs.c
 create mode 100644 arch/nds32/math-emu/fmuld.c
 create mode 100644 arch/nds32/math-emu/fmuls.c
 create mode 100644 arch/nds32/math-emu/fnegd.c
 create mode 100644 arch/nds32/math-emu/fnegs.c
 create mode 100644 arch/nds32/math-emu/fpuemu.c
 create mode 100644 arch/nds32/math-emu/fs2d.c
 create mode 100644 arch/nds32/math-emu/fsqrtd.c
 create mode 100644 arch/nds32/math-emu/fsqrts.c
 create mode 100644 arch/nds32/math-emu/fsubd.c
 create mode 100644 arch/nds32/math-emu/fsubs.c



* [PATCH v3 1/5] nds32: nds32 FPU port
  2018-11-01  7:16 [PATCH v3 0/5] nds32 FPU port Vincent Chen
@ 2018-11-01  7:16 ` Vincent Chen
  2018-11-01  7:16 ` [PATCH v3 2/5] nds32: Support FP emulation Vincent Chen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Vincent Chen @ 2018-11-01  7:16 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc

This patch contains the basic components for supporting the nds32 FPU,
such as exception handlers and context switching for FPU registers. By
default, the lazy FPU scheme is supported and the user can configure it via
CONFIG_LAZY_FPU.

Signed-off-by: Vincent Chen <vincentc@andestech.com>
---
 arch/nds32/Kconfig                       |    1 +
 arch/nds32/Kconfig.cpu                   |   21 +++
 arch/nds32/Makefile                      |   10 ++
 arch/nds32/include/asm/bitfield.h        |   15 ++
 arch/nds32/include/asm/fpu.h             |  114 +++++++++++++++
 arch/nds32/include/asm/processor.h       |    7 +
 arch/nds32/include/uapi/asm/sigcontext.h |    5 +
 arch/nds32/kernel/Makefile               |   10 ++
 arch/nds32/kernel/ex-entry.S             |   24 +++-
 arch/nds32/kernel/ex-exit.S              |   13 ++-
 arch/nds32/kernel/ex-scall.S             |    8 +-
 arch/nds32/kernel/fpu.c                  |  231 ++++++++++++++++++++++++++++++
 arch/nds32/kernel/process.c              |   64 ++++++++-
 arch/nds32/kernel/setup.c                |   12 ++-
 arch/nds32/kernel/signal.c               |   62 ++++++++-
 arch/nds32/kernel/sleep.S                |    2 +
 arch/nds32/kernel/traps.c                |   16 ++
 17 files changed, 600 insertions(+), 15 deletions(-)
 create mode 100644 arch/nds32/include/asm/fpu.h
 create mode 100644 arch/nds32/kernel/fpu.c

diff --git a/arch/nds32/Kconfig b/arch/nds32/Kconfig
index 8e2c5ac..6513791 100644
--- a/arch/nds32/Kconfig
+++ b/arch/nds32/Kconfig
@@ -29,6 +29,7 @@ config NDS32
 	select HANDLE_DOMAIN_IRQ
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_DEBUG_KMEMLEAK
+	select HAVE_EXIT_THREAD
 	select HAVE_MEMBLOCK
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_PERF_EVENTS
diff --git a/arch/nds32/Kconfig.cpu b/arch/nds32/Kconfig.cpu
index b8eecd0..593d0c2 100644
--- a/arch/nds32/Kconfig.cpu
+++ b/arch/nds32/Kconfig.cpu
@@ -7,6 +7,27 @@ config CPU_LITTLE_ENDIAN
 	bool "Little endian"
 	default y
 
+config FPU
+	bool "FPU support"
+	default n
+	help
+	  If FPU ISA is used in user space, this configuration shall be Y to
+	  enable required support in kernel such as fpu context switch and
+	  fpu exception handler.
+
+	  If no FPU ISA is used in user space, say N.
+
+config LAZY_FPU
+	bool "lazy FPU support"
+	depends on FPU
+	default y
+	help
+	  Say Y here to enable the lazy FPU scheme. The lazy FPU scheme can
+	  enhance system performance by reducing the context switch
+	  frequency of the FPU register.
+
+	  For normal case, say Y.
+
 config HWZOL
 	bool "hardware zero overhead loop support"
 	depends on CPU_D10 || CPU_D15
diff --git a/arch/nds32/Makefile b/arch/nds32/Makefile
index 3509fac..599ee62 100644
--- a/arch/nds32/Makefile
+++ b/arch/nds32/Makefile
@@ -5,9 +5,19 @@ KBUILD_DEFCONFIG := defconfig
 
 comma = ,
 
+
 ifdef CONFIG_FUNCTION_TRACER
 arch-y += -malways-save-lp -mno-relax
 endif
+ifdef CONFIG_FPU
+arch-y  += \
+        $(shell $(CC) -E -dM -xc /dev/null | \
+                grep -o -m1 NDS32_EXT_FPU_SP | \
+                sed -e 's/NDS32_EXT_FPU_SP/-mno-ext-fpu-sp -mfloat-abi=soft/') \
+        $(shell $(CC) -E -dM -xc /dev/null | \
+                grep -o -m1 NDS32_EXT_FPU_DP | \
+                sed -e 's/NDS32_EXT_FPU_DP/-mno-ext-fpu-dp -mfloat-abi=soft/')
+endif
 
 KBUILD_CFLAGS	+= $(call cc-option, -mno-sched-prolog-epilog)
 KBUILD_CFLAGS	+= -mcmodel=large
diff --git a/arch/nds32/include/asm/bitfield.h b/arch/nds32/include/asm/bitfield.h
index 19b2841..c161973 100644
--- a/arch/nds32/include/asm/bitfield.h
+++ b/arch/nds32/include/asm/bitfield.h
@@ -251,6 +251,11 @@
 #define ITYPE_mskSTYPE		( 0xF  << ITYPE_offSTYPE )
 #define ITYPE_mskCPID		( 0x3  << ITYPE_offCPID )
 
+/* Additional definitions of ITYPE register for FPU */
+#define FPU_DISABLE_EXCEPTION	(0x1  << ITYPE_offSTYPE)
+#define FPU_EXCEPTION		(0x2  << ITYPE_offSTYPE)
+#define FPU_CPID		0	/* FPU Co-Processor ID is 0 */
+
 #define NDS32_VECTOR_mskNONEXCEPTION	0x78
 #define NDS32_VECTOR_offEXCEPTION	8
 #define NDS32_VECTOR_offINTERRUPT	9
@@ -926,6 +931,7 @@
 #define FPCSR_mskDNIT           ( 0x1  << FPCSR_offDNIT )
 #define FPCSR_mskRIT		( 0x1  << FPCSR_offRIT )
 #define FPCSR_mskALL		(FPCSR_mskIVO | FPCSR_mskDBZ | FPCSR_mskOVF | FPCSR_mskUDF | FPCSR_mskIEX)
+#define FPCSR_mskALLE_NO_UDFE	(FPCSR_mskIVOE | FPCSR_mskDBZE | FPCSR_mskOVFE | FPCSR_mskIEXE)
 #define FPCSR_mskALLE		(FPCSR_mskIVOE | FPCSR_mskDBZE | FPCSR_mskOVFE | FPCSR_mskUDFE | FPCSR_mskIEXE)
 #define FPCSR_mskALLT           (FPCSR_mskIVOT | FPCSR_mskDBZT | FPCSR_mskOVFT | FPCSR_mskUDFT | FPCSR_mskIEXT |FPCSR_mskDNIT | FPCSR_mskRIT)
 
@@ -946,6 +952,15 @@
 #define FPCFG_mskIMVER		( 0x1F  << FPCFG_offIMVER )
 #define FPCFG_mskAVER		( 0x1F  << FPCFG_offAVER )
 
+/* 8 Single precision or 4 double precision registers are available */
+#define SP8_DP4_reg		0
+/* 16 Single precision or 8 double precision registers are available */
+#define SP16_DP8_reg		1
+/* 32 Single precision or 16 double precision registers are available */
+#define SP32_DP16_reg		2
+/* 32 Single precision or 32 double precision registers are available */
+#define SP32_DP32_reg		3
+
 /******************************************************************************
  * fucpr: FUCOP_CTL (FPU and Coprocessor Enable Control Register)
  *****************************************************************************/
diff --git a/arch/nds32/include/asm/fpu.h b/arch/nds32/include/asm/fpu.h
new file mode 100644
index 0000000..f7a7f6b
--- /dev/null
+++ b/arch/nds32/include/asm/fpu.h
@@ -0,0 +1,114 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2005-2018 Andes Technology Corporation */
+
+#ifndef __ASM_NDS32_FPU_H
+#define __ASM_NDS32_FPU_H
+
+#if IS_ENABLED(CONFIG_FPU)
+#ifndef __ASSEMBLY__
+#include <linux/sched/task_stack.h>
+#include <linux/preempt.h>
+#include <asm/ptrace.h>
+
+extern bool has_fpu;
+
+extern void save_fpu(struct task_struct *__tsk);
+extern void load_fpu(const struct fpu_struct *fpregs);
+extern bool do_fpu_exception(unsigned int subtype, struct pt_regs *regs);
+
+#define test_tsk_fpu(regs)	(regs->fucop_ctl & FUCOP_CTL_mskCP0EN)
+
+/*
+ * Initially load the FPU with signalling NANS.  This bit pattern
+ * has the property that no matter whether considered as single or as
+ * double precision, it still represents a signalling NAN.
+ */
+
+#define sNAN64    0xFFFFFFFFFFFFFFFFULL
+#define sNAN32    0xFFFFFFFFUL
+
+#define FPCSR_INIT  0x0UL
+
+extern const struct fpu_struct init_fpuregs;
+
+static inline void disable_ptreg_fpu(struct pt_regs *regs)
+{
+	regs->fucop_ctl &= ~FUCOP_CTL_mskCP0EN;
+}
+
+static inline void enable_ptreg_fpu(struct pt_regs *regs)
+{
+	regs->fucop_ctl |= FUCOP_CTL_mskCP0EN;
+}
+
+static inline void enable_fpu(void)
+{
+	unsigned long fucop_ctl;
+
+	fucop_ctl = __nds32__mfsr(NDS32_SR_FUCOP_CTL) | FUCOP_CTL_mskCP0EN;
+	__nds32__mtsr(fucop_ctl, NDS32_SR_FUCOP_CTL);
+	__nds32__isb();
+}
+
+static inline void disable_fpu(void)
+{
+	unsigned long fucop_ctl;
+
+	fucop_ctl = __nds32__mfsr(NDS32_SR_FUCOP_CTL) & ~FUCOP_CTL_mskCP0EN;
+	__nds32__mtsr(fucop_ctl, NDS32_SR_FUCOP_CTL);
+	__nds32__isb();
+}
+
+static inline void lose_fpu(void)
+{
+	preempt_disable();
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+	if (last_task_used_math == current) {
+		last_task_used_math = NULL;
+#else
+	if (test_tsk_fpu(task_pt_regs(current))) {
+#endif
+		save_fpu(current);
+	}
+	disable_ptreg_fpu(task_pt_regs(current));
+	preempt_enable();
+}
+
+static inline void own_fpu(void)
+{
+	preempt_disable();
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+	if (last_task_used_math != current) {
+		if (last_task_used_math != NULL)
+			save_fpu(last_task_used_math);
+		load_fpu(&current->thread.fpu);
+		last_task_used_math = current;
+	}
+#else
+	if (!test_tsk_fpu(task_pt_regs(current))) {
+		load_fpu(&current->thread.fpu);
+	}
+#endif
+	enable_ptreg_fpu(task_pt_regs(current));
+	preempt_enable();
+}
+
+#if !IS_ENABLED(CONFIG_LAZY_FPU)
+static inline void unlazy_fpu(struct task_struct *tsk)
+{
+	preempt_disable();
+	if (test_tsk_fpu(task_pt_regs(tsk)))
+		save_fpu(tsk);
+	preempt_enable();
+}
+#endif /* !CONFIG_LAZY_FPU */
+static inline void clear_fpu(struct pt_regs *regs)
+{
+	preempt_disable();
+	if (test_tsk_fpu(regs))
+		disable_ptreg_fpu(regs);
+	preempt_enable();
+}
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_FPU */
+#endif /* __ASM_NDS32_FPU_H */
diff --git a/arch/nds32/include/asm/processor.h b/arch/nds32/include/asm/processor.h
index 9c83caf..bf935d8 100644
--- a/arch/nds32/include/asm/processor.h
+++ b/arch/nds32/include/asm/processor.h
@@ -41,6 +41,8 @@ struct thread_struct {
 	unsigned long address;
 	unsigned long trap_no;
 	unsigned long error_code;
+
+	struct fpu_struct fpu;
 };
 
 #define INIT_THREAD  {	}
@@ -78,6 +80,11 @@ struct thread_struct {
 
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
+#if IS_ENABLED(CONFIG_FPU)
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+extern struct task_struct *last_task_used_math;
+#endif
+#endif
 
 /* Prepare to copy thread state - unlazy all lazy status */
 #define prepare_to_copy(tsk)	do { } while (0)
diff --git a/arch/nds32/include/uapi/asm/sigcontext.h b/arch/nds32/include/uapi/asm/sigcontext.h
index 00567b2..1257a78 100644
--- a/arch/nds32/include/uapi/asm/sigcontext.h
+++ b/arch/nds32/include/uapi/asm/sigcontext.h
@@ -9,6 +9,10 @@
  * before the signal handler was invoked.  Note: only add new entries
  * to the end of the structure.
  */
+struct fpu_struct {
+	unsigned long long fd_regs[32];
+	unsigned long fpcsr;
+};
 
 struct zol_struct {
 	unsigned long nds32_lc;	/* $LC */
@@ -54,6 +58,7 @@ struct sigcontext {
 	unsigned long fault_address;
 	unsigned long used_math_flag;
 	/* FPU Registers */
+	struct fpu_struct fpu;
 	struct zol_struct zol;
 };
 
diff --git a/arch/nds32/kernel/Makefile b/arch/nds32/kernel/Makefile
index 8d62f2e..0679332 100644
--- a/arch/nds32/kernel/Makefile
+++ b/arch/nds32/kernel/Makefile
@@ -13,12 +13,22 @@ obj-y			:= ex-entry.o ex-exit.o ex-scall.o irq.o \
 
 obj-$(CONFIG_MODULES)		+= nds32_ksyms.o module.o
 obj-$(CONFIG_STACKTRACE)	+= stacktrace.o
+obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_OF)		+= devtree.o
 obj-$(CONFIG_CACHE_L2)		+= atl2c.o
 obj-$(CONFIG_PERF_EVENTS) += perf_event_cpu.o
 obj-$(CONFIG_PM)		+= pm.o sleep.o
 extra-y := head.o vmlinux.lds
 
+CFLAGS_fpu.o += \
+        $(shell $(CC) -E -dM -xc /dev/null | \
+	grep -o -m1 NDS32_EXT_FPU_SP | \
+	sed -e 's/NDS32_EXT_FPU_SP/-mext-fpu-sp/') \
+        $(shell $(CC) -E -dM -xc /dev/null | \
+	grep -o -m1 NDS32_EXT_FPU_DP | \
+	sed -e 's/NDS32_EXT_FPU_DP/-mext-fpu-dp/')
+
+
 obj-y				+= vdso/
 
 obj-$(CONFIG_FUNCTION_TRACER)   += ftrace.o
diff --git a/arch/nds32/kernel/ex-entry.S b/arch/nds32/kernel/ex-entry.S
index 21a1440..107d98a 100644
--- a/arch/nds32/kernel/ex-entry.S
+++ b/arch/nds32/kernel/ex-entry.S
@@ -7,6 +7,7 @@
 #include <asm/errno.h>
 #include <asm/asm-offsets.h>
 #include <asm/page.h>
+#include <asm/fpu.h>
 
 #ifdef CONFIG_HWZOL
 	.macro push_zol
@@ -15,12 +16,31 @@
 	mfusr	$r16, $LC
 	.endm
 #endif
+	.macro  skip_save_fucop_ctl
+#if defined(CONFIG_FPU)
+skip_fucop_ctl:
+	smw.adm $p0, [$sp], $p0, #0x1
+	j fucop_ctl_done
+#endif
+	.endm
 
 	.macro	save_user_regs
-
+#if defined(CONFIG_FPU)
+	sethi   $p0, hi20(has_fpu)
+	lbsi 	$p0, [$p0+lo12(has_fpu)]
+	beqz	$p0, skip_fucop_ctl
+	mfsr    $p0, $FUCOP_CTL
+	smw.adm $p0, [$sp], $p0, #0x1
+	bclr    $p0, $p0, #FUCOP_CTL_offCP0EN
+	mtsr    $p0, $FUCOP_CTL
+fucop_ctl_done:
+	/* move $SP to the bottom of pt_regs */
+	addi    $sp, $sp, -FUCOP_CTL_OFFSET
+#else
 	smw.adm $sp, [$sp], $sp, #0x1
 	/* move $SP to the bottom of pt_regs */
 	addi    $sp, $sp, -OSP_OFFSET
+#endif
 
 	/* push $r0 ~ $r25 */
 	smw.bim $r0, [$sp], $r25
@@ -79,6 +99,7 @@ exception_handlers:
 	.long	eh_syscall		!Syscall
 	.long	asm_do_IRQ		!IRQ
 
+	skip_save_fucop_ctl
 common_exception_handler:
 	save_user_regs
 	mfsr	$p0, $ITYPE
@@ -103,7 +124,6 @@ common_exception_handler:
 	mtsr	$r21, $PSW
 	dsb
 	jr	$p1
-
 	/* syscall */
 1:
 	addi	$p1, $p0, #-NDS32_VECTOR_offEXCEPTION
diff --git a/arch/nds32/kernel/ex-exit.S b/arch/nds32/kernel/ex-exit.S
index f00af92..97ba15c 100644
--- a/arch/nds32/kernel/ex-exit.S
+++ b/arch/nds32/kernel/ex-exit.S
@@ -8,6 +8,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/thread_info.h>
 #include <asm/current.h>
+#include <asm/fpu.h>
 
 
 
@@ -22,10 +23,18 @@
 	.macro	restore_user_regs_first
 	setgie.d
 	isb
-
+#if defined(CONFIG_FPU)
+	addi    $sp, $sp, OSP_OFFSET
+	lmw.adm $r12, [$sp], $r25, #0x0
+	sethi   $p0, hi20(has_fpu)
+	lbsi 	$p0, [$p0+lo12(has_fpu)]
+	beqz	$p0, 2f
+	mtsr    $r25, $FUCOP_CTL
+2:
+#else
 	addi	$sp, $sp, FUCOP_CTL_OFFSET
-
 	lmw.adm $r12, [$sp], $r24, #0x0
+#endif
 	mtsr	$r12, $SP_USR
 	mtsr	$r13, $IPC
 #ifdef CONFIG_HWZOL
diff --git a/arch/nds32/kernel/ex-scall.S b/arch/nds32/kernel/ex-scall.S
index 36aa87e..270050f 100644
--- a/arch/nds32/kernel/ex-scall.S
+++ b/arch/nds32/kernel/ex-scall.S
@@ -19,11 +19,13 @@ ENTRY(__switch_to)
 
 	la	$p0, __entry_task
 	sw	$r1, [$p0]
-	move	$p1, $r0
-	addi	$p1, $p1, #THREAD_CPU_CONTEXT
+	addi	$p1, $r0, #THREAD_CPU_CONTEXT
 	smw.bi 	$r6, [$p1], $r14, #0xb		! push r6~r14, fp, lp, sp
 	move	$r25, $r1
-	addi	$r1, $r1, #THREAD_CPU_CONTEXT
+#if defined(CONFIG_FPU)
+	call	_switch_fpu
+#endif
+	addi	$r1, $r25, #THREAD_CPU_CONTEXT
 	lmw.bi 	$r6, [$r1], $r14, #0xb		! pop r6~r14, fp, lp, sp
 	ret
 
diff --git a/arch/nds32/kernel/fpu.c b/arch/nds32/kernel/fpu.c
new file mode 100644
index 0000000..e55a1e1
--- /dev/null
+++ b/arch/nds32/kernel/fpu.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <linux/sched.h>
+#include <linux/signal.h>
+#include <linux/sched/signal.h>
+#include <asm/processor.h>
+#include <asm/user.h>
+#include <asm/io.h>
+#include <asm/bitfield.h>
+#include <asm/fpu.h>
+
+const struct fpu_struct init_fpuregs = {
+	.fd_regs = {[0 ... 31] = sNAN64},
+	.fpcsr = FPCSR_INIT
+};
+
+void save_fpu(struct task_struct *tsk)
+{
+	unsigned int fpcfg, fpcsr;
+
+	enable_fpu();
+	fpcfg = ((__nds32__fmfcfg() & FPCFG_mskFREG) >> FPCFG_offFREG);
+	switch (fpcfg) {
+	case SP32_DP32_reg:
+		asm volatile ("fsdi $fd31, [%0+0xf8]\n\t"
+			      "fsdi $fd30, [%0+0xf0]\n\t"
+			      "fsdi $fd29, [%0+0xe8]\n\t"
+			      "fsdi $fd28, [%0+0xe0]\n\t"
+			      "fsdi $fd27, [%0+0xd8]\n\t"
+			      "fsdi $fd26, [%0+0xd0]\n\t"
+			      "fsdi $fd25, [%0+0xc8]\n\t"
+			      "fsdi $fd24, [%0+0xc0]\n\t"
+			      "fsdi $fd23, [%0+0xb8]\n\t"
+			      "fsdi $fd22, [%0+0xb0]\n\t"
+			      "fsdi $fd21, [%0+0xa8]\n\t"
+			      "fsdi $fd20, [%0+0xa0]\n\t"
+			      "fsdi $fd19, [%0+0x98]\n\t"
+			      "fsdi $fd18, [%0+0x90]\n\t"
+			      "fsdi $fd17, [%0+0x88]\n\t"
+			      "fsdi $fd16, [%0+0x80]\n\t"
+			      :	/* no output */
+			      : "r" (&tsk->thread.fpu)
+			      : "memory");
+		/* fall through */
+	case SP32_DP16_reg:
+		asm volatile ("fsdi $fd15, [%0+0x78]\n\t"
+			      "fsdi $fd14, [%0+0x70]\n\t"
+			      "fsdi $fd13, [%0+0x68]\n\t"
+			      "fsdi $fd12, [%0+0x60]\n\t"
+			      "fsdi $fd11, [%0+0x58]\n\t"
+			      "fsdi $fd10, [%0+0x50]\n\t"
+			      "fsdi $fd9,  [%0+0x48]\n\t"
+			      "fsdi $fd8,  [%0+0x40]\n\t"
+			      :	/* no output */
+			      : "r" (&tsk->thread.fpu)
+			      : "memory");
+		/* fall through */
+	case SP16_DP8_reg:
+		asm volatile ("fsdi $fd7,  [%0+0x38]\n\t"
+			      "fsdi $fd6,  [%0+0x30]\n\t"
+			      "fsdi $fd5,  [%0+0x28]\n\t"
+			      "fsdi $fd4,  [%0+0x20]\n\t"
+			      :	/* no output */
+			      : "r" (&tsk->thread.fpu)
+			      : "memory");
+		/* fall through */
+	case SP8_DP4_reg:
+		asm volatile ("fsdi $fd3,  [%1+0x18]\n\t"
+			      "fsdi $fd2,  [%1+0x10]\n\t"
+			      "fsdi $fd1,  [%1+0x8]\n\t"
+			      "fsdi $fd0,  [%1+0x0]\n\t"
+			      "fmfcsr	%0\n\t"
+			      "swi  %0, [%1+0x100]\n\t"
+			      : "=&r" (fpcsr)
+			      : "r"(&tsk->thread.fpu)
+			      : "memory");
+	}
+	disable_fpu();
+}
+
+void load_fpu(const struct fpu_struct *fpregs)
+{
+	unsigned int fpcfg, fpcsr;
+
+	enable_fpu();
+	fpcfg = ((__nds32__fmfcfg() & FPCFG_mskFREG) >> FPCFG_offFREG);
+	switch (fpcfg) {
+	case SP32_DP32_reg:
+		asm volatile ("fldi $fd31, [%0+0xf8]\n\t"
+			      "fldi $fd30, [%0+0xf0]\n\t"
+			      "fldi $fd29, [%0+0xe8]\n\t"
+			      "fldi $fd28, [%0+0xe0]\n\t"
+			      "fldi $fd27, [%0+0xd8]\n\t"
+			      "fldi $fd26, [%0+0xd0]\n\t"
+			      "fldi $fd25, [%0+0xc8]\n\t"
+			      "fldi $fd24, [%0+0xc0]\n\t"
+			      "fldi $fd23, [%0+0xb8]\n\t"
+			      "fldi $fd22, [%0+0xb0]\n\t"
+			      "fldi $fd21, [%0+0xa8]\n\t"
+			      "fldi $fd20, [%0+0xa0]\n\t"
+			      "fldi $fd19, [%0+0x98]\n\t"
+			      "fldi $fd18, [%0+0x90]\n\t"
+			      "fldi $fd17, [%0+0x88]\n\t"
+			      "fldi $fd16, [%0+0x80]\n\t"
+			      :	/* no output */
+			      : "r" (fpregs));
+		/* fall through */
+	case SP32_DP16_reg:
+		asm volatile ("fldi $fd15, [%0+0x78]\n\t"
+			      "fldi $fd14, [%0+0x70]\n\t"
+			      "fldi $fd13, [%0+0x68]\n\t"
+			      "fldi $fd12, [%0+0x60]\n\t"
+			      "fldi $fd11, [%0+0x58]\n\t"
+			      "fldi $fd10, [%0+0x50]\n\t"
+			      "fldi $fd9,  [%0+0x48]\n\t"
+			      "fldi $fd8,  [%0+0x40]\n\t"
+			      :	/* no output */
+			      : "r" (fpregs));
+		/* fall through */
+	case SP16_DP8_reg:
+		asm volatile ("fldi $fd7,  [%0+0x38]\n\t"
+			      "fldi $fd6,  [%0+0x30]\n\t"
+			      "fldi $fd5,  [%0+0x28]\n\t"
+			      "fldi $fd4,  [%0+0x20]\n\t"
+			      :	/* no output */
+			      : "r" (fpregs));
+		/* fall through */
+	case SP8_DP4_reg:
+		asm volatile ("fldi $fd3,  [%1+0x18]\n\t"
+			      "fldi $fd2,  [%1+0x10]\n\t"
+			      "fldi $fd1,  [%1+0x8]\n\t"
+			      "fldi $fd0,  [%1+0x0]\n\t"
+			      "lwi  %0, [%1+0x100]\n\t"
+			      "fmtcsr	%0\n\t":"=&r" (fpcsr)
+			      : "r"(fpregs));
+	}
+	disable_fpu();
+}
+void store_fpu_for_suspend(void)
+{
+#ifdef CONFIG_LAZY_FPU
+	if (last_task_used_math != NULL)
+		save_fpu(last_task_used_math);
+	last_task_used_math = NULL;
+#else
+	if (!used_math())
+		return;
+	unlazy_fpu(current);
+#endif
+	clear_fpu(task_pt_regs(current));
+}
+inline void do_fpu_context_switch(struct pt_regs *regs)
+{
+	/* Enable to use FPU. */
+
+	if (!user_mode(regs)) {
+		pr_err("BUG: FPU is used in kernel mode.\n");
+		BUG();
+		return;
+	}
+
+	enable_ptreg_fpu(regs);
+#ifdef CONFIG_LAZY_FPU	//Lazy FPU is used
+	if (last_task_used_math == current)
+		return;
+	if (last_task_used_math != NULL)
+		/* Other processes fpu state, save away */
+		save_fpu(last_task_used_math);
+	last_task_used_math = current;
+#endif
+	if (used_math()) {
+		load_fpu(&current->thread.fpu);
+	} else {
+		/* First time FPU user.  */
+		load_fpu(&init_fpuregs);
+		set_used_math();
+	}
+
+}
+
+inline void fill_sigfpe_signo(unsigned int fpcsr, int *signo)
+{
+	if (fpcsr & FPCSR_mskOVFT)
+		*signo = FPE_FLTOVF;
+	else if (fpcsr & FPCSR_mskUDFT)
+		*signo = FPE_FLTUND;
+	else if (fpcsr & FPCSR_mskIVOT)
+		*signo = FPE_FLTINV;
+	else if (fpcsr & FPCSR_mskDBZT)
+		*signo = FPE_FLTDIV;
+	else if (fpcsr & FPCSR_mskIEXT)
+		*signo = FPE_FLTRES;
+}
+
+inline void handle_fpu_exception(struct pt_regs *regs)
+{
+	unsigned int fpcsr;
+	int si_code = 0, si_signo = SIGFPE;
+
+	lose_fpu();
+	fpcsr = current->thread.fpu.fpcsr;
+
+	if (fpcsr & FPCSR_mskRIT) {
+		if (!user_mode(regs))
+			do_exit(SIGILL);
+		si_signo = SIGILL;
+		show_regs(regs);
+		si_code = ILL_COPROC;
+	} else
+		fill_sigfpe_signo(fpcsr, &si_code);
+	force_sig_fault(si_signo, si_code,
+			(void __user *)instruction_pointer(regs), current);
+}
+
+bool do_fpu_exception(unsigned int subtype, struct pt_regs *regs)
+{
+	bool done = true;
+	/* Coprocessor disabled exception */
+	if (subtype == FPU_DISABLE_EXCEPTION) {
+		preempt_disable();
+		do_fpu_context_switch(regs);
+		preempt_enable();
+	}
+	/* Coprocessor exception such as underflow and overflow */
+	else if (subtype == FPU_EXCEPTION)
+		handle_fpu_exception(regs);
+	else
+		done = false;
+	return done;
+}
diff --git a/arch/nds32/kernel/process.c b/arch/nds32/kernel/process.c
index 65fda98..ab7ab46 100644
--- a/arch/nds32/kernel/process.c
+++ b/arch/nds32/kernel/process.c
@@ -9,15 +9,16 @@
 #include <linux/uaccess.h>
 #include <asm/elf.h>
 #include <asm/proc-fns.h>
+#include <asm/fpu.h>
 #include <linux/ptrace.h>
 #include <linux/reboot.h>
 
-extern void setup_mm_for_reboot(char mode);
-#ifdef CONFIG_PROC_FS
-struct proc_dir_entry *proc_dir_cpu;
-EXPORT_SYMBOL(proc_dir_cpu);
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+struct task_struct *last_task_used_math;
 #endif
 
+extern void setup_mm_for_reboot(char mode);
+
 extern inline void arch_reset(char mode)
 {
 	if (mode == 's') {
@@ -125,15 +126,31 @@ void show_regs(struct pt_regs *regs)
 
 EXPORT_SYMBOL(show_regs);
 
+void exit_thread(struct task_struct *tsk)
+{
+#if defined(CONFIG_FPU) && defined(CONFIG_LAZY_FPU)
+	if (last_task_used_math == tsk)
+		last_task_used_math = NULL;
+#endif
+}
+
 void flush_thread(void)
 {
+#if defined(CONFIG_FPU)
+	clear_fpu(task_pt_regs(current));
+	clear_used_math();
+# ifdef CONFIG_LAZY_FPU
+	if (last_task_used_math == current)
+		last_task_used_math = NULL;
+# endif
+#endif
 }
 
 DEFINE_PER_CPU(struct task_struct *, __entry_task);
 
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 int copy_thread(unsigned long clone_flags, unsigned long stack_start,
-	    unsigned long stk_sz, struct task_struct *p)
+		unsigned long stk_sz, struct task_struct *p)
 {
 	struct pt_regs *childregs = task_pt_regs(p);
 
@@ -159,6 +176,22 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
 
+#if IS_ENABLED(CONFIG_FPU)
+	if (used_math()) {
+# if !IS_ENABLED(CONFIG_LAZY_FPU)
+		unlazy_fpu(current);
+# else
+		preempt_disable();
+		if (last_task_used_math == current)
+			save_fpu(current);
+		preempt_enable();
+# endif
+		p->thread.fpu = current->thread.fpu;
+		clear_fpu(task_pt_regs(p));
+		set_stopped_child_used_math(p);
+	}
+#endif
+
 #ifdef CONFIG_HWZOL
 	childregs->lb = 0;
 	childregs->le = 0;
@@ -168,12 +201,33 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_FPU)
+struct task_struct *_switch_fpu(struct task_struct *prev, struct task_struct *next)
+{
+#if !IS_ENABLED(CONFIG_LAZY_FPU)
+	unlazy_fpu(prev);
+#endif
+	if (!(next->flags & PF_KTHREAD))
+		clear_fpu(task_pt_regs(next));
+	return prev;
+}
+#endif
+
 /*
  * fill in the fpe structure for a core dump...
  */
 int dump_fpu(struct pt_regs *regs, elf_fpregset_t * fpu)
 {
 	int fpvalid = 0;
+#if IS_ENABLED(CONFIG_FPU)
+	struct task_struct *tsk = current;
+
+	fpvalid = tsk_used_math(tsk);
+	if (fpvalid) {
+		lose_fpu();
+		memcpy(fpu, &tsk->thread.fpu, sizeof(*fpu));
+	}
+#endif
 	return fpvalid;
 }
 
diff --git a/arch/nds32/kernel/setup.c b/arch/nds32/kernel/setup.c
index 63a1a5e..fb69526 100644
--- a/arch/nds32/kernel/setup.c
+++ b/arch/nds32/kernel/setup.c
@@ -16,6 +16,7 @@
 #include <asm/proc-fns.h>
 #include <asm/cache_info.h>
 #include <asm/elf.h>
+#include <asm/fpu.h>
 #include <nds32_intrinsic.h>
 
 #define HWCAP_MFUSR_PC		0x000001
@@ -41,6 +42,7 @@
 #define HWCAP_DX_REGS		0x100000
 
 unsigned long cpu_id, cpu_rev, cpu_cfgid;
+bool has_fpu = false;
 char cpu_series;
 char *endianness = NULL;
 
@@ -137,6 +139,11 @@ static void __init dump_cpu_info(int cpu)
 		    (aliasing_num - 1) << PAGE_SHIFT;
 	}
 #endif
+#ifdef CONFIG_FPU
+	/* Disable fpu and enable when it is used. */
+	if (has_fpu)
+		disable_fpu();
+#endif
 }
 
 static void __init setup_cpuinfo(void)
@@ -181,9 +188,10 @@ static void __init setup_cpuinfo(void)
 	if (cpu_cfgid & 0x0004)
 		elf_hwcap |= HWCAP_EXT2;
 
-	if (cpu_cfgid & 0x0008)
+	if (cpu_cfgid & 0x0008) {
 		elf_hwcap |= HWCAP_FPU;
-
+		has_fpu = true;
+	}
 	if (cpu_cfgid & 0x0010)
 		elf_hwcap |= HWCAP_STRING;
 
diff --git a/arch/nds32/kernel/signal.c b/arch/nds32/kernel/signal.c
index 5d01f6e..5b5be08 100644
--- a/arch/nds32/kernel/signal.c
+++ b/arch/nds32/kernel/signal.c
@@ -12,6 +12,7 @@
 #include <asm/cacheflush.h>
 #include <asm/ucontext.h>
 #include <asm/unistd.h>
+#include <asm/fpu.h>
 
 #include <asm/ptrace.h>
 #include <asm/vdso.h>
@@ -20,6 +21,60 @@ struct rt_sigframe {
 	struct siginfo info;
 	struct ucontext uc;
 };
+#if IS_ENABLED(CONFIG_FPU)
+static inline int restore_sigcontext_fpu(struct pt_regs *regs,
+					 struct sigcontext __user *sc)
+{
+	struct task_struct *tsk = current;
+	unsigned long used_math_flag;
+	int ret = 0;
+
+	clear_used_math();
+	__get_user_error(used_math_flag, &sc->used_math_flag, ret);
+
+	if (!used_math_flag)
+		return 0;
+	set_used_math();
+
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+	preempt_disable();
+	if (current == last_task_used_math) {
+		last_task_used_math = NULL;
+		disable_ptreg_fpu(regs);
+	}
+	preempt_enable();
+#else
+	clear_fpu(regs);
+#endif
+
+	return __copy_from_user(&tsk->thread.fpu, &sc->fpu,
+				sizeof(struct fpu_struct));
+}
+
+static inline int setup_sigcontext_fpu(struct pt_regs *regs,
+				       struct sigcontext __user *sc)
+{
+	struct task_struct *tsk = current;
+	int ret = 0;
+
+	__put_user_error(used_math(), &sc->used_math_flag, ret);
+
+	if (!used_math())
+		return ret;
+
+	preempt_disable();
+#if IS_ENABLED(CONFIG_LAZY_FPU)
+	if (last_task_used_math == tsk)
+		save_fpu(last_task_used_math);
+#else
+	unlazy_fpu(tsk);
+#endif
+	ret = __copy_to_user(&sc->fpu, &tsk->thread.fpu,
+			     sizeof(struct fpu_struct));
+	preempt_enable();
+	return ret;
+}
+#endif
 
 static int restore_sigframe(struct pt_regs *regs,
 			    struct rt_sigframe __user * sf)
@@ -69,7 +124,9 @@ static int restore_sigframe(struct pt_regs *regs,
 	__get_user_error(regs->le, &sf->uc.uc_mcontext.zol.nds32_le, err);
 	__get_user_error(regs->lb, &sf->uc.uc_mcontext.zol.nds32_lb, err);
 #endif
-
+#if IS_ENABLED(CONFIG_FPU)
+	err |= restore_sigcontext_fpu(regs, &sf->uc.uc_mcontext);
+#endif
 	/*
 	 * Avoid sys_rt_sigreturn() restarting.
 	 */
@@ -153,6 +210,9 @@ asmlinkage long sys_rt_sigreturn(struct pt_regs *regs)
 	__put_user_error(regs->le, &sf->uc.uc_mcontext.zol.nds32_le, err);
 	__put_user_error(regs->lb, &sf->uc.uc_mcontext.zol.nds32_lb, err);
 #endif
+#if IS_ENABLED(CONFIG_FPU)
+	err |= setup_sigcontext_fpu(regs, &sf->uc.uc_mcontext);
+#endif
 
 	__put_user_error(current->thread.trap_no, &sf->uc.uc_mcontext.trap_no,
 			 err);
diff --git a/arch/nds32/kernel/sleep.S b/arch/nds32/kernel/sleep.S
index e417237..40599f7 100644
--- a/arch/nds32/kernel/sleep.S
+++ b/arch/nds32/kernel/sleep.S
@@ -37,6 +37,8 @@ suspend2ram:
 	mfsr    $r18, $ir15
 	pushm   $r0, $r19
 
+	jal	store_fpu_for_suspend
+
 	tlbop	FlushAll
 	isb
 
diff --git a/arch/nds32/kernel/traps.c b/arch/nds32/kernel/traps.c
index dcde7ab..9294bab 100644
--- a/arch/nds32/kernel/traps.c
+++ b/arch/nds32/kernel/traps.c
@@ -12,6 +12,7 @@
 
 #include <asm/proc-fns.h>
 #include <asm/unistd.h>
+#include <asm/fpu.h>
 
 #include <linux/ptrace.h>
 #include <nds32_intrinsic.h>
@@ -359,6 +360,21 @@ void do_dispatch_general(unsigned long entry, unsigned long addr,
 	} else if (type == ETYPE_RESERVED_INSTRUCTION) {
 		/* Reserved instruction */
 		do_revinsn(regs);
+	} else if (type == ETYPE_COPROCESSOR) {
+		/* Coprocessor */
+#if IS_ENABLED(CONFIG_FPU)
+		unsigned int fucop_exist = __nds32__mfsr(NDS32_SR_FUCOP_EXIST);
+		unsigned int cpid = ((itype & ITYPE_mskCPID) >> ITYPE_offCPID);
+
+		if ((cpid == FPU_CPID) &&
+		    (fucop_exist & FUCOP_EXIST_mskCP0ISFPU)) {
+			unsigned int subtype = (itype & ITYPE_mskSTYPE);
+
+			if (do_fpu_exception(subtype, regs))
+				return;
+		}
+#endif
+		unhandled_exceptions(entry, addr, type, regs);
 	} else if (type == ETYPE_TRAP && swid == SWID_RAISE_INTERRUPT_LEVEL) {
 		/* trap, used on v3 EDM target debugging workaround */
 		/*
-- 
1.7.1



* [PATCH v3 2/5] nds32: Support FP emulation
  2018-11-01  7:16 [PATCH v3 0/5] nds32 FPU port Vincent Chen
  2018-11-01  7:16 ` [PATCH v3 1/5] nds32: " Vincent Chen
@ 2018-11-01  7:16 ` Vincent Chen
  2018-11-01  7:16 ` [PATCH v3 3/5] nds32: support denormalized result through FP emulator Vincent Chen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Vincent Chen @ 2018-11-01  7:16 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc, Nickhu

The Andes FPU coprocessor does not handle denormalized numbers in
hardware. According to the specification, the FPU raises a denormalized
input exception when an instruction encounters denormalized operands and
leaves the operation to the kernel. An nds32 FPU ISA emulator is therefore
required in the kernel to complete such instructions.
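
For reference, a denormalized (subnormal) IEEE 754 single-precision value
has an all-zero exponent field and a non-zero fraction field. The sketch
below only illustrates which operands fall into that class. It is not part
of this patch, and the helper names f32_bits_is_denormal() and
f32_is_denormal() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32 layout: 1 sign bit, 8 exponent bits, 23 fraction bits. */
#define F32_EXP_MASK	0x7f800000u
#define F32_FRAC_MASK	0x007fffffu

/*
 * Hypothetical helper, not taken from the patch: returns true when the raw
 * bit pattern encodes a denormalized value, i.e. the exponent field is zero
 * and the fraction field is non-zero. The sign bit is ignored.
 */
static bool f32_bits_is_denormal(uint32_t bits)
{
	return (bits & F32_EXP_MASK) == 0 && (bits & F32_FRAC_MASK) != 0;
}

/* Same check for a float, using memcpy to read its bit pattern safely. */
static bool f32_is_denormal(float f)
{
	uint32_t bits;

	memcpy(&bits, &f, sizeof(bits));
	return f32_bits_is_denormal(bits);
}
```

With these helpers, the bit pattern 0x00000001 (smallest positive
denormal) is classified as denormalized, while 0x00000000 (zero) and
0x00800000 (smallest positive normal) are not.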

Signed-off-by: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Nickhu <nickhu@andestech.com>
---
 arch/nds32/Makefile                     |    1 +
 arch/nds32/include/asm/fpu.h            |    1 +
 arch/nds32/include/asm/fpuemu.h         |   32 +++
 arch/nds32/include/asm/nds32_fpu_inst.h |  109 ++++++++++
 arch/nds32/include/asm/sfp-machine.h    |  158 ++++++++++++++
 arch/nds32/kernel/fpu.c                 |   31 +++-
 arch/nds32/math-emu/Makefile            |    7 +
 arch/nds32/math-emu/faddd.c             |   24 ++
 arch/nds32/math-emu/fadds.c             |   24 ++
 arch/nds32/math-emu/fcmpd.c             |   24 ++
 arch/nds32/math-emu/fcmps.c             |   24 ++
 arch/nds32/math-emu/fd2s.c              |   22 ++
 arch/nds32/math-emu/fdivd.c             |   27 +++
 arch/nds32/math-emu/fdivs.c             |   26 +++
 arch/nds32/math-emu/fmuld.c             |   23 ++
 arch/nds32/math-emu/fmuls.c             |   23 ++
 arch/nds32/math-emu/fnegd.c             |   21 ++
 arch/nds32/math-emu/fnegs.c             |   21 ++
 arch/nds32/math-emu/fpuemu.c            |  352 +++++++++++++++++++++++++++++++
 arch/nds32/math-emu/fs2d.c              |   23 ++
 arch/nds32/math-emu/fsqrtd.c            |   21 ++
 arch/nds32/math-emu/fsqrts.c            |   21 ++
 arch/nds32/math-emu/fsubd.c             |   27 +++
 arch/nds32/math-emu/fsubs.c             |   27 +++
 24 files changed, 1064 insertions(+), 5 deletions(-)
 create mode 100644 arch/nds32/include/asm/fpuemu.h
 create mode 100644 arch/nds32/include/asm/nds32_fpu_inst.h
 create mode 100644 arch/nds32/include/asm/sfp-machine.h
 create mode 100644 arch/nds32/math-emu/Makefile
 create mode 100644 arch/nds32/math-emu/faddd.c
 create mode 100644 arch/nds32/math-emu/fadds.c
 create mode 100644 arch/nds32/math-emu/fcmpd.c
 create mode 100644 arch/nds32/math-emu/fcmps.c
 create mode 100644 arch/nds32/math-emu/fd2s.c
 create mode 100644 arch/nds32/math-emu/fdivd.c
 create mode 100644 arch/nds32/math-emu/fdivs.c
 create mode 100644 arch/nds32/math-emu/fmuld.c
 create mode 100644 arch/nds32/math-emu/fmuls.c
 create mode 100644 arch/nds32/math-emu/fnegd.c
 create mode 100644 arch/nds32/math-emu/fnegs.c
 create mode 100644 arch/nds32/math-emu/fpuemu.c
 create mode 100644 arch/nds32/math-emu/fs2d.c
 create mode 100644 arch/nds32/math-emu/fsqrtd.c
 create mode 100644 arch/nds32/math-emu/fsqrts.c
 create mode 100644 arch/nds32/math-emu/fsubd.c
 create mode 100644 arch/nds32/math-emu/fsubs.c

diff --git a/arch/nds32/Makefile b/arch/nds32/Makefile
index 599ee62..e034e30 100644
--- a/arch/nds32/Makefile
+++ b/arch/nds32/Makefile
@@ -36,6 +36,7 @@ export	TEXTADDR
 
 # If we have a machine-specific directory, then include it in the build.
 core-y				+= arch/nds32/kernel/ arch/nds32/mm/
+core-$(CONFIG_FPU)              += arch/nds32/math-emu/
 libs-y				+= arch/nds32/lib/
 
 ifneq '$(CONFIG_NDS32_BUILTIN_DTB)' '""'
diff --git a/arch/nds32/include/asm/fpu.h b/arch/nds32/include/asm/fpu.h
index f7a7f6b..9b1107b 100644
--- a/arch/nds32/include/asm/fpu.h
+++ b/arch/nds32/include/asm/fpu.h
@@ -15,6 +15,7 @@
 extern void save_fpu(struct task_struct *__tsk);
 extern void load_fpu(const struct fpu_struct *fpregs);
 extern bool do_fpu_exception(unsigned int subtype, struct pt_regs *regs);
+extern int do_fpuemu(struct pt_regs *regs, struct fpu_struct *fpu);
 
 #define test_tsk_fpu(regs)	(regs->fucop_ctl & FUCOP_CTL_mskCP0EN)
 
diff --git a/arch/nds32/include/asm/fpuemu.h b/arch/nds32/include/asm/fpuemu.h
new file mode 100644
index 0000000..c4bd0c7
--- /dev/null
+++ b/arch/nds32/include/asm/fpuemu.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2005-2018 Andes Technology Corporation */
+
+#ifndef __ARCH_NDS32_FPUEMU_H
+#define __ARCH_NDS32_FPUEMU_H
+
+/*
+ * single precision
+ */
+
+void fadds(void *ft, void *fa, void *fb);
+void fsubs(void *ft, void *fa, void *fb);
+void fmuls(void *ft, void *fa, void *fb);
+void fdivs(void *ft, void *fa, void *fb);
+void fs2d(void *ft, void *fa);
+void fsqrts(void *ft, void *fa);
+void fnegs(void *ft, void *fa);
+int fcmps(void *ft, void *fa, void *fb, int cop);
+
+/*
+ * double precision
+ */
+void faddd(void *ft, void *fa, void *fb);
+void fsubd(void *ft, void *fa, void *fb);
+void fmuld(void *ft, void *fa, void *fb);
+void fdivd(void *ft, void *fa, void *fb);
+void fsqrtd(void *ft, void *fa);
+void fd2s(void *ft, void *fa);
+void fnegd(void *ft, void *fa);
+int fcmpd(void *ft, void *fa, void *fb, int cop);
+
+#endif /* __ARCH_NDS32_FPUEMU_H */
diff --git a/arch/nds32/include/asm/nds32_fpu_inst.h b/arch/nds32/include/asm/nds32_fpu_inst.h
new file mode 100644
index 0000000..1e4b86a
--- /dev/null
+++ b/arch/nds32/include/asm/nds32_fpu_inst.h
@@ -0,0 +1,109 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2005-2018 Andes Technology Corporation */
+
+#ifndef __NDS32_FPU_INST_H
+#define __NDS32_FPU_INST_H
+
+#define cop0_op	0x35
+
+/*
+ * COP0 field of opcodes.
+ */
+#define fs1_op	0x0
+#define fs2_op  0x4
+#define fd1_op  0x8
+#define fd2_op  0xc
+
+/*
+ * FS1 opcode.
+ */
+enum fs1 {
+	fadds_op, fsubs_op, fcpynss_op, fcpyss_op,
+	fmadds_op, fmsubs_op, fcmovns_op, fcmovzs_op,
+	fnmadds_op, fnmsubs_op,
+	fmuls_op = 0xc, fdivs_op,
+	fs1_f2op_op = 0xf
+};
+
+/*
+ * FS1/F2OP opcode.
+ */
+enum fs1_f2 {
+	fs2d_op, fsqrts_op,
+	fui2s_op = 0x8, fsi2s_op = 0xc,
+	fs2ui_op = 0x10, fs2ui_z_op = 0x14,
+	fs2si_op = 0x18, fs2si_z_op = 0x1c
+};
+
+/*
+ * FS2 opcode.
+ */
+enum fs2 {
+	fcmpeqs_op, fcmpeqs_e_op, fcmplts_op, fcmplts_e_op,
+	fcmples_op, fcmples_e_op, fcmpuns_op, fcmpuns_e_op
+};
+
+/*
+ * FD1 opcode.
+ */
+enum fd1 {
+	faddd_op, fsubd_op, fcpynsd_op, fcpysd_op,
+	fmaddd_op, fmsubd_op, fcmovnd_op, fcmovzd_op,
+	fnmaddd_op, fnmsubd_op,
+	fmuld_op = 0xc, fdivd_op, fd1_f2op_op = 0xf
+};
+
+/*
+ * FD1/F2OP opcode.
+ */
+enum fd1_f2 {
+	fd2s_op, fsqrtd_op,
+	fui2d_op = 0x8, fsi2d_op = 0xc,
+	fd2ui_op = 0x10, fd2ui_z_op = 0x14,
+	fd2si_op = 0x18, fd2si_z_op = 0x1c
+};
+
+/*
+ * FD2 opcode.
+ */
+enum fd2 {
+	fcmpeqd_op, fcmpeqd_e_op, fcmpltd_op, fcmpltd_e_op,
+	fcmpled_op, fcmpled_e_op, fcmpund_op, fcmpund_e_op
+};
+
+#define NDS32Insn(x) x
+
+#define I_OPCODE_off			25
+#define NDS32Insn_OPCODE(x)		(NDS32Insn(x) >> I_OPCODE_off)
+
+#define I_OPCODE_offRt			20
+#define I_OPCODE_mskRt			(0x1fUL << I_OPCODE_offRt)
+#define NDS32Insn_OPCODE_Rt(x) \
+	((NDS32Insn(x) & I_OPCODE_mskRt) >> I_OPCODE_offRt)
+
+#define I_OPCODE_offRa			15
+#define I_OPCODE_mskRa			(0x1fUL << I_OPCODE_offRa)
+#define NDS32Insn_OPCODE_Ra(x) \
+	((NDS32Insn(x) & I_OPCODE_mskRa) >> I_OPCODE_offRa)
+
+#define I_OPCODE_offRb			10
+#define I_OPCODE_mskRb			(0x1fUL << I_OPCODE_offRb)
+#define NDS32Insn_OPCODE_Rb(x) \
+	((NDS32Insn(x) & I_OPCODE_mskRb) >> I_OPCODE_offRb)
+
+#define I_OPCODE_offbit1014		10
+#define I_OPCODE_mskbit1014		(0x1fUL << I_OPCODE_offbit1014)
+#define NDS32Insn_OPCODE_BIT1014(x) \
+	((NDS32Insn(x) & I_OPCODE_mskbit1014) >> I_OPCODE_offbit1014)
+
+#define I_OPCODE_offbit69		6
+#define I_OPCODE_mskbit69		(0xfUL << I_OPCODE_offbit69)
+#define NDS32Insn_OPCODE_BIT69(x) \
+	((NDS32Insn(x) & I_OPCODE_mskbit69) >> I_OPCODE_offbit69)
+
+#define I_OPCODE_offCOP0		0
+#define I_OPCODE_mskCOP0		(0x3fUL << I_OPCODE_offCOP0)
+#define NDS32Insn_OPCODE_COP0(x) \
+	((NDS32Insn(x) & I_OPCODE_mskCOP0) >> I_OPCODE_offCOP0)
+
+#endif /* __NDS32_FPU_INST_H */
diff --git a/arch/nds32/include/asm/sfp-machine.h b/arch/nds32/include/asm/sfp-machine.h
new file mode 100644
index 0000000..b1a5caa
--- /dev/null
+++ b/arch/nds32/include/asm/sfp-machine.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2005-2018 Andes Technology Corporation */
+
+#include <asm/bitfield.h>
+
+#define _FP_W_TYPE_SIZE		32
+#define _FP_W_TYPE		unsigned long
+#define _FP_WS_TYPE		signed long
+#define _FP_I_TYPE		long
+
+#define __ll_B ((UWtype) 1 << (W_TYPE_SIZE / 2))
+#define __ll_lowpart(t) ((UWtype) (t) & (__ll_B - 1))
+#define __ll_highpart(t) ((UWtype) (t) >> (W_TYPE_SIZE / 2))
+
+#define _FP_MUL_MEAT_S(R, X, Y)				\
+	_FP_MUL_MEAT_1_wide(_FP_WFRACBITS_S, R, X, Y, umul_ppmm)
+#define _FP_MUL_MEAT_D(R, X, Y)				\
+	_FP_MUL_MEAT_2_wide(_FP_WFRACBITS_D, R, X, Y, umul_ppmm)
+#define _FP_MUL_MEAT_Q(R, X, Y)				\
+	_FP_MUL_MEAT_4_wide(_FP_WFRACBITS_Q, R, X, Y, umul_ppmm)
+
+#define _FP_MUL_MEAT_DW_S(R, X, Y)			\
+	_FP_MUL_MEAT_DW_1_wide(_FP_WFRACBITS_S, R, X, Y, umul_ppmm)
+#define _FP_MUL_MEAT_DW_D(R, X, Y)			\
+	_FP_MUL_MEAT_DW_2_wide(_FP_WFRACBITS_D, R, X, Y, umul_ppmm)
+
+#define _FP_DIV_MEAT_S(R, X, Y)	_FP_DIV_MEAT_1_udiv_norm(S, R, X, Y)
+#define _FP_DIV_MEAT_D(R, X, Y)	_FP_DIV_MEAT_2_udiv(D, R, X, Y)
+
+#define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
+#define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1), -1
+#define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
+#define _FP_NANSIGN_S		0
+#define _FP_NANSIGN_D		0
+#define _FP_NANSIGN_Q		0
+
+#define _FP_KEEPNANFRACP 1
+#define _FP_QNANNEGATEDP 0
+
+#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)			\
+do {								\
+	if ((_FP_FRAC_HIGH_RAW_##fs(X) & _FP_QNANBIT_##fs)	\
+	  && !(_FP_FRAC_HIGH_RAW_##fs(Y) & _FP_QNANBIT_##fs)) { \
+		R##_s = Y##_s;					\
+		_FP_FRAC_COPY_##wc(R, Y);			\
+	} else {						\
+		R##_s = X##_s;					\
+		_FP_FRAC_COPY_##wc(R, X);			\
+	}							\
+	R##_c = FP_CLS_NAN;					\
+} while (0)
+
+#define __FPU_FPCSR	(current->thread.fpu.fpcsr)
+
+/* Obtain the current rounding mode. */
+#define FP_ROUNDMODE                    \
+({                                      \
+	__FPU_FPCSR & FPCSR_mskRM;      \
+})
+
+#define FP_RND_NEAREST		0
+#define FP_RND_PINF		1
+#define FP_RND_MINF		2
+#define FP_RND_ZERO		3
+
+#define FP_EX_INVALID		FPCSR_mskIVO
+#define FP_EX_DIVZERO		FPCSR_mskDBZ
+#define FP_EX_OVERFLOW		FPCSR_mskOVF
+#define FP_EX_UNDERFLOW		FPCSR_mskUDF
+#define FP_EX_INEXACT		FPCSR_mskIEX
+
+#define SF_CEQ	2
+#define SF_CLT	1
+#define SF_CGT	3
+#define SF_CUN	4
+
+#include <asm/byteorder.h>
+
+#ifdef __BIG_ENDIAN__
+#define __BYTE_ORDER __BIG_ENDIAN
+#define __LITTLE_ENDIAN 0
+#else
+#define __BYTE_ORDER __LITTLE_ENDIAN
+#define __BIG_ENDIAN 0
+#endif
+
+#define abort() do { } while (0)
+#define umul_ppmm(w1, w0, u, v)						\
+do {									\
+	UWtype __x0, __x1, __x2, __x3;                                  \
+	UHWtype __ul, __vl, __uh, __vh;                                 \
+									\
+	__ul = __ll_lowpart(u);						\
+	__uh = __ll_highpart(u);					\
+	__vl = __ll_lowpart(v);						\
+	__vh = __ll_highpart(v);					\
+									\
+	__x0 = (UWtype) __ul * __vl;                                    \
+	__x1 = (UWtype) __ul * __vh;                                    \
+	__x2 = (UWtype) __uh * __vl;                                    \
+	__x3 = (UWtype) __uh * __vh;                                    \
+									\
+	__x1 += __ll_highpart(__x0);					\
+	__x1 += __x2;							\
+	if (__x1 < __x2)						\
+		__x3 += __ll_B;						\
+									\
+	(w1) = __x3 + __ll_highpart(__x1);				\
+	(w0) = __ll_lowpart(__x1) * __ll_B + __ll_lowpart(__x0);	\
+} while (0)
+
+#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+do { \
+	UWtype __x; \
+	__x = (al) + (bl); \
+	(sh) = (ah) + (bh) + (__x < (al)); \
+	(sl) = __x; \
+} while (0)
+
+#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+do { \
+	UWtype __x; \
+	__x = (al) - (bl); \
+	(sh) = (ah) - (bh) - (__x > (al)); \
+	(sl) = __x; \
+} while (0)
+
+#define udiv_qrnnd(q, r, n1, n0, d)				\
+do {								\
+	UWtype __d1, __d0, __q1, __q0, __r1, __r0, __m;		\
+	__d1 = __ll_highpart(d);				\
+	__d0 = __ll_lowpart(d);					\
+								\
+	__r1 = (n1) % __d1;					\
+	__q1 = (n1) / __d1;					\
+	__m = (UWtype) __q1 * __d0;				\
+	__r1 = __r1 * __ll_B | __ll_highpart(n0);		\
+	if (__r1 < __m)	{					\
+		__q1--, __r1 += (d);				\
+		if (__r1 >= (d))				\
+			if (__r1 < __m)				\
+				__q1--, __r1 += (d);		\
+	}							\
+	__r1 -= __m;						\
+	__r0 = __r1 % __d1;					\
+	__q0 = __r1 / __d1;					\
+	__m = (UWtype) __q0 * __d0;				\
+	__r0 = __r0 * __ll_B | __ll_lowpart(n0);		\
+	if (__r0 < __m)	{					\
+		__q0--, __r0 += (d);				\
+		if (__r0 >= (d))				\
+			if (__r0 < __m)				\
+				__q0--, __r0 += (d);		\
+	}							\
+	__r0 -= __m;						\
+	(q) = (UWtype) __q1 * __ll_B | __q0;			\
+	(r) = __r0;						\
+} while (0)
diff --git a/arch/nds32/kernel/fpu.c b/arch/nds32/kernel/fpu.c
index e55a1e1..2942df6 100644
--- a/arch/nds32/kernel/fpu.c
+++ b/arch/nds32/kernel/fpu.c
@@ -183,10 +183,10 @@ inline void fill_sigfpe_signo(unsigned int fpcsr, int *signo)
 {
 	if (fpcsr & FPCSR_mskOVFT)
 		*signo = FPE_FLTOVF;
-	else if (fpcsr & FPCSR_mskUDFT)
-		*signo = FPE_FLTUND;
 	else if (fpcsr & FPCSR_mskIVOT)
 		*signo = FPE_FLTINV;
+	else if (fpcsr & FPCSR_mskUDFT)
+		*signo = FPE_FLTUND;
 	else if (fpcsr & FPCSR_mskDBZT)
 		*signo = FPE_FLTDIV;
 	else if (fpcsr & FPCSR_mskIEXT)
@@ -201,16 +201,37 @@ inline void handle_fpu_exception(struct pt_regs *regs)
 	lose_fpu();
 	fpcsr = current->thread.fpu.fpcsr;
 
-	if (fpcsr & FPCSR_mskRIT) {
+	if (fpcsr & FPCSR_mskDNIT) {
+		si_signo = do_fpuemu(regs, &current->thread.fpu);
+		fpcsr = current->thread.fpu.fpcsr;
+		if (!si_signo)
+			goto done;
+	} else if (fpcsr & FPCSR_mskRIT) {
 		if (!user_mode(regs))
 			do_exit(SIGILL);
 		si_signo = SIGILL;
+	}
+
+
+	switch (si_signo) {
+	case SIGFPE:
+		fill_sigfpe_signo(fpcsr, &si_code);
+		break;
+	case SIGILL:
 		show_regs(regs);
 		si_code = ILL_COPROC;
-	} else
-		fill_sigfpe_signo(fpcsr, &si_code);
+		break;
+	case SIGBUS:
+		si_code = BUS_ADRERR;
+		break;
+	default:
+		break;
+	}
+
 	force_sig_fault(si_signo, si_code,
 			(void __user *)instruction_pointer(regs), current);
+done:
+	own_fpu();
 }
 
 bool do_fpu_exception(unsigned int subtype, struct pt_regs *regs)
diff --git a/arch/nds32/math-emu/Makefile b/arch/nds32/math-emu/Makefile
new file mode 100644
index 0000000..947fe0c
--- /dev/null
+++ b/arch/nds32/math-emu/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the Linux/nds32 kernel FPU emulation.
+#
+
+obj-y	:= fpuemu.o \
+	   fdivd.o fmuld.o fsubd.o faddd.o fs2d.o fsqrtd.o fcmpd.o fnegs.o \
+	   fdivs.o fmuls.o fsubs.o fadds.o fd2s.o fsqrts.o fcmps.o fnegd.o
diff --git a/arch/nds32/math-emu/faddd.c b/arch/nds32/math-emu/faddd.c
new file mode 100644
index 0000000..f7fd4e3
--- /dev/null
+++ b/arch/nds32/math-emu/faddd.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+void faddd(void *ft, void *fa, void *fb)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(B);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+	FP_UNPACK_DP(B, fb);
+
+	FP_ADD_D(R, A, B);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+
+}
diff --git a/arch/nds32/math-emu/fadds.c b/arch/nds32/math-emu/fadds.c
new file mode 100644
index 0000000..f5af6ca
--- /dev/null
+++ b/arch/nds32/math-emu/fadds.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fadds(void *ft, void *fa, void *fb)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(B);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+	FP_UNPACK_SP(B, fb);
+
+	FP_ADD_S(R, A, B);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+
+}
diff --git a/arch/nds32/math-emu/fcmpd.c b/arch/nds32/math-emu/fcmpd.c
new file mode 100644
index 0000000..0ea225a
--- /dev/null
+++ b/arch/nds32/math-emu/fcmpd.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+int fcmpd(void *ft, void *fa, void *fb, int cmpop)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(B);
+	FP_DECL_EX;
+	long cmp;
+
+	FP_UNPACK_DP(A, fa);
+	FP_UNPACK_DP(B, fb);
+
+	FP_CMP_D(cmp, A, B, SF_CUN);
+	cmp += 2;
+	if (cmp == SF_CGT)
+		*(long *)ft = 0;
+	else
+		*(long *)ft = (cmp & cmpop) ? 1 : 0;
+
+	return 0;
+}
diff --git a/arch/nds32/math-emu/fcmps.c b/arch/nds32/math-emu/fcmps.c
new file mode 100644
index 0000000..6814807
--- /dev/null
+++ b/arch/nds32/math-emu/fcmps.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+int fcmps(void *ft, void *fa, void *fb, int cmpop)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(B);
+	FP_DECL_EX;
+	long cmp;
+
+	FP_UNPACK_SP(A, fa);
+	FP_UNPACK_SP(B, fb);
+
+	FP_CMP_S(cmp, A, B, SF_CUN);
+	cmp += 2;
+	if (cmp == SF_CGT)
+		*(int *)ft = 0x0;
+	else
+		*(int *)ft = (cmp & cmpop) ? 0x1 : 0x0;
+
+	return 0;
+}
diff --git a/arch/nds32/math-emu/fd2s.c b/arch/nds32/math-emu/fd2s.c
new file mode 100644
index 0000000..1328371
--- /dev/null
+++ b/arch/nds32/math-emu/fd2s.c
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/double.h>
+#include <math-emu/single.h>
+#include <math-emu/soft-fp.h>
+void fd2s(void *ft, void *fa)
+{
+	FP_DECL_D(A);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+
+	FP_CONV(S, D, 1, 2, R, A);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fdivd.c b/arch/nds32/math-emu/fdivd.c
new file mode 100644
index 0000000..458e7e9
--- /dev/null
+++ b/arch/nds32/math-emu/fdivd.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <linux/uaccess.h>
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+
+void fdivd(void *ft, void *fa, void *fb)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(B);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+	FP_UNPACK_DP(B, fb);
+
+	if (B_c == FP_CLS_ZERO && A_c != FP_CLS_ZERO)
+		FP_SET_EXCEPTION(FP_EX_DIVZERO);
+
+	FP_DIV_D(R, A, B);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fdivs.c b/arch/nds32/math-emu/fdivs.c
new file mode 100644
index 0000000..c7d2021
--- /dev/null
+++ b/arch/nds32/math-emu/fdivs.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fdivs(void *ft, void *fa, void *fb)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(B);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+	FP_UNPACK_SP(B, fb);
+
+	if (B_c == FP_CLS_ZERO && A_c != FP_CLS_ZERO)
+		FP_SET_EXCEPTION(FP_EX_DIVZERO);
+
+	FP_DIV_S(R, A, B);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fmuld.c b/arch/nds32/math-emu/fmuld.c
new file mode 100644
index 0000000..f3c77a4
--- /dev/null
+++ b/arch/nds32/math-emu/fmuld.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+void fmuld(void *ft, void *fa, void *fb)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(B);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+	FP_UNPACK_DP(B, fb);
+
+	FP_MUL_D(R, A, B);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fmuls.c b/arch/nds32/math-emu/fmuls.c
new file mode 100644
index 0000000..cf150df
--- /dev/null
+++ b/arch/nds32/math-emu/fmuls.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fmuls(void *ft, void *fa, void *fb)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(B);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+	FP_UNPACK_SP(B, fb);
+
+	FP_MUL_S(R, A, B);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fnegd.c b/arch/nds32/math-emu/fnegd.c
new file mode 100644
index 0000000..de7ea6a
--- /dev/null
+++ b/arch/nds32/math-emu/fnegd.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+void fnegd(void *ft, void *fa)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+
+	FP_NEG_D(R, A);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fnegs.c b/arch/nds32/math-emu/fnegs.c
new file mode 100644
index 0000000..07270b3
--- /dev/null
+++ b/arch/nds32/math-emu/fnegs.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fnegs(void *ft, void *fa)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+
+	FP_NEG_S(R, A);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fpuemu.c b/arch/nds32/math-emu/fpuemu.c
new file mode 100644
index 0000000..2a01333
--- /dev/null
+++ b/arch/nds32/math-emu/fpuemu.c
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <asm/bitfield.h>
+#include <asm/uaccess.h>
+#include <asm/sfp-machine.h>
+#include <asm/fpuemu.h>
+#include <asm/nds32_fpu_inst.h>
+
+#define DPFROMREG(dp, x) ((dp) = (void *)((unsigned long *)fpu_reg + 2 * (x)))
+#ifdef __NDS32_EL__
+#define SPFROMREG(sp, x)\
+	((sp) = (void *)((unsigned long *)fpu_reg + ((x) ^ 1)))
+#else
+#define SPFROMREG(sp, x) ((sp) = (void *)((unsigned long *)fpu_reg + (x)))
+#endif
+
+#define DEF3OP(name, p, f1, f2) \
+void fpemu_##name##p(void *ft, void *fa, void *fb) \
+{ \
+	f1(fa, fa, fb); \
+	f2(ft, ft, fa); \
+}
+
+#define DEF3OPNEG(name, p, f1, f2, f3) \
+void fpemu_##name##p(void *ft, void *fa, void *fb) \
+{ \
+	f1(fa, fa, fb); \
+	f2(ft, ft, fa); \
+	f3(ft, ft); \
+}
+DEF3OP(fmadd, s, fmuls, fadds);
+DEF3OP(fmsub, s, fmuls, fsubs);
+DEF3OP(fmadd, d, fmuld, faddd);
+DEF3OP(fmsub, d, fmuld, fsubd);
+DEF3OPNEG(fnmadd, s, fmuls, fadds, fnegs);
+DEF3OPNEG(fnmsub, s, fmuls, fsubs, fnegs);
+DEF3OPNEG(fnmadd, d, fmuld, faddd, fnegd);
+DEF3OPNEG(fnmsub, d, fmuld, fsubd, fnegd);
+
+static const unsigned char cmptab[8] = {
+	SF_CEQ,
+	SF_CEQ,
+	SF_CLT,
+	SF_CLT,
+	SF_CLT | SF_CEQ,
+	SF_CLT | SF_CEQ,
+	SF_CUN,
+	SF_CUN
+};
+
+enum ARGTYPE {
+	S1S = 1,
+	S2S,
+	S1D,
+	CS,
+	D1D,
+	D2D,
+	D1S,
+	CD
+};
+union func_t {
+	void (*t)(void *ft, void *fa, void *fb);
+	void (*b)(void *ft, void *fa);
+};
+/*
+ * Emulate a single FPU arithmetic instruction.
+ */
+static int fpu_emu(struct fpu_struct *fpu_reg, unsigned long insn)
+{
+	int rfmt;		/* resulting format */
+	union func_t func;
+	int ftype = 0;
+
+	switch (rfmt = NDS32Insn_OPCODE_COP0(insn)) {
+	case fs1_op:{
+			switch (NDS32Insn_OPCODE_BIT69(insn)) {
+			case fadds_op:
+				func.t = fadds;
+				ftype = S2S;
+				break;
+			case fsubs_op:
+				func.t = fsubs;
+				ftype = S2S;
+				break;
+			case fmadds_op:
+				func.t = fpemu_fmadds;
+				ftype = S2S;
+				break;
+			case fmsubs_op:
+				func.t = fpemu_fmsubs;
+				ftype = S2S;
+				break;
+			case fnmadds_op:
+				func.t = fpemu_fnmadds;
+				ftype = S2S;
+				break;
+			case fnmsubs_op:
+				func.t = fpemu_fnmsubs;
+				ftype = S2S;
+				break;
+			case fmuls_op:
+				func.t = fmuls;
+				ftype = S2S;
+				break;
+			case fdivs_op:
+				func.t = fdivs;
+				ftype = S2S;
+				break;
+			case fs1_f2op_op:
+				switch (NDS32Insn_OPCODE_BIT1014(insn)) {
+				case fs2d_op:
+					func.b = fs2d;
+					ftype = S1D;
+					break;
+				case fsqrts_op:
+					func.b = fsqrts;
+					ftype = S1S;
+					break;
+				default:
+					return SIGILL;
+				}
+				break;
+			default:
+				return SIGILL;
+			}
+			break;
+		}
+	case fs2_op:
+		switch (NDS32Insn_OPCODE_BIT69(insn)) {
+		case fcmpeqs_op:
+		case fcmpeqs_e_op:
+		case fcmplts_op:
+		case fcmplts_e_op:
+		case fcmples_op:
+		case fcmples_e_op:
+		case fcmpuns_op:
+		case fcmpuns_e_op:
+			ftype = CS;
+			break;
+		default:
+			return SIGILL;
+		}
+		break;
+	case fd1_op:{
+			switch (NDS32Insn_OPCODE_BIT69(insn)) {
+			case faddd_op:
+				func.t = faddd;
+				ftype = D2D;
+				break;
+			case fsubd_op:
+				func.t = fsubd;
+				ftype = D2D;
+				break;
+			case fmaddd_op:
+				func.t = fpemu_fmaddd;
+				ftype = D2D;
+				break;
+			case fmsubd_op:
+				func.t = fpemu_fmsubd;
+				ftype = D2D;
+				break;
+			case fnmaddd_op:
+				func.t = fpemu_fnmaddd;
+				ftype = D2D;
+				break;
+			case fnmsubd_op:
+				func.t = fpemu_fnmsubd;
+				ftype = D2D;
+				break;
+			case fmuld_op:
+				func.t = fmuld;
+				ftype = D2D;
+				break;
+			case fdivd_op:
+				func.t = fdivd;
+				ftype = D2D;
+				break;
+			case fd1_f2op_op:
+				switch (NDS32Insn_OPCODE_BIT1014(insn)) {
+				case fd2s_op:
+					func.b = fd2s;
+					ftype = D1S;
+					break;
+				case fsqrtd_op:
+					func.b = fsqrtd;
+					ftype = D1D;
+					break;
+				default:
+					return SIGILL;
+				}
+				break;
+			default:
+				return SIGILL;
+
+			}
+			break;
+		}
+
+	case fd2_op:
+		switch (NDS32Insn_OPCODE_BIT69(insn)) {
+		case fcmpeqd_op:
+		case fcmpeqd_e_op:
+		case fcmpltd_op:
+		case fcmpltd_e_op:
+		case fcmpled_op:
+		case fcmpled_e_op:
+		case fcmpund_op:
+		case fcmpund_e_op:
+			ftype = CD;
+			break;
+		default:
+			return SIGILL;
+		}
+		break;
+
+	default:
+		return SIGILL;
+	}
+
+	switch (ftype) {
+	case S1S:{
+			void *ft, *fa;
+
+			SPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			SPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			func.b(ft, fa);
+			break;
+		}
+	case S2S:{
+			void *ft, *fa, *fb;
+
+			SPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			SPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			SPFROMREG(fb, NDS32Insn_OPCODE_Rb(insn));
+			func.t(ft, fa, fb);
+			break;
+		}
+	case S1D:{
+			void *ft, *fa;
+
+			DPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			SPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			func.b(ft, fa);
+			break;
+		}
+	case CS:{
+			unsigned int cmpop = NDS32Insn_OPCODE_BIT69(insn);
+			void *ft, *fa, *fb;
+
+			SPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			SPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			SPFROMREG(fb, NDS32Insn_OPCODE_Rb(insn));
+			if (cmpop < 0x8) {
+				cmpop = cmptab[cmpop];
+				fcmps(ft, fa, fb, cmpop);
+			} else
+				return SIGILL;
+			break;
+		}
+	case D1D:{
+			void *ft, *fa;
+
+			DPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			DPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			func.b(ft, fa);
+			break;
+		}
+	case D2D:{
+			void *ft, *fa, *fb;
+
+			DPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			DPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			DPFROMREG(fb, NDS32Insn_OPCODE_Rb(insn));
+			func.t(ft, fa, fb);
+			break;
+		}
+	case D1S:{
+			void *ft, *fa;
+
+			SPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			DPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			func.b(ft, fa);
+			break;
+		}
+	case CD:{
+			unsigned int cmpop = NDS32Insn_OPCODE_BIT69(insn);
+			void *ft, *fa, *fb;
+
+			SPFROMREG(ft, NDS32Insn_OPCODE_Rt(insn));
+			DPFROMREG(fa, NDS32Insn_OPCODE_Ra(insn));
+			DPFROMREG(fb, NDS32Insn_OPCODE_Rb(insn));
+			if (cmpop < 0x8) {
+				cmpop = cmptab[cmpop];
+				fcmpd(ft, fa, fb, cmpop);
+			} else
+				return SIGILL;
+			break;
+		}
+	default:
+		return SIGILL;
+	}
+
+	/*
+	 * If an exception is required, generate a tidy SIGFPE exception.
+	 */
+	if ((fpu_reg->fpcsr << 5) & fpu_reg->fpcsr & FPCSR_mskALLE)
+		return SIGFPE;
+	return 0;
+}
+
+
+int do_fpuemu(struct pt_regs *regs, struct fpu_struct *fpu)
+{
+	unsigned long insn = 0, addr = regs->ipc;
+	unsigned long emulpc, contpc;
+	unsigned char *pc = (void *)&insn;
+	char c;
+	int i = 0, ret;
+
+	for (i = 0; i < 4; i++) {
+		if (__get_user(c, (unsigned char *)addr++))
+			return SIGBUS;
+		*pc++ = c;
+	}
+
+	insn = be32_to_cpu(insn);
+
+	emulpc = regs->ipc;
+	contpc = regs->ipc + 4;
+
+	if (NDS32Insn_OPCODE(insn) != cop0_op)
+		return SIGILL;
+	switch (NDS32Insn_OPCODE_COP0(insn)) {
+	case fs1_op:
+	case fs2_op:
+	case fd1_op:
+	case fd2_op:
+		{
+			/* a real fpu computation instruction */
+			ret = fpu_emu(fpu, insn);
+			if (!ret)
+				regs->ipc = contpc;
+		}
+		break;
+
+	default:
+		return SIGILL;
+	}
+
+	return ret;
+}
diff --git a/arch/nds32/math-emu/fs2d.c b/arch/nds32/math-emu/fs2d.c
new file mode 100644
index 0000000..0e8db90
--- /dev/null
+++ b/arch/nds32/math-emu/fs2d.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <linux/uaccess.h>
+#include <asm/sfp-machine.h>
+#include <math-emu/double.h>
+#include <math-emu/single.h>
+#include <math-emu/soft-fp.h>
+
+void fs2d(void *ft, void *fa)
+{
+	FP_DECL_S(A);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+
+	FP_CONV(D, S, 2, 1, R, A);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fsqrtd.c b/arch/nds32/math-emu/fsqrtd.c
new file mode 100644
index 0000000..c3a8dbd
--- /dev/null
+++ b/arch/nds32/math-emu/fsqrtd.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <linux/uaccess.h>
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+void fsqrtd(void *ft, void *fa)
+{
+	FP_DECL_D(A);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+
+	FP_SQRT_D(R, A);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fsqrts.c b/arch/nds32/math-emu/fsqrts.c
new file mode 100644
index 0000000..4c6f94b
--- /dev/null
+++ b/arch/nds32/math-emu/fsqrts.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+
+#include <linux/uaccess.h>
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fsqrts(void *ft, void *fa)
+{
+	FP_DECL_S(A);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+
+	FP_SQRT_S(R, A);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fsubd.c b/arch/nds32/math-emu/fsubd.c
new file mode 100644
index 0000000..81b6a0d
--- /dev/null
+++ b/arch/nds32/math-emu/fsubd.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/double.h>
+void fsubd(void *ft, void *fa, void *fb)
+{
+
+	FP_DECL_D(A);
+	FP_DECL_D(B);
+	FP_DECL_D(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_DP(A, fa);
+	FP_UNPACK_DP(B, fb);
+
+	if (B_c != FP_CLS_NAN)
+		B_s ^= 1;
+
+	FP_ADD_D(R, A, B);
+
+	FP_PACK_DP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
diff --git a/arch/nds32/math-emu/fsubs.c b/arch/nds32/math-emu/fsubs.c
new file mode 100644
index 0000000..61ddd97
--- /dev/null
+++ b/arch/nds32/math-emu/fsubs.c
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2005-2018 Andes Technology Corporation
+#include <linux/uaccess.h>
+
+#include <asm/sfp-machine.h>
+#include <math-emu/soft-fp.h>
+#include <math-emu/single.h>
+void fsubs(void *ft, void *fa, void *fb)
+{
+
+	FP_DECL_S(A);
+	FP_DECL_S(B);
+	FP_DECL_S(R);
+	FP_DECL_EX;
+
+	FP_UNPACK_SP(A, fa);
+	FP_UNPACK_SP(B, fb);
+
+	if (B_c != FP_CLS_NAN)
+		B_s ^= 1;
+
+	FP_ADD_S(R, A, B);
+
+	FP_PACK_SP(ft, R);
+
+	__FPU_FPCSR |= FP_CUR_EXCEPTIONS;
+}
-- 
1.7.1



* [PATCH v3 3/5] nds32: support denormalized result through FP emulator
From: Vincent Chen @ 2018-11-01  7:16 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc

Currently, the nds32 FPU does not support arithmetic on denormalized
numbers. When the nds32 FPU finds that the result of an instruction is a
denormalized number, it treats this as an underflow condition and rounds
the result to an appropriate number, which may cause some loss of
precision. This commit proposes re-executing the instruction with the FPU
emulator to enhance the precision. To transfer such calculations from
user space to kernel space, this feature enables the underflow exception
trap by default. Enabling this feature may cause some side effects:
  1. Performance loss due to extra FPU exceptions
  2. The need for another scheme to control the real underflow trap
       A new parameter, UDF_trap, which belongs to the FPU context, is
     used to control the underflow trap.
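
The UDF_trap control described in item 2 can be modelled in plain user-space
C. This is only a sketch of the option handling in the udftrap syscall added
by this patch, not kernel code: model_udftrap() and MODEL_UDFE are
hypothetical names, and MODEL_UDFE is a placeholder because the real value
of FPCSR_mskUDFE is not shown in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Option values mirrored from the udftrap.h added by this patch. */
#define DISABLE_UDFTRAP	0
#define ENABLE_UDFTRAP	1
#define GET_UDFTRAP	2

/* Placeholder only: stands in for FPCSR_mskUDFE, whose value is assumed. */
#define MODEL_UDFE	0x1UL

/*
 * Hypothetical user-space model of the option handling: update the
 * per-task UDF_trap word and return its previous value, or -EINVAL
 * for an unknown option.
 */
static long model_udftrap(unsigned long *udf_trap, int option)
{
	unsigned long old;

	assert(udf_trap != NULL);
	old = *udf_trap;

	switch (option) {
	case DISABLE_UDFTRAP:
		*udf_trap = 0;
		break;
	case ENABLE_UDFTRAP:
		*udf_trap = MODEL_UDFE;
		break;
	case GET_UDFTRAP:
		/* Query only: leave the state untouched. */
		break;
	default:
		return -EINVAL;
	}
	return (long)old;
}
```

Returning the previous value lets a caller save and later restore the trap
state with two calls, without a separate query.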

Users can configure this feature via CONFIG_SUPPORT_DENORMAL_ARITHMETIC.
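
To make the precision loss concrete, here is a small user-space sketch. It
assumes IEEE 754 single precision and, purely for illustration, models an
FPU without denormal support as one that replaces a denormalized result by
a signed zero. The real nds32 rounding depends on the current rounding
mode, so this is not a description of the hardware. All helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its IEEE 754 single-precision bit pattern. */
static uint32_t float_bits(float x)
{
	uint32_t u;

	assert(sizeof(u) == sizeof(x));
	memcpy(&u, &x, sizeof(u));
	return u;
}

/* Build a float from an IEEE 754 single-precision bit pattern. */
static float bits_float(uint32_t u)
{
	float x;

	memcpy(&x, &u, sizeof(x));
	return x;
}

/* Denormalized: the exponent field is zero, the fraction field is not. */
static int is_denormal(float x)
{
	uint32_t u = float_bits(x);

	return (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
}

/* Assumed model: a denormalized value is replaced by a signed zero. */
static float flush_denormal(float x)
{
	if (!is_denormal(x))
		return x;
	return bits_float(float_bits(x) & 0x80000000u);
}
```

For example, the bit pattern 0x00200000 is a nonzero denormalized value,
and flush_denormal() turns it into zero. An emulator that handles
denormalized numbers can return that value unchanged.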

Signed-off-by: Vincent Chen <vincentc@andestech.com>
---
 arch/nds32/Kconfig.cpu                   |   13 ++++++++++++
 arch/nds32/include/asm/elf.h             |   11 ++++++++++
 arch/nds32/include/asm/fpu.h             |   11 ++++++++++
 arch/nds32/include/asm/syscalls.h        |    1 +
 arch/nds32/include/uapi/asm/auxvec.h     |    7 ++++++
 arch/nds32/include/uapi/asm/sigcontext.h |    9 ++++++++
 arch/nds32/include/uapi/asm/udftrap.h    |   13 ++++++++++++
 arch/nds32/include/uapi/asm/unistd.h     |    2 +
 arch/nds32/kernel/fpu.c                  |   25 +++++++++++++++++++---
 arch/nds32/kernel/sys_nds32.c            |   32 ++++++++++++++++++++++++++++++
 arch/nds32/math-emu/fpuemu.c             |    5 ++++
 11 files changed, 125 insertions(+), 4 deletions(-)
 create mode 100644 arch/nds32/include/uapi/asm/udftrap.h

diff --git a/arch/nds32/Kconfig.cpu b/arch/nds32/Kconfig.cpu
index 593d0c2..3aad9e4 100644
--- a/arch/nds32/Kconfig.cpu
+++ b/arch/nds32/Kconfig.cpu
@@ -28,6 +28,19 @@ config LAZY_FPU
 
 	  For nomal case, say Y.
 
+config SUPPORT_DENORMAL_ARITHMETIC
+	bool "Denormal arithmetic support"
+	depends on FPU
+	default n
+	help
+	  Say Y here to enable arithmetic on denormalized numbers. Enabling
+	  this feature improves the precision of tiny results.
+	  However, the performance loss in floating-point calculations may
+	  be significant due to the additional FPU exceptions.
+
+	  If precision for tiny numbers is not critical,
+	  say N to prevent performance loss.
+
 config HWZOL
 	bool "hardware zero overhead loop support"
 	depends on CPU_D10 || CPU_D15
diff --git a/arch/nds32/include/asm/elf.h b/arch/nds32/include/asm/elf.h
index f5f9cf7..95f3ea2 100644
--- a/arch/nds32/include/asm/elf.h
+++ b/arch/nds32/include/asm/elf.h
@@ -9,6 +9,7 @@
  */
 
 #include <asm/ptrace.h>
+#include <asm/fpu.h>
 
 typedef unsigned long elf_greg_t;
 typedef unsigned long elf_freg_t[3];
@@ -159,8 +160,18 @@ struct user_fp {
 
 #endif
 
+
+#if IS_ENABLED(CONFIG_FPU)
+#define FPU_AUX_ENT	NEW_AUX_ENT(AT_FPUCW, FPCSR_INIT)
+#else
+#define FPU_AUX_ENT	NEW_AUX_ENT(AT_IGNORE, 0)
+#endif
+
 #define ARCH_DLINFO						\
 do {								\
+	/* Optional FPU initialization */			\
+	FPU_AUX_ENT;						\
+								\
 	NEW_AUX_ENT(AT_SYSINFO_EHDR,				\
 		    (elf_addr_t)current->mm->context.vdso);	\
 } while (0)
diff --git a/arch/nds32/include/asm/fpu.h b/arch/nds32/include/asm/fpu.h
index 9b1107b..019f1bc 100644
--- a/arch/nds32/include/asm/fpu.h
+++ b/arch/nds32/include/asm/fpu.h
@@ -28,7 +28,18 @@
 #define sNAN64    0xFFFFFFFFFFFFFFFFULL
 #define sNAN32    0xFFFFFFFFUL
 
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+/*
+ * Denormalized numbers are unsupported by the nds32 FPU. Hence an
+ * operation is treated as an underflow case when its final result is a
+ * denormalized number. To enhance precision, the underflow exception trap
+ * is enabled by default and the kernel re-executes the instruction with
+ * the FPU emulator when it gets an underflow exception.
+ */
+#define FPCSR_INIT  FPCSR_mskUDFE
+#else
 #define FPCSR_INIT  0x0UL
+#endif
 
 extern const struct fpu_struct init_fpuregs;
 
diff --git a/arch/nds32/include/asm/syscalls.h b/arch/nds32/include/asm/syscalls.h
index 78778ec..da32101 100644
--- a/arch/nds32/include/asm/syscalls.h
+++ b/arch/nds32/include/asm/syscalls.h
@@ -7,6 +7,7 @@
 asmlinkage long sys_cacheflush(unsigned long addr, unsigned long len, unsigned int op);
 asmlinkage long sys_fadvise64_64_wrapper(int fd, int advice, loff_t offset, loff_t len);
 asmlinkage long sys_rt_sigreturn_wrapper(void);
+asmlinkage long sys_udftrap(int option);
 
 #include <asm-generic/syscalls.h>
 
diff --git a/arch/nds32/include/uapi/asm/auxvec.h b/arch/nds32/include/uapi/asm/auxvec.h
index 56043ce..2d3213f 100644
--- a/arch/nds32/include/uapi/asm/auxvec.h
+++ b/arch/nds32/include/uapi/asm/auxvec.h
@@ -4,6 +4,13 @@
 #ifndef __ASM_AUXVEC_H
 #define __ASM_AUXVEC_H
 
+/*
+ * This entry gives some information about the FPU initialization
+ * performed by the kernel.
+ */
+#define AT_FPUCW	18	/* Used FPU control word.  */
+
+
 /* VDSO location */
 #define AT_SYSINFO_EHDR	33
 
diff --git a/arch/nds32/include/uapi/asm/sigcontext.h b/arch/nds32/include/uapi/asm/sigcontext.h
index 1257a78..58afc41 100644
--- a/arch/nds32/include/uapi/asm/sigcontext.h
+++ b/arch/nds32/include/uapi/asm/sigcontext.h
@@ -12,6 +12,15 @@
 struct fpu_struct {
 	unsigned long long fd_regs[32];
 	unsigned long fpcsr;
+	/*
+	 * UDF_trap is used to recognize whether the underflow trap is enabled
+	 * or not. When UDF_trap is nonzero, this process will be trapped and
+	 * get a SIGFPE signal when encountering an underflow exception.
+	 * UDF_trap is only modified through the udftrap syscall. Therefore,
+	 * UDF_trap needn't be saved or loaded to context in each context
+	 * switch.
+	 */
+	unsigned long UDF_trap;
 };
 
 struct zol_struct {
diff --git a/arch/nds32/include/uapi/asm/udftrap.h b/arch/nds32/include/uapi/asm/udftrap.h
new file mode 100644
index 0000000..433f79d
--- /dev/null
+++ b/arch/nds32/include/uapi/asm/udftrap.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2005-2018 Andes Technology Corporation */
+#ifndef	_ASM_SETFPUTRAP
+#define	_ASM_SETFPUTRAP
+
+/*
+ * Options for udftrap system call
+ */
+#define	DISABLE_UDFTRAP	0	/* disable underflow exception trap */
+#define	ENABLE_UDFTRAP	1	/* enable underflow exception trap */
+#define	GET_UDFTRAP	2	/* only get underflow exception trap status */
+
+#endif /* _ASM_SETFPUTRAP */
diff --git a/arch/nds32/include/uapi/asm/unistd.h b/arch/nds32/include/uapi/asm/unistd.h
index 6e95901..199e675 100644
--- a/arch/nds32/include/uapi/asm/unistd.h
+++ b/arch/nds32/include/uapi/asm/unistd.h
@@ -8,4 +8,6 @@
 
 /* Additional NDS32 specific syscalls. */
 #define __NR_cacheflush		(__NR_arch_specific_syscall)
+#define __NR_udftrap		(__NR_arch_specific_syscall + 1)
 __SYSCALL(__NR_cacheflush, sys_cacheflush)
+__SYSCALL(__NR_udftrap, sys_udftrap)
diff --git a/arch/nds32/kernel/fpu.c b/arch/nds32/kernel/fpu.c
index 2942df6..fddd40c 100644
--- a/arch/nds32/kernel/fpu.c
+++ b/arch/nds32/kernel/fpu.c
@@ -12,7 +12,10 @@
 
 const struct fpu_struct init_fpuregs = {
 	.fd_regs = {[0 ... 31] = sNAN64},
-	.fpcsr = FPCSR_INIT
+	.fpcsr = FPCSR_INIT,
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+	.UDF_trap = 0
+#endif
 };
 
 void save_fpu(struct task_struct *tsk)
@@ -174,6 +177,9 @@ inline void do_fpu_context_switch(struct pt_regs *regs)
 	} else {
 		/* First time FPU user.  */
 		load_fpu(&init_fpuregs);
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+		current->thread.fpu.UDF_trap = init_fpuregs.UDF_trap;
+#endif
 		set_used_math();
 	}
 
@@ -183,10 +189,12 @@ inline void fill_sigfpe_signo(unsigned int fpcsr, int *signo)
 {
 	if (fpcsr & FPCSR_mskOVFT)
 		*signo = FPE_FLTOVF;
-	else if (fpcsr & FPCSR_mskIVOT)
-		*signo = FPE_FLTINV;
+#ifndef CONFIG_SUPPORT_DENORMAL_ARITHMETIC
 	else if (fpcsr & FPCSR_mskUDFT)
 		*signo = FPE_FLTUND;
+#endif
+	else if (fpcsr & FPCSR_mskIVOT)
+		*signo = FPE_FLTINV;
 	else if (fpcsr & FPCSR_mskDBZT)
 		*signo = FPE_FLTDIV;
 	else if (fpcsr & FPCSR_mskIEXT)
@@ -197,11 +205,20 @@ inline void handle_fpu_exception(struct pt_regs *regs)
 {
 	unsigned int fpcsr;
 	int si_code = 0, si_signo = SIGFPE;
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+	unsigned long redo_except = FPCSR_mskDNIT|FPCSR_mskUDFT;
+#else
+	unsigned long redo_except = FPCSR_mskDNIT;
+#endif
 
 	lose_fpu();
 	fpcsr = current->thread.fpu.fpcsr;
 
-	if (fpcsr & FPCSR_mskDNIT) {
+	if (fpcsr & redo_except) {
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+		if (fpcsr & FPCSR_mskUDFT)
+			current->thread.fpu.fpcsr &= ~FPCSR_mskIEX;
+#endif
 		si_signo = do_fpuemu(regs, &current->thread.fpu);
 		fpcsr = current->thread.fpu.fpcsr;
 		if (!si_signo)
diff --git a/arch/nds32/kernel/sys_nds32.c b/arch/nds32/kernel/sys_nds32.c
index 9de93ab..0835277 100644
--- a/arch/nds32/kernel/sys_nds32.c
+++ b/arch/nds32/kernel/sys_nds32.c
@@ -6,6 +6,8 @@
 
 #include <asm/cachectl.h>
 #include <asm/proc-fns.h>
+#include <asm/udftrap.h>
+#include <asm/fpu.h>
 
 SYSCALL_DEFINE6(mmap2, unsigned long, addr, unsigned long, len,
 	       unsigned long, prot, unsigned long, flags,
@@ -48,3 +50,33 @@
 
 	return 0;
 }
+
+SYSCALL_DEFINE1(udftrap, int, option)
+{
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+	int old_udftrap;
+
+	if (!used_math()) {
+		load_fpu(&init_fpuregs);
+		current->thread.fpu.UDF_trap = init_fpuregs.UDF_trap;
+		set_used_math();
+	}
+
+	old_udftrap = current->thread.fpu.UDF_trap;
+	switch (option) {
+	case DISABLE_UDFTRAP:
+		current->thread.fpu.UDF_trap = 0;
+		break;
+	case ENABLE_UDFTRAP:
+		current->thread.fpu.UDF_trap = FPCSR_mskUDFE;
+		break;
+	case GET_UDFTRAP:
+		break;
+	default:
+		return -EINVAL;
+	}
+	return old_udftrap;
+#else
+	return -ENOTSUPP;
+#endif
+}
diff --git a/arch/nds32/math-emu/fpuemu.c b/arch/nds32/math-emu/fpuemu.c
index 2a01333..75cf164 100644
--- a/arch/nds32/math-emu/fpuemu.c
+++ b/arch/nds32/math-emu/fpuemu.c
@@ -304,7 +304,12 @@ static int fpu_emu(struct fpu_struct *fpu_reg, unsigned long insn)
 	/*
 	 * If an exception is required, generate a tidy SIGFPE exception.
 	 */
+#if IS_ENABLED(CONFIG_SUPPORT_DENORMAL_ARITHMETIC)
+	if (((fpu_reg->fpcsr << 5) & fpu_reg->fpcsr & FPCSR_mskALLE_NO_UDFE) ||
+	    ((fpu_reg->fpcsr & FPCSR_mskUDF) && (fpu_reg->UDF_trap)))
+#else
 	if ((fpu_reg->fpcsr << 5) & fpu_reg->fpcsr & FPCSR_mskALLE)
+#endif
 		return SIGFPE;
 	return 0;
 }
-- 
1.7.1



* [PATCH v3 4/5] math-emu/op-2.h: Use statement expressions to prevent negative constant shift
From: Vincent Chen @ 2018-11-01  7:17 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc

This modification is taken from glibc commit fe0b1e854ad32a69b260
("sysdeps/unix/sysv/linux/sparc/sparc64/dl-procinfo.c: Moved to").
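
For readers unfamiliar with these macros, the sketch below shows what the
two-word shifts compute, written as ordinary functions. It does not
reproduce the statement-expression form used in the patch. The names
struct frac2, frac2_sll and frac2_srl are hypothetical, and a 32-bit word
is assumed in place of _FP_W_TYPE_SIZE. Only one branch is valid for a
given shift count, because the other one would shift by a negative amount.

```c
#include <assert.h>
#include <stdint.h>

#define W_TYPE_SIZE 32	/* assumed word width */

/* A two-word fraction: f1 is the high word, f0 the low word. */
struct frac2 {
	uint32_t f1;
	uint32_t f0;
};

/* Shift the two-word fraction left by n bits, 0 < n < 2 * W_TYPE_SIZE. */
static struct frac2 frac2_sll(struct frac2 x, unsigned int n)
{
	assert(n > 0 && n < 2 * W_TYPE_SIZE);
	if (n < W_TYPE_SIZE) {
		/* Bits leaving the low word enter the high word. */
		x.f1 = (x.f1 << n) | (x.f0 >> (W_TYPE_SIZE - n));
		x.f0 <<= n;
	} else {
		/* The whole low word moves into the high word. */
		x.f1 = x.f0 << (n - W_TYPE_SIZE);
		x.f0 = 0;
	}
	return x;
}

/* Shift the two-word fraction right by n bits, 0 < n < 2 * W_TYPE_SIZE. */
static struct frac2 frac2_srl(struct frac2 x, unsigned int n)
{
	assert(n > 0 && n < 2 * W_TYPE_SIZE);
	if (n < W_TYPE_SIZE) {
		/* Bits leaving the high word enter the low word. */
		x.f0 = (x.f0 >> n) | (x.f1 << (W_TYPE_SIZE - n));
		x.f1 >>= n;
	} else {
		/* The whole high word moves into the low word. */
		x.f0 = x.f1 >> (n - W_TYPE_SIZE);
		x.f1 = 0;
	}
	return x;
}
```

When the macro is used with a constant N, both branches are still present
in the expanded source, so the unused one contains a constant negative
shift count. As far as I understand, that is the case the commit subject
refers to.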

Signed-off-by: Vincent Chen <vincentc@andestech.com>
---
 include/math-emu/op-2.h |   97 ++++++++++++++++++++++------------------------
 1 files changed, 46 insertions(+), 51 deletions(-)

diff --git a/include/math-emu/op-2.h b/include/math-emu/op-2.h
index 4f26ecc..13a374f 100644
--- a/include/math-emu/op-2.h
+++ b/include/math-emu/op-2.h
@@ -31,61 +31,56 @@
 #define _FP_FRAC_HIGH_2(X)	(X##_f1)
 #define _FP_FRAC_LOW_2(X)	(X##_f0)
 #define _FP_FRAC_WORD_2(X,w)	(X##_f##w)
+#define _FP_FRAC_SLL_2(X, N) (						       \
+	(void) (((N) < _FP_W_TYPE_SIZE)					       \
+	  ? ({								       \
+		if (__builtin_constant_p(N) && (N) == 1) {		       \
+			X##_f1 = X##_f1 + X##_f1 +			       \
+				(((_FP_WS_TYPE) (X##_f0)) < 0);		       \
+			X##_f0 += X##_f0;				       \
+		} else {						       \
+			X##_f1 = X##_f1 << (N) | X##_f0 >>		       \
+						(_FP_W_TYPE_SIZE - (N));       \
+			X##_f0 <<= (N);					       \
+		}							       \
+		0;							       \
+	    })								       \
+	  : ({								       \
+	      X##_f1 = X##_f0 << ((N) - _FP_W_TYPE_SIZE);		       \
+	      X##_f0 = 0;						       \
+	  })))
+
+
+#define _FP_FRAC_SRL_2(X, N) (						       \
+	(void) (((N) < _FP_W_TYPE_SIZE)					       \
+	  ? ({								       \
+	      X##_f0 = X##_f0 >> (N) | X##_f1 << (_FP_W_TYPE_SIZE - (N));      \
+	      X##_f1 >>= (N);						       \
+	    })								       \
+	  : ({								       \
+	      X##_f0 = X##_f1 >> ((N) - _FP_W_TYPE_SIZE);		       \
+	      X##_f1 = 0;						       \
+	    })))
 
-#define _FP_FRAC_SLL_2(X,N)						\
-  do {									\
-    if ((N) < _FP_W_TYPE_SIZE)						\
-      {									\
-	if (__builtin_constant_p(N) && (N) == 1) 			\
-	  {								\
-	    X##_f1 = X##_f1 + X##_f1 + (((_FP_WS_TYPE)(X##_f0)) < 0);	\
-	    X##_f0 += X##_f0;						\
-	  }								\
-	else								\
-	  {								\
-	    X##_f1 = X##_f1 << (N) | X##_f0 >> (_FP_W_TYPE_SIZE - (N));	\
-	    X##_f0 <<= (N);						\
-	  }								\
-      }									\
-    else								\
-      {									\
-	X##_f1 = X##_f0 << ((N) - _FP_W_TYPE_SIZE);			\
-	X##_f0 = 0;							\
-      }									\
-  } while (0)
-
-#define _FP_FRAC_SRL_2(X,N)						\
-  do {									\
-    if ((N) < _FP_W_TYPE_SIZE)						\
-      {									\
-	X##_f0 = X##_f0 >> (N) | X##_f1 << (_FP_W_TYPE_SIZE - (N));	\
-	X##_f1 >>= (N);							\
-      }									\
-    else								\
-      {									\
-	X##_f0 = X##_f1 >> ((N) - _FP_W_TYPE_SIZE);			\
-	X##_f1 = 0;							\
-      }									\
-  } while (0)
 
 /* Right shift with sticky-lsb.  */
-#define _FP_FRAC_SRS_2(X,N,sz)						\
-  do {									\
-    if ((N) < _FP_W_TYPE_SIZE)						\
-      {									\
-	X##_f0 = (X##_f1 << (_FP_W_TYPE_SIZE - (N)) | X##_f0 >> (N) |	\
-		  (__builtin_constant_p(N) && (N) == 1			\
-		   ? X##_f0 & 1						\
-		   : (X##_f0 << (_FP_W_TYPE_SIZE - (N))) != 0));	\
-	X##_f1 >>= (N);							\
-      }									\
-    else								\
-      {									\
-	X##_f0 = (X##_f1 >> ((N) - _FP_W_TYPE_SIZE) |			\
-		(((X##_f1 << (2*_FP_W_TYPE_SIZE - (N))) | X##_f0) != 0)); \
-	X##_f1 = 0;							\
-      }									\
-  } while (0)
+#define _FP_FRAC_SRS_2(X, N, sz) (					       \
+	(void) (((N) < _FP_W_TYPE_SIZE)					       \
+	  ? ({								       \
+	      X##_f0 = (X##_f1 << (_FP_W_TYPE_SIZE - (N)) | X##_f0 >> (N)      \
+			| (__builtin_constant_p(N) && (N) == 1		       \
+			   ? X##_f0 & 1					       \
+			   : (X##_f0 << (_FP_W_TYPE_SIZE - (N))) != 0));       \
+		X##_f1 >>= (N);						       \
+	    })								       \
+	  : ({								       \
+	      X##_f0 = (X##_f1 >> ((N) - _FP_W_TYPE_SIZE)		       \
+			| ((((N) == _FP_W_TYPE_SIZE			       \
+			     ? 0					       \
+			     : (X##_f1 << (2*_FP_W_TYPE_SIZE - (N))))          \
+			    | X##_f0) != 0));				       \
+	      X##_f1 = 0;						       \
+	    })))
 
 #define _FP_FRAC_ADDI_2(X,I)	\
   __FP_FRAC_ADDI_2(X##_f1, X##_f0, I)
-- 
1.7.1



* [PATCH v3 5/5] math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning
From: Vincent Chen @ 2018-11-01  7:17 UTC (permalink / raw)
  To: arnd, linux-kernel; +Cc: green.hu, deanbo422, vincentc

_FP_ROUND_ZERO is defined as 0 and used as a statement in the macro
_FP_ROUND. This generates "error: statement with no effect
[-Werror=unused-value]" from gcc. Define _FP_ROUND_ZERO as (void)0 to
fix it.

This modification is taken from glibc commit 8ed1e7d5894000c155acbd06f
("In libc/:").
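
The following stand-alone sketch shows the same idiom outside of soft-fp.
ROUND_ZERO and round_frac are hypothetical names, and the rounding logic
is a simplified model with three guard bits, not the real _FP_ROUND. If
ROUND_ZERO expanded to a bare 0, the statement "0;" would have no effect
and gcc would report it under -Wunused-value. With (void)0 the discarded
value is explicit.

```c
#include <assert.h>

#define RND_NEAREST	0
#define RND_ZERO	1

/* Truncation adds nothing; (void)0 makes this a valid empty statement. */
#define ROUND_ZERO(frac)	(void)0

/*
 * Simplified model: frac carries three extra guard bits below the
 * result's least significant bit. Round it and drop the guard bits.
 */
static unsigned int round_frac(unsigned int frac, int mode)
{
	assert(mode == RND_NEAREST || mode == RND_ZERO);

	if (frac & 7) {			/* inexact: guard bits are set */
		if (mode == RND_NEAREST) {
			/* Add half an ulp unless this is a tie on an even LSB. */
			if ((frac & 15) != 4)
				frac += 4;
		} else {
			ROUND_ZERO(frac);
		}
	}
	return frac >> 3;
}
```

For example, round_frac(0x0F, RND_ZERO) truncates to 1, while
round_frac(0x0F, RND_NEAREST) rounds up to 2.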

Signed-off-by: Vincent Chen <vincentc@andestech.com>
---
 include/math-emu/soft-fp.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/math-emu/soft-fp.h b/include/math-emu/soft-fp.h
index 3f284bc..5650c16 100644
--- a/include/math-emu/soft-fp.h
+++ b/include/math-emu/soft-fp.h
@@ -138,7 +138,7 @@
       _FP_FRAC_ADDI_##wc(X, _FP_WORK_ROUND);		\
 } while (0)
 
-#define _FP_ROUND_ZERO(wc, X)		0
+#define _FP_ROUND_ZERO(wc, X)		(void)0
 
 #define _FP_ROUND_PINF(wc, X)				\
 do {							\
-- 
1.7.1



* Re: [PATCH v3 0/5] nds32 FPU port
From: Greentime Hu @ 2018-11-06  9:52 UTC (permalink / raw)
  To: Vincent Chen; +Cc: Arnd Bergmann, Linux Kernel Mailing List, Vincent Chen

Hi Vincent,
On Thu, Nov 1, 2018 at 3:17 PM Vincent Chen <vincentc@andestech.com> wrote:
>
>   This patch set contains basic components for supporting the nds32 FPU,
> such as exception handlers and context switch for FPU registers. By
> default, the lazy FPU scheme is supported and the user can configure it
> via CONFIG_LAZY_FPU. In addition, a floating point emulator is required
> to handle all arithmetic on denormalized numbers because they are not
> supported by the nds32 FPU.
>
>   As mentioned above, the nds32 FPU does not support denormalized numbers.
> This means that denormalized operands and results are not permitted. If an
> instruction contains denormalized operands, the nds32 FPU will raise a
> denormalized input exception to inform the kernel to deal with this
> instruction. If the result of the instruction is a denormalized number,
> the nds32 FPU will normally treat it as an underflow case and round the
> result to an appropriate value based on the current rounding mode. This
> leaves a precision gap for tiny numbers. To reduce this precision gap, the
> kernel will enable the underflow trap by default to direct all underflow
> cases to the floating point emulator. With the floating point emulator,
> the correct denormalized number can be derived in the kernel and returned
> to the user program. The feature can be configured by
> CONFIG_SUPPORT_DENORMAL_ARITHMETIC, and if the precision requirement is not
> critical for tiny numbers, the user may disable this feature to keep
> performance.
>
>   The implementation of the floating point emulator is based on soft-fp,
> which is located in the include/math-emu folder. However, soft-fp is too
> outdated to pass the current compiler checks. The needed modifications
> for soft-fp are included in this patch set.
>
> Changes in v3:
>  - Kernel with FPU support enabled still can run on a CPU without FPU
>  - Rename CONFIG_UNLAZY_FPU to CONFIG_LAZY_FPU
>  - Rename _switch() to _switch_fpu()
>  - Store FPU context when kernel suspends
>  - Modify the comments in code and patch
>
> Changes in v2:
>  - Remove the initialization of floating point registers before entering
>    the signal handler.
>
> Vincent Chen (5):
>   nds32: nds32 FPU port
>   nds32: Support FP emulation
>   nds32: support denormalized result through FP emulator
>   math-emu/op-2.h: Use statement expressions to prevent negative
>     constant shift
>   math-emu/soft-fp.h: (_FP_ROUND_ZERO) cast 0 to void to fix warning
>
>  arch/nds32/Kconfig                       |    1 +
>  arch/nds32/Kconfig.cpu                   |   34 +++
>  arch/nds32/Makefile                      |   11 +
>  arch/nds32/include/asm/bitfield.h        |   15 ++
>  arch/nds32/include/asm/elf.h             |   11 +
>  arch/nds32/include/asm/fpu.h             |  126 +++++++++++
>  arch/nds32/include/asm/fpuemu.h          |   32 +++
>  arch/nds32/include/asm/nds32_fpu_inst.h  |  109 +++++++++
>  arch/nds32/include/asm/processor.h       |    7 +
>  arch/nds32/include/asm/sfp-machine.h     |  158 +++++++++++++
>  arch/nds32/include/asm/syscalls.h        |    1 +
>  arch/nds32/include/uapi/asm/auxvec.h     |    7 +
>  arch/nds32/include/uapi/asm/sigcontext.h |   14 ++
>  arch/nds32/include/uapi/asm/udftrap.h    |   13 +
>  arch/nds32/include/uapi/asm/unistd.h     |    2 +
>  arch/nds32/kernel/Makefile               |   10 +
>  arch/nds32/kernel/ex-entry.S             |   24 ++-
>  arch/nds32/kernel/ex-exit.S              |   13 +-
>  arch/nds32/kernel/ex-scall.S             |    8 +-
>  arch/nds32/kernel/fpu.c                  |  269 ++++++++++++++++++++++
>  arch/nds32/kernel/process.c              |   64 +++++-
>  arch/nds32/kernel/setup.c                |   12 +-
>  arch/nds32/kernel/signal.c               |   62 +++++-
>  arch/nds32/kernel/sleep.S                |    2 +
>  arch/nds32/kernel/sys_nds32.c            |   32 +++
>  arch/nds32/kernel/traps.c                |   16 ++
>  arch/nds32/math-emu/Makefile             |    7 +
>  arch/nds32/math-emu/faddd.c              |   24 ++
>  arch/nds32/math-emu/fadds.c              |   24 ++
>  arch/nds32/math-emu/fcmpd.c              |   24 ++
>  arch/nds32/math-emu/fcmps.c              |   24 ++
>  arch/nds32/math-emu/fd2s.c               |   22 ++
>  arch/nds32/math-emu/fdivd.c              |   27 +++
>  arch/nds32/math-emu/fdivs.c              |   26 +++
>  arch/nds32/math-emu/fmuld.c              |   23 ++
>  arch/nds32/math-emu/fmuls.c              |   23 ++
>  arch/nds32/math-emu/fnegd.c              |   21 ++
>  arch/nds32/math-emu/fnegs.c              |   21 ++
>  arch/nds32/math-emu/fpuemu.c             |  357 ++++++++++++++++++++++++++++++
>  arch/nds32/math-emu/fs2d.c               |   23 ++
>  arch/nds32/math-emu/fsqrtd.c             |   21 ++
>  arch/nds32/math-emu/fsqrts.c             |   21 ++
>  arch/nds32/math-emu/fsubd.c              |   27 +++
>  arch/nds32/math-emu/fsubs.c              |   27 +++
>  include/math-emu/op-2.h                  |   97 ++++-----
>  include/math-emu/soft-fp.h               |    2 +-
>  46 files changed, 1827 insertions(+), 67 deletions(-)
>  create mode 100644 arch/nds32/include/asm/fpu.h
>  create mode 100644 arch/nds32/include/asm/fpuemu.h
>  create mode 100644 arch/nds32/include/asm/nds32_fpu_inst.h
>  create mode 100644 arch/nds32/include/asm/sfp-machine.h
>  create mode 100644 arch/nds32/include/uapi/asm/udftrap.h
>  create mode 100644 arch/nds32/kernel/fpu.c
>  create mode 100644 arch/nds32/math-emu/Makefile
>  create mode 100644 arch/nds32/math-emu/faddd.c
>  create mode 100644 arch/nds32/math-emu/fadds.c
>  create mode 100644 arch/nds32/math-emu/fcmpd.c
>  create mode 100644 arch/nds32/math-emu/fcmps.c
>  create mode 100644 arch/nds32/math-emu/fd2s.c
>  create mode 100644 arch/nds32/math-emu/fdivd.c
>  create mode 100644 arch/nds32/math-emu/fdivs.c
>  create mode 100644 arch/nds32/math-emu/fmuld.c
>  create mode 100644 arch/nds32/math-emu/fmuls.c
>  create mode 100644 arch/nds32/math-emu/fnegd.c
>  create mode 100644 arch/nds32/math-emu/fnegs.c
>  create mode 100644 arch/nds32/math-emu/fpuemu.c
>  create mode 100644 arch/nds32/math-emu/fs2d.c
>  create mode 100644 arch/nds32/math-emu/fsqrtd.c
>  create mode 100644 arch/nds32/math-emu/fsqrts.c
>  create mode 100644 arch/nds32/math-emu/fsubd.c
>  create mode 100644 arch/nds32/math-emu/fsubs.c

Thank you.
Acked-by: Greentime Hu <greentime@andestech.com>
