* [PATCH V2 0/8] FP/VEC/VSX switching optimisations
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

Cover-letter for V1 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html

Version one of this series used a cmpb instruction in handcrafted
assembly, which it turns out is not supported on older POWER machines.
Michael suggested replacing it with crandc, which works fine. Testing also
showed no difference in performance between using cmpb and crandc.

The primary objective is improving the syscall hot path. While the gut
feeling may be that avoiding C is quicker, it may also be the case that
the C is not significantly slower; if C is not slower, using it would
provide a distinct readability and maintainability advantage.

I have benchmarked three possible scenarios:
1. Always calling into C.
2. Testing for the common case in assembly and calling into C
3. Using crandc in the full assembly check

All benchmarks are the average of 50 runs of Anton's context switch
benchmark http://www.ozlabs.org/~anton/junkcode/context_switch2.c with the
kernel and ramdisk run under QEMU/KVM on a POWER8.
To test all the cases, a variety of flags were passed to the benchmark to
see the effect of touching only a subset of the 'math' register space.

The absolute numbers, in context switches per second, can vary greatly
depending on how the kernel is run (virt/powernv/ramdisk/disk), so the
units aren't very relevant here; what we're interested in is the speedup.
The most interesting number is the %speedup over the previous scenario.
Here 100% means there was no difference, so <100% indicates a decrease in
performance and >100% an increase.
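
For example, taking the 'fp vector' row: scenario 2 over scenario 1 works
out to 1668912.96 / 1640951.76 = ~1.017, which is the 101.70% shown in the
table, i.e. roughly a 1.7% improvement.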

For 1 - Always calling into C
         Flags |  Average   |  Stddev  |
========================================
          none | 2059785.00 | 14217.64 |
            fp | 1766297.65 | 10576.64 |
    fp altivec | 1636125.04 | 5693.84  |
     fp vector | 1640951.76 | 13141.93 |
       altivec | 1815133.80 | 10450.46 |
altivec vector | 1636438.60 | 5475.12  |
        vector | 1639628.16 | 11456.06 |
           all | 1629516.32 | 7785.36  |



For 2 - Common case checking in asm before calling into C
         Flags |  Average   |  Stddev  | %speedup vs 1 |
========================================================
          none | 2058003.64 | 20464.22 | 99.91         |
            fp | 1757245.80 | 14455.45 | 99.49         |
    fp altivec | 1658240.12 | 6318.41  | 101.35        |
     fp vector | 1668912.96 | 9451.47  | 101.70        |
       altivec | 1815223.96 | 4819.82  | 100.00        |
altivec vector | 1648805.32 | 15100.50 | 100.76        |
        vector | 1663654.68 | 13814.79 | 101.47        |
           all | 1644884.04 | 11315.74 | 100.94        |



For 3 - Full checking in ASM using crandc instead of cmpb
         Flags |  Average   |  Stddev  | %speedup vs 2 |
========================================================
          none | 2066930.52 | 19426.46 | 100.43        |
            fp | 1781653.24 | 7744.55  | 101.39        |
    fp altivec | 1653125.84 | 6727.36  | 99.69         |
     fp vector | 1656011.04 | 11678.56 | 99.23         |
       altivec | 1824934.72 | 16842.19 | 100.53        |
altivec vector | 1649486.92 | 3219.14  | 100.04        |
        vector | 1662420.20 | 9609.34  | 99.93         |
           all | 1647933.64 | 11121.22 | 100.19        |

From these numbers it appears that avoiding the call into C in the common
case is beneficial, possibly up to a 1.5% speedup over always calling C.
The benefit of the more complicated asm checking appears to be very
slight, fractions of a percent at best. On balance it seems wise to use
option 2: there are much bigger fish to fry in terms of performance, and
the complexity of the full assembly check is not worth a small fraction of
one percent at this stage.

Version 2 of this series also addresses comments from Mikey Neuling on
the tests, such as adding a .gitignore and forcing 64-bit compiles of the
tests, as they use 64-bit-only instructions.


Cyril Bur (8):
  selftests/powerpc: Test the preservation of FPU and VMX regs across
    syscall
  selftests/powerpc: Test preservation of FPU and VMX regs across
    preemption
  selftests/powerpc: Test FPU and VMX regs in signal ucontext
  powerpc: Explicitly disable math features when copying thread
  powerpc: Restore FPU/VEC/VSX if previously used
  powerpc: Add the ability to save FPU without giving it up
  powerpc: Add the ability to save Altivec without giving it up
  powerpc: Add the ability to save VSX without giving it up

 arch/powerpc/include/asm/processor.h               |   2 +
 arch/powerpc/include/asm/switch_to.h               |   5 +-
 arch/powerpc/kernel/asm-offsets.c                  |   2 +
 arch/powerpc/kernel/entry_64.S                     |  21 +-
 arch/powerpc/kernel/fpu.S                          |  25 +--
 arch/powerpc/kernel/ppc_ksyms.c                    |   4 -
 arch/powerpc/kernel/process.c                      | 144 +++++++++++--
 arch/powerpc/kernel/vector.S                       |  45 +---
 tools/testing/selftests/powerpc/Makefile           |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h        |  26 +++
 tools/testing/selftests/powerpc/math/.gitignore    |   6 +
 tools/testing/selftests/powerpc/math/Makefile      |  19 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S     | 195 ++++++++++++++++++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++++++++++
 tools/testing/selftests/powerpc/math/fpu_signal.c  | 135 ++++++++++++
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 ++++++++
 tools/testing/selftests/powerpc/math/vmx_asm.S     | 229 +++++++++++++++++++++
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 ++++++++++
 tools/testing/selftests/powerpc/math/vmx_signal.c  | 138 +++++++++++++
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  92 +++++++++
 20 files changed, 1326 insertions(+), 81 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

-- 
2.7.0


* [PATCH V2 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

Test that the non-volatile floating point and Altivec registers are
correctly preserved across the fork() syscall.

fork() works nicely for this purpose: the registers should be the same in
both parent and child.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 tools/testing/selftests/powerpc/Makefile           |   3 +-
 tools/testing/selftests/powerpc/basic_asm.h        |  26 +++
 tools/testing/selftests/powerpc/math/.gitignore    |   2 +
 tools/testing/selftests/powerpc/math/Makefile      |  14 ++
 tools/testing/selftests/powerpc/math/fpu_asm.S     | 161 +++++++++++++++++
 tools/testing/selftests/powerpc/math/fpu_syscall.c |  90 ++++++++++
 tools/testing/selftests/powerpc/math/vmx_asm.S     | 193 +++++++++++++++++++++
 tools/testing/selftests/powerpc/math/vmx_syscall.c |  92 ++++++++++
 8 files changed, 580 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/math/Makefile
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c

diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..19e8191 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,7 +22,8 @@ SUB_DIRS = benchmarks 		\
 	   switch_endian	\
 	   syscalls		\
 	   tm			\
-	   vphn
+	   vphn         \
+	   math
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h
new file mode 100644
index 0000000..27aca79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -0,0 +1,26 @@
+#include <ppc-asm.h>
+#include <asm/unistd.h>
+
+#define LOAD_REG_IMMEDIATE(reg,expr) \
+	lis	reg,(expr)@highest;	\
+	ori	reg,reg,(expr)@higher;	\
+	rldicr	reg,reg,32,31;	\
+	oris	reg,reg,(expr)@high;	\
+	ori	reg,reg,(expr)@l;
+
+#define PUSH_BASIC_STACK(size) \
+	std	2,24(sp); \
+	mflr	r0; \
+	std	r0,16(sp); \
+	mfcr	r0; \
+	stw	r0,8(sp); \
+	stdu	sp,-size(sp);
+
+#define POP_BASIC_STACK(size) \
+	addi	sp,sp,size; \
+	ld	2,24(sp); \
+	ld	r0,16(sp); \
+	mtlr	r0; \
+	lwz	r0,8(sp); \
+	mtcr	r0; \
+
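
As an aside on basic_asm.h: LOAD_REG_IMMEDIATE builds a 64-bit constant
from four 16-bit pieces (lis/ori for the top half, a rotate, then oris/ori
for the bottom half). A rough C model of that arithmetic, for illustration
only:

	#include <stdint.h>

	/* Sketch: highest/higher/high/low stand in for the @highest,
	 * @higher, @high and @l pieces of the 64-bit expression. */
	static uint64_t load_reg_immediate(uint16_t highest, uint16_t higher,
					   uint16_t high, uint16_t low)
	{
		uint64_t reg = ((uint64_t)highest << 16) | higher; /* lis; ori */
		reg <<= 32;				/* rldicr reg,reg,32,31 */
		return reg | ((uint64_t)high << 16) | low;	/* oris; ori */
	}
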
diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
new file mode 100644
index 0000000..b19b269
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -0,0 +1,2 @@
+fpu_syscall
+vmx_syscall
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
new file mode 100644
index 0000000..418bef1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -0,0 +1,14 @@
+TEST_PROGS := fpu_syscall vmx_syscall
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c
+$(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
+
+fpu_syscall: fpu_asm.S
+vmx_syscall: vmx_asm.S
+
+include ../../lib.mk
+
+clean:
+	rm -f $(TEST_PROGS) *.o
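
With the math directory hooked into the powerpc selftests Makefile, the
tests should build via the usual kselftest flow, along the lines of:

	$ make -C tools/testing/selftests TARGETS=powerpc
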
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
new file mode 100644
index 0000000..8733874
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -0,0 +1,161 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+#define PUSH_FPU(pos) \
+	stfd	f14,pos(sp); \
+	stfd	f15,pos+8(sp); \
+	stfd	f16,pos+16(sp); \
+	stfd	f17,pos+24(sp); \
+	stfd	f18,pos+32(sp); \
+	stfd	f19,pos+40(sp); \
+	stfd	f20,pos+48(sp); \
+	stfd	f21,pos+56(sp); \
+	stfd	f22,pos+64(sp); \
+	stfd	f23,pos+72(sp); \
+	stfd	f24,pos+80(sp); \
+	stfd	f25,pos+88(sp); \
+	stfd	f26,pos+96(sp); \
+	stfd	f27,pos+104(sp); \
+	stfd	f28,pos+112(sp); \
+	stfd	f29,pos+120(sp); \
+	stfd	f30,pos+128(sp); \
+	stfd	f31,pos+136(sp);
+
+#define POP_FPU(pos) \
+	lfd	f14,pos(sp); \
+	lfd	f15,pos+8(sp); \
+	lfd	f16,pos+16(sp); \
+	lfd	f17,pos+24(sp); \
+	lfd	f18,pos+32(sp); \
+	lfd	f19,pos+40(sp); \
+	lfd	f20,pos+48(sp); \
+	lfd	f21,pos+56(sp); \
+	lfd	f22,pos+64(sp); \
+	lfd	f23,pos+72(sp); \
+	lfd	f24,pos+80(sp); \
+	lfd	f25,pos+88(sp); \
+	lfd	f26,pos+96(sp); \
+	lfd	f27,pos+104(sp); \
+	lfd	f28,pos+112(sp); \
+	lfd	f29,pos+120(sp); \
+	lfd	f30,pos+128(sp); \
+	lfd	f31,pos+136(sp);
+
+#Careful calling this, it will 'clobber' fpu (by design)
+#Don't call this from C
+FUNC_START(load_fpu)
+	lfd	f14,0(r3)
+	lfd	f15,8(r3)
+	lfd	f16,16(r3)
+	lfd	f17,24(r3)
+	lfd	f18,32(r3)
+	lfd	f19,40(r3)
+	lfd	f20,48(r3)
+	lfd	f21,56(r3)
+	lfd	f22,64(r3)
+	lfd	f23,72(r3)
+	lfd	f24,80(r3)
+	lfd	f25,88(r3)
+	lfd	f26,96(r3)
+	lfd	f27,104(r3)
+	lfd	f28,112(r3)
+	lfd	f29,120(r3)
+	lfd	f30,128(r3)
+	lfd	f31,136(r3)
+	blr
+FUNC_END(load_fpu)
+
+FUNC_START(check_fpu)
+	mr r4,r3
+	li	r3,1 #assume a bad result
+	lfd	f0,0(r4)
+	fcmpu	cr1,f0,f14
+	bne	cr1,1f
+	lfd	f0,8(r4)
+	fcmpu	cr1,f0,f15
+	bne	cr1,1f
+	lfd	f0,16(r4)
+	fcmpu	cr1,f0,f16
+	bne	cr1,1f
+	lfd	f0,24(r4)
+	fcmpu	cr1,f0,f17
+	bne	cr1,1f
+	lfd	f0,32(r4)
+	fcmpu	cr1,f0,f18
+	bne	cr1,1f
+	lfd	f0,40(r4)
+	fcmpu	cr1,f0,f19
+	bne	cr1,1f
+	lfd	f0,48(r4)
+	fcmpu	cr1,f0,f20
+	bne	cr1,1f
+	lfd	f0,56(r4)
+	fcmpu	cr1,f0,f21
+	bne	cr1,1f
+	lfd	f0,64(r4)
+	fcmpu	cr1,f0,f22
+	bne	cr1,1f
+	lfd	f0,72(r4)
+	fcmpu	cr1,f0,f23
+	bne	cr1,1f
+	lfd	f0,80(r4)
+	fcmpu	cr1,f0,f24
+	bne	cr1,1f
+	lfd	f0,88(r4)
+	fcmpu	cr1,f0,f25
+	bne	cr1,1f
+	lfd	f0,96(r4)
+	fcmpu	cr1,f0,f26
+	bne	cr1,1f
+	lfd	f0,104(r4)
+	fcmpu	cr1,f0,f27
+	bne	cr1,1f
+	lfd	f0,112(r4)
+	fcmpu	cr1,f0,f28
+	bne	cr1,1f
+	lfd	f0,120(r4)
+	fcmpu	cr1,f0,f29
+	bne	cr1,1f
+	lfd	f0,128(r4)
+	fcmpu	cr1,f0,f30
+	bne	cr1,1f
+	lfd	f0,136(r4)
+	fcmpu	cr1,f0,f31
+	bne	cr1,1f
+	li	r3,0 #Success!!!
+1:	blr
+
+FUNC_START(test_fpu)
+	#r3 holds pointer to where to put the result of fork
+	#r4 holds pointer to the pid
+	#f14-f31 are non volatiles
+	PUSH_BASIC_STACK(256)
+	std	r3,40(sp) #Address of darray
+	std r4,48(sp) #Address of pid
+	PUSH_FPU(56)
+
+	bl load_fpu
+	nop
+	li	r0,__NR_fork
+	sc
+
+	#pass the result of the fork to the caller
+	ld	r9,48(sp)
+	std	r3,0(r9)
+
+	ld r3,40(sp)
+	bl check_fpu
+	nop
+
+	POP_FPU(56)
+	POP_BASIC_STACK(256)
+	blr
+FUNC_END(test_fpu)
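
A note for readers cross-checking check_fpu: the test has to live in asm
because C cannot inspect f14-f31 without the compiler potentially
clobbering them, but the logic it implements amounts to this hedged C
model:

	/* Returns 0 iff all 18 saved doubles still match the values that
	 * were loaded into the non-volatile FPRs f14..f31. */
	static int check_fpu_model(const double *expected, const double *actual)
	{
		int i;

		for (i = 0; i < 18; i++)
			if (expected[i] != actual[i])
				return 1;	/* mismatch */
		return 0;			/* success */
	}
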
diff --git a/tools/testing/selftests/powerpc/math/fpu_syscall.c b/tools/testing/selftests/powerpc/math/fpu_syscall.c
new file mode 100644
index 0000000..949e672
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_syscall.c
@@ -0,0 +1,90 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across a syscall (fork).
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+
+#include "utils.h"
+
+extern int test_fpu(double *darray, pid_t *pid);
+
+double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+		     1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+		     2.1};
+
+int syscall_fpu(void)
+{
+	pid_t fork_pid;
+	int i;
+	int ret;
+	int child_ret;
+	for (i = 0; i < 1000; i++) {
+		/* test_fpu will fork() */
+		ret = test_fpu(darray, &fork_pid);
+		if (fork_pid == -1)
+			return -1;
+		if (fork_pid == 0)
+			exit(ret);
+		waitpid(fork_pid, &child_ret, 0);
+		if (ret || child_ret)
+			return 1;
+	}
+
+	return 0;
+}
+
+int test_syscall_fpu(void)
+{
+	/*
+	 * Setup an environment with much context switching
+	 */
+	pid_t pid2;
+	pid_t pid = fork();
+	int ret;
+	int child_ret;
+	FAIL_IF(pid == -1);
+
+	pid2 = fork();
+	/* Can't FAIL_IF(pid2 == -1); because already forked once */
+	if (pid2 == -1) {
+		/*
+		 * Couldn't fork, ensure test is a fail
+		 */
+		child_ret = ret = 1;
+	} else {
+		ret = syscall_fpu();
+		if (pid2)
+			waitpid(pid2, &child_ret, 0);
+		else
+			exit(ret);
+	}
+
+	ret |= child_ret;
+
+	if (pid)
+		waitpid(pid, &child_ret, 0);
+	else
+		exit(ret);
+
+	FAIL_IF(ret || child_ret);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_syscall_fpu, "syscall_fpu");
+
+}
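
Worth noting about the structure above: the two nested fork()s leave four
processes all running syscall_fpu(), and each of those forks a further
1000 times inside test_fpu, which is what generates the context switch
pressure the test relies on.
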
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
new file mode 100644
index 0000000..9ed32e7
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -0,0 +1,193 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+#define PUSH_VMX(pos,reg) \
+	li	reg,pos; \
+	stvx	v20,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v21,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v22,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v23,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v24,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v25,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v26,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v27,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v28,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v29,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v30,reg,sp; \
+	addi	reg,reg,16; \
+	stvx	v31,reg,sp;
+
+#define POP_VMX(pos,reg) \
+	li	reg,pos; \
+	lvx	v20,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v21,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v22,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v23,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v24,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v25,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v26,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v27,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v28,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v29,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v30,reg,sp; \
+	addi	reg,reg,16; \
+	lvx	v31,reg,sp;
+
+#Careful, this will 'clobber' vmx (by design)
+#Don't call this from C
+FUNC_START(load_vmx)
+	li	r5,0
+	lvx	v20,r5,r3
+	addi	r5,r5,16
+	lvx	v21,r5,r3
+	addi	r5,r5,16
+	lvx	v22,r5,r3
+	addi	r5,r5,16
+	lvx	v23,r5,r3
+	addi	r5,r5,16
+	lvx	v24,r5,r3
+	addi	r5,r5,16
+	lvx	v25,r5,r3
+	addi	r5,r5,16
+	lvx	v26,r5,r3
+	addi	r5,r5,16
+	lvx	v27,r5,r3
+	addi	r5,r5,16
+	lvx	v28,r5,r3
+	addi	r5,r5,16
+	lvx	v29,r5,r3
+	addi	r5,r5,16
+	lvx	v30,r5,r3
+	addi	r5,r5,16
+	lvx	v31,r5,r3
+	blr
+FUNC_END(load_vmx)
+
+#Should be safe from C, only touches r4, r5 and v0,v1,v2
+FUNC_START(check_vmx)
+	PUSH_BASIC_STACK(16)
+	mr r4,r3
+	li	r3,1 #assume a bad result
+	li	r5,0
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v20
+	vmr	v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v21
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v22
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v23
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v24
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v25
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v26
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v27
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v28
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v29
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v30
+	vand	v2,v2,v1
+
+	addi	r5,r5,16
+	lvx	v0,r5,r4
+	vcmpequd.	v1,v0,v31
+	vand	v2,v2,v1
+
+	li r5,0
+	stvx	v2,r5,sp
+	ldx	r0,r5,sp
+	cmpdi	r0,0xffffffff
+	bne	1f
+	li	r3,0
+1:	POP_BASIC_STACK(16)
+	blr
+FUNC_END(check_vmx)
+
+#Safe from C
+FUNC_START(test_vmx)
+	#r3 holds pointer to where to put the result of fork
+	#r4 holds pointer to the pid
+	#v20-v31 are non-volatile
+	PUSH_BASIC_STACK(512)
+	std	r3,40(sp) #Address of varray
+	std r4,48(sp) #address of pid
+	PUSH_VMX(56, r4)
+
+	bl load_vmx
+
+	li	r0,__NR_fork
+	sc
+	#Pass the result of fork back to the caller
+	ld	r9,48(sp)
+	std	r3,0(r9)
+
+	ld r3,40(sp)
+	bl check_vmx
+
+	POP_VMX(56,r4)
+	POP_BASIC_STACK(512)
+	blr
+FUNC_END(test_vmx)
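
The compare chain in check_vmx leans on vcmpequd. setting a lane to all
ones on equality, then ANDs the per-register results together so the final
mask is all ones only if every register matched. A hedged C model of the
same logic, using GCC's vector extensions:

	typedef int v4si __attribute__ ((vector_size (16)));

	/* Returns 0 iff all twelve saved vectors still match the values
	 * loaded into the non-volatile VMX registers v20..v31. */
	static int check_vmx_model(const v4si *expected, const v4si *actual)
	{
		v4si mask = expected[0] == actual[0];	/* -1 per equal lane */
		int i;

		for (i = 1; i < 12; i++)
			mask &= (expected[i] == actual[i]);
		for (i = 0; i < 4; i++)
			if (mask[i] != -1)
				return 1;	/* some lane differed */
		return 0;
	}
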
diff --git a/tools/testing/selftests/powerpc/math/vmx_syscall.c b/tools/testing/selftests/powerpc/math/vmx_syscall.c
new file mode 100644
index 0000000..6989d96
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_syscall.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers change across a syscall (fork).
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "utils.h"
+
+typedef int v4si __attribute__ ((vector_size (16)));
+v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+extern int test_vmx(v4si *varray, pid_t *pid);
+
+int vmx_syscall(void)
+{
+	pid_t fork_pid;
+	int i;
+	int ret;
+	int child_ret;
+	for (i = 0; i < 1000; i++) {
+		/* test_vmx will fork() */
+		ret = test_vmx(varray, &fork_pid);
+		if (fork_pid == -1)
+			return -1;
+		if (fork_pid == 0)
+			exit(ret);
+		waitpid(fork_pid, &child_ret, 0);
+		if (ret || child_ret)
+			return 1;
+	}
+
+	return 0;
+}
+
+int test_vmx_syscall(void)
+{
+	/*
+	 * Setup an environment with much context switching
+	 */
+	pid_t pid2;
+	pid_t pid = fork();
+	int ret;
+	int child_ret;
+	FAIL_IF(pid == -1);
+
+	pid2 = fork();
+	ret = vmx_syscall();
+	/* Can't FAIL_IF(pid2 == -1); because we've already forked */
+	if (pid2 == -1) {
+		/*
+		 * Couldn't fork, ensure child_ret is set and is a fail
+		 */
+		ret = child_ret = 1;
+	} else {
+		if (pid2)
+			waitpid(pid2, &child_ret, 0);
+		else
+			exit(ret);
+	}
+
+	ret |= child_ret;
+
+	if (pid)
+		waitpid(pid, &child_ret, 0);
+	else
+		exit(ret);
+
+	FAIL_IF(ret || child_ret);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_vmx_syscall, "vmx_syscall");
+
+}
-- 
2.7.0


* [PATCH V2 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

With many threads running, loop in assembly checking that the
non-volatile FPU and VMX registers keep their values across preemption.
Each worker signals that it has started and then spins verifying its
registers until told to stop.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 tools/testing/selftests/powerpc/math/.gitignore    |   2 +
 tools/testing/selftests/powerpc/math/Makefile      |   5 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S     |  34 +++++++
 tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 +++++++++++++++++++++
 tools/testing/selftests/powerpc/math/vmx_asm.S     |  44 +++++++-
 tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 +++++++++++++++++++++
 6 files changed, 306 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
index b19b269..1a6f09e 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -1,2 +1,4 @@
 fpu_syscall
 vmx_syscall
+fpu_preempt
+vmx_preempt
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 418bef1..b6f4158 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
 
 all: $(TEST_PROGS)
 
@@ -6,7 +6,10 @@ $(TEST_PROGS): ../harness.c
 $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
 
 fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
 vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index 8733874..46bbe99 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -159,3 +159,37 @@ FUNC_START(test_fpu)
 	POP_BASIC_STACK(256)
 	blr
 FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, int *threads_starting, int *running)
+#On starting will (atomically) decrement threads_starting as a signal that
+#the FPU has been loaded with darray. Will proceed to check the validity of
+#the FPU registers while running is not zero.
+FUNC_START(preempt_fpu)
+	PUSH_BASIC_STACK(256)
+	std r3,32(sp) #double *darray
+	std r4,40(sp) #volatile int *not_ready
+	std r5,48(sp) #int *running
+	PUSH_FPU(56)
+
+	bl load_fpu
+
+	#Atomic DEC
+	ld r3,40(sp)
+1:	lwarx r4,0,r3
+	addi r4,r4,-1
+	stwcx. r4,0,r3
+	bne- 1b
+
+2:	ld r3, 32(sp)
+	bl check_fpu
+	cmpdi r3,0
+	bne 3f
+	ld r4, 48(sp)
+	ld r5, 0(r4)
+	cmpwi r5,0
+	bne 2b
+
+3:	POP_FPU(56)
+	POP_BASIC_STACK(256)
+	blr
+FUNC_END(preempt_fpu)
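
The lwarx/stwcx. pair above is the standard PowerPC load-reserve/
store-conditional retry loop; for reference, its C equivalent (sketched
with a GCC builtin, relaxed ordering to match the asm's lack of barriers)
is simply:

	static void atomic_dec(int *p)
	{
		__atomic_fetch_sub(p, 1, __ATOMIC_RELAXED);
	}
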
diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 0000000..0f85b79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across preemption.
+ * Two things should be noted here: a) the check_fpu function in asm only
+ * checks the non volatile registers, as it is reused from the syscall
+ * test; b) there is no way to be sure preemption happened, so this test
+ * just uses many threads and a long wait. As such, a successful test
+ * doesn't mean much, but a failure is bad.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+		     1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+		     2.1};
+
+int threads_starting;
+int running;
+
+extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void *preempt_fpu_c(void *p)
+{
+	int i;
+	srand(pthread_self());
+	for (i = 0; i < 21; i++)
+		darray[i] = rand();
+
+	/* Test fails if it ever returns */
+	preempt_fpu(darray, &threads_starting, &running);
+
+	return p;
+}
+
+int test_preempt_fpu(void)
+{
+	int i, rc, threads;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc((threads) * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting = threads;
+	for (i = 0; i < threads; i++) {
+		rc = pthread_create(&tids[i], NULL, preempt_fpu_c, NULL);
+		FAIL_IF(rc);
+	}
+
+	setbuf(stdout, NULL);
+	/* Not really necessary but nice to wait for every thread to start */
+	printf("\tWaiting for all workers to start...");
+	while(threads_starting)
+		asm volatile("": : :"memory");
+	printf("done\n");
+
+	printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+	sleep(PREEMPT_TIME);
+	printf("done\n");
+
+	printf("\tStopping workers...");
+	/*
+	 * Workers are checking this value every loop. In preempt_fpu 'cmpwi r5,0; bne 2b'.
+	 * r5 will have loaded the value of running.
+	 */
+	running = 0;
+	for (i = 0; i < threads; i++) {
+		void *rc_p;
+		pthread_join(tids[i], &rc_p);
+
+		/*
+		 * Harness will say the fail was here, look at why preempt_fpu
+		 * returned
+		 */
+		if ((long) rc_p)
+			printf("oops\n");
+		FAIL_IF((long) rc_p);
+	}
+	printf("done\n");
+
+	free(tids);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_preempt_fpu, "fpu_preempt");
+}
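
A small detail in the C above: the empty asm volatile with a "memory"
clobber is a compiler barrier, forcing threads_starting to be re-read on
every spin rather than cached in a register. A C11-flavoured equivalent of
that wait, for comparison:

	while (__atomic_load_n(&threads_starting, __ATOMIC_RELAXED))
		;	/* spin until every worker has checked in */
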
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
index 9ed32e7..416b5b0 100644
--- a/tools/testing/selftests/powerpc/math/vmx_asm.S
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -9,6 +9,7 @@
 
 #include "../basic_asm.h"
 
+#POS MUST BE 16 ALIGNED!
 #define PUSH_VMX(pos,reg) \
 	li	reg,pos; \
 	stvx	v20,reg,sp; \
@@ -35,6 +36,7 @@
 	addi	reg,reg,16; \
 	stvx	v31,reg,sp;
 
+#POS MUST BE 16 ALIGNED!
 #define POP_VMX(pos,reg) \
 	li	reg,pos; \
 	lvx	v20,reg,sp; \
@@ -93,7 +95,7 @@ FUNC_END(load_vmx)
 
 #Should be safe from C, only touches r4, r5 and v0,v1,v2
 FUNC_START(check_vmx)
-	PUSH_BASIC_STACK(16)
+	PUSH_BASIC_STACK(32)
 	mr r4,r3
 	li	r3,1 #assume a bad result
 	li	r5,0
@@ -162,7 +164,7 @@ FUNC_START(check_vmx)
 	cmpdi	r0,0xffffffff
 	bne	1f
 	li	r3,0
-1:	POP_BASIC_STACK(16)
+1:	POP_BASIC_STACK(32)
 	blr
 FUNC_END(check_vmx)
 
@@ -174,7 +176,7 @@ FUNC_START(test_vmx)
 	PUSH_BASIC_STACK(512)
 	std	r3,40(sp) #Address of varray
 	std r4,48(sp) #address of pid
-	PUSH_VMX(56, r4)
+	PUSH_VMX(64, r4)
 
 	bl load_vmx
 
@@ -187,7 +189,41 @@ FUNC_START(test_vmx)
 	ld r3,40(sp)
 	bl check_vmx
 
-	POP_VMX(56,r4)
+	POP_VMX(64,r4)
 	POP_BASIC_STACK(512)
 	blr
 FUNC_END(test_vmx)
+
+#int preempt_vmx(v4si *varray, int *threads_starting, int *running)
+#On starting will (atomically) decrement threads_starting as a signal that
+#the VMX registers have been loaded with varray. Will proceed to check
+#their validity while running is not zero.
+FUNC_START(preempt_vmx)
+	PUSH_BASIC_STACK(512)
+	std r3,32(sp) #v4si *varray
+	std r4,40(sp) #volatile int *not_ready
+	std r5,48(sp) #int *running
+	PUSH_VMX(64,r4)
+
+	bl load_vmx
+
+	#Atomic DEC
+	ld r3,40(sp)
+1:	lwarx r4,0,r3
+	addi r4,r4,-1
+	stwcx. r4,0,r3
+	bne- 1b
+
+2:	ld r3,32(sp)
+	bl check_vmx
+	cmpdi r3,0
+	bne 3f
+	ld r4,48(sp)
+	ld r5,0(r4)
+	cmpwi r5,0
+	bne 2b
+
+3:	POP_VMX(64,r4)
+	POP_BASIC_STACK(512)
+	blr
+FUNC_END(preempt_vmx)
diff --git a/tools/testing/selftests/powerpc/math/vmx_preempt.c b/tools/testing/selftests/powerpc/math/vmx_preempt.c
new file mode 100644
index 0000000..9474e78
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers change across preemption.
+ * Two things should be noted here: a) the check_vmx function in asm only
+ * checks the non volatile registers, as it is reused from the syscall
+ * test; b) there is no way to be sure preemption happened, so this test
+ * just uses many threads and a long wait. As such, a successful test
+ * doesn't mean much, but a failure is bad.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+int threads_starting;
+int running;
+
+extern void preempt_vmx(v4si *varray, int *threads_starting, int *running);
+
+void *preempt_vmx_c(void *p)
+{
+	int i, j;
+	srand(pthread_self());
+	for (i = 0; i < 12; i++)
+		for (j = 0; j < 4; j++)
+			varray[i][j] = rand();
+
+	/* Test fails if it ever returns */
+	preempt_vmx(varray, &threads_starting, &running);
+	return p;
+}
+
+int test_preempt_vmx(void)
+{
+	int i, rc, threads;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc(threads * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting = threads;
+	for (i = 0; i < threads; i++) {
+		rc = pthread_create(&tids[i], NULL, preempt_vmx_c, NULL);
+		FAIL_IF(rc);
+	}
+
+	setbuf(stdout, NULL);
+	/* Not really necessary but nice to wait for every thread to start */
+	printf("\tWaiting for all workers to start...");
+	while(threads_starting)
+		asm volatile("": : :"memory");
+	printf("done\n");
+
+	printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+	sleep(PREEMPT_TIME);
+	printf("done\n");
+
+	printf("\tStopping workers...");
+	/*
+	 * Workers are checking this value every loop. In preempt_vmx 'cmpwi r5,0; bne 2b'.
+	 * r5 will have loaded the value of running.
+	 */
+	running = 0;
+	for (i = 0; i < threads; i++) {
+		void *rc_p;
+		pthread_join(tids[i], &rc_p);
+
+		/*
+		 * Harness will say the fail was here, look at why preempt_vmx
+		 * returned
+		 */
+		if ((long) rc_p)
+			printf("oops\n");
+		FAIL_IF((long) rc_p);
+	}
+	printf("done\n");
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_preempt_vmx, "vmx_preempt");
+}
-- 
2.7.0


* [PATCH V2 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

Load up the non-volatile FPU and VMX regs and ensure that they hold the
expected values in a signal handler's ucontext.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 tools/testing/selftests/powerpc/math/.gitignore   |   2 +
 tools/testing/selftests/powerpc/math/Makefile     |   4 +-
 tools/testing/selftests/powerpc/math/fpu_signal.c | 135 +++++++++++++++++++++
 tools/testing/selftests/powerpc/math/vmx_signal.c | 138 ++++++++++++++++++++++
 4 files changed, 278 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
 create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c

diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
index 1a6f09e..4fe13a4 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -2,3 +2,5 @@ fpu_syscall
 vmx_syscall
 fpu_preempt
 vmx_preempt
+fpu_signal
+vmx_signal
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index b6f4158..5b88875 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal
 
 all: $(TEST_PROGS)
 
@@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
 
 fpu_syscall: fpu_asm.S
 fpu_preempt: fpu_asm.S
+fpu_signal:  fpu_asm.S
 
 vmx_syscall: vmx_asm.S
 vmx_preempt: vmx_asm.S
+vmx_signal: vmx_asm.S
 
 include ../../lib.mk
 
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c
new file mode 100644
index 0000000..888aa51
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers are correctly reported in a
+ * signal context. Each worker just spins checking its FPU registers, at some
+ * point a signal will interrupt it and C code will check the signal context
+ * ensuring it is also the same.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+		     1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+		     2.1};
+
+bool bad_context;
+int threads_starting;
+int running;
+
+extern long preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void signal_fpu_sig(int sig, siginfo_t *info, void *context)
+{
+	int i;
+	ucontext_t *uc = context;
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	/* Only the non volatiles were loaded up */
+	for (i = 14; i < 32; i++) {
+		if (mc->fp_regs[i] != darray[i - 14]) {
+			bad_context = true;
+			break;
+		}
+	}
+}
+
+void *signal_fpu_c(void *p)
+{
+	int i;
+	long rc;
+	struct sigaction act;
+	act.sa_sigaction = signal_fpu_sig;
+	act.sa_flags = SA_SIGINFO;
+	rc = sigaction(SIGUSR1, &act, NULL);
+	if (rc)
+		return p;
+
+	srand(pthread_self());
+	for (i = 0; i < 21; i++)
+		darray[i] = rand();
+
+	rc = preempt_fpu(darray, &threads_starting, &running);
+
+	return (void *) rc;
+}
+
+int test_signal_fpu(void)
+{
+	int i, j, rc, threads;
+	void *rc_p;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc(threads * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting = threads;
+	for (i = 0; i < threads; i++) {
+		rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
+		FAIL_IF(rc);
+	}
+
+	setbuf(stdout, NULL);
+	printf("\tWaiting for all workers to start...");
+	while (threads_starting)
+		asm volatile("": : :"memory");
+	printf("done\n");
+
+	printf("\tSending signals to all threads %d times...", ITERATIONS);
+	for (i = 0; i < ITERATIONS; i++) {
+		for (j = 0; j < threads; j++) {
+			pthread_kill(tids[j], SIGUSR1);
+		}
+		sleep(1);
+	}
+	printf("done\n");
+
+	printf("\tStopping workers...");
+	running = 0;
+	for (i = 0; i < threads; i++) {
+		pthread_join(tids[i], &rc_p);
+
+		/*
+		 * Harness will say the fail was here, look at why signal_fpu
+		 * returned
+		 */
+		if ((long) rc_p || bad_context)
+			printf("oops\n");
+		if (bad_context)
+			fprintf(stderr, "\t!! bad_context is true\n");
+		FAIL_IF((long) rc_p || bad_context);
+	}
+	printf("done\n");
+
+	free(tids);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_signal_fpu, "fpu_signal");
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_signal.c b/tools/testing/selftests/powerpc/math/vmx_signal.c
new file mode 100644
index 0000000..093af5a
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_signal.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers are correctly reported in a
+ * signal context. Each worker just spins checking its VMX registers, at some
+ * point a signal will interrupt it and C code will check the signal context
+ * ensuring it is also the same.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+bool bad_context;
+int running;
+int threads_starting;
+
+extern long preempt_vmx(v4si *varray, int *threads_starting, int *running);
+
+void signal_vmx_sig(int sig, siginfo_t *info, void *context)
+{
+	int i;
+	ucontext_t *uc = context;
+	mcontext_t *mc = &uc->uc_mcontext;
+
+	/* Only the non volatiles were loaded up */
+	for (i = 20; i < 32; i++) {
+		if (memcmp(mc->v_regs->vrregs[i], &varray[i - 20], 16)) {
+			bad_context = true;
+			break;
+		}
+	}
+}
+
+void *signal_vmx_c(void *p)
+{
+	int i, j;
+	long rc;
+	struct sigaction act;
+	act.sa_sigaction = signal_vmx_sig;
+	act.sa_flags = SA_SIGINFO;
+	rc = sigaction(SIGUSR1, &act, NULL);
+	if (rc)
+		return p;
+
+	srand(pthread_self());
+	for (i = 0; i < 12; i++)
+		for (j = 0; j < 4; j++)
+			varray[i][j] = rand();
+
+	rc = preempt_vmx(varray, &threads_starting, &running);
+
+	return (void *) rc;
+}
+
+int test_signal_vmx(void)
+{
+	int i, j, rc, threads;
+	void *rc_p;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc(threads * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting = threads;
+	for (i = 0; i < threads; i++) {
+		rc = pthread_create(&tids[i], NULL, signal_vmx_c, NULL);
+		FAIL_IF(rc);
+	}
+
+	setbuf(stdout, NULL);
+	printf("\tWaiting for all workers to start...");
+	while (threads_starting) asm volatile("": : :"memory");
+	printf("done\n");
+
+	printf("\tSending signals to all threads %d times...", ITERATIONS);
+	for (i = 0; i < ITERATIONS; i++) {
+		for (j = 0; j < threads; j++) {
+			pthread_kill(tids[j], SIGUSR1);
+		}
+		sleep(1);
+	}
+	printf("done\n");
+
+	printf("\tKilling workers...");
+	running = 0;
+	for (i = 0; i < threads; i++) {
+		pthread_join(tids[i], &rc_p);
+
+		/*
+		 * Harness will say the fail was here, look at why signal_vmx
+		 * returned
+		 */
+		if ((long) rc_p || bad_context)
+			printf("oops\n");
+		if (bad_context)
+			fprintf(stderr, "\t!! bad_context is true\n");
+		FAIL_IF((long) rc_p || bad_context);
+	}
+	printf("done\n");
+
+	free(tids);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_signal_vmx, "vmx_signal");
+}
-- 
2.7.0


* [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

Threads now leave the math bits enabled in their saved MSR to indicate
that the hardware is hot and a restore is not needed. These bits must be
turned off when a thread is copied: when the child does get scheduled,
there's no way its registers could have been hot.
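
For example, immediately after fork() the child's copied MSR would
otherwise still have MSR_FP set, so when the child is first scheduled the
kernel would treat its FPU state as hot and skip the restore, leaving the
child running with whatever stale values happen to be in the registers.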

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/kernel/process.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dccc87e..e0c3d2d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
 
 		f = ret_from_fork;
 	}
+	childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
 	sp -= STACK_FRAME_OVERHEAD;
 
 	/*
-- 
2.7.0


* [PATCH V2 5/8] powerpc: Restore FPU/VEC/VSX if previously used
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code,
new and modernised workloads make use of floating point and vector
facilities, and even the kernel makes use of vectorised memcpy.

All this combined greatly increases the cost of a syscall: since the
kernel sometimes uses these facilities even in the syscall fast path, it
is increasingly common for a thread to take an *_unavailable exception
soon after a syscall, and potentially to take all three.

The obvious overcompensation for this problem is to simply always load all
the facilities on every exit to userspace. But loading up all the FPU, VEC
and VSX registers every time can be expensive, and if a workload does
avoid using them it should not be forced to incur this penalty.

An 8-bit counter is used to detect whether the registers have been used in
the past, and the registers are always loaded until the value wraps back
to zero.
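
A minimal sketch of that counter behaviour (the name and u8 type are taken
from the patch below; the helpers here are purely illustrative):

	u8 load_fp;	/* in thread_struct */

	void on_fp_load(void)	/* fp_unavailable or restore_math reload */
	{
		load_fp++;	/* wrapping to 0 is the intended decay */
	}

	bool should_restore_fp(void)
	{
		return load_fp != 0;	/* non-zero: used recently */
	}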

Several versions of the assembly in entry_64.S were tried: 1. always
calling C, 2. performing a common case check and then calling C, and 3. a
complex check in asm. After some benchmarking it was determined that
avoiding C in the common case is a performance benefit. The full check in
asm greatly complicated that code path for a negligible performance gain,
and the trade-off was deemed not worth it.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/processor.h |  2 ++
 arch/powerpc/kernel/asm-offsets.c    |  2 ++
 arch/powerpc/kernel/entry_64.S       | 21 ++++++++++--
 arch/powerpc/kernel/fpu.S            |  4 +++
 arch/powerpc/kernel/process.c        | 66 ++++++++++++++++++++++++++++++------
 arch/powerpc/kernel/vector.S         |  4 +++
 6 files changed, 85 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ac23308..dcab21f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,11 +236,13 @@ struct thread_struct {
 #endif
 	struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
 	unsigned long	trap_nr;	/* last trap # on this thread */
+	u8 load_fp;
 #ifdef CONFIG_ALTIVEC
 	struct thread_vr_state vr_state;
 	struct thread_vr_state *vr_save_area;
 	unsigned long	vrsave;
 	int		used_vr;	/* set if process has used altivec */
+	u8 load_vec;
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	/* VSR status */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 07cebc3..10d5eab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -95,12 +95,14 @@ int main(void)
 	DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
 	DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
 	DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
+	DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
 #ifdef CONFIG_ALTIVEC
 	DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
 	DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
 	DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
 	DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
 	DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
+	DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
 #endif /* CONFIG_ALTIVEC */
 #ifdef CONFIG_VSX
 	DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0d525ce..038e0a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,7 +210,20 @@ system_call:			/* label this so stack traces look sane */
 	li	r11,-MAX_ERRNO
 	andi.	r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
 	bne-	syscall_exit_work
-	cmpld	r3,r11
+
+	andi.	r0,r8,MSR_FP
+	beq 2f
+#ifdef CONFIG_ALTIVEC
+	andis.	r0,r8,MSR_VEC@h
+	bne	3f
+#endif
+2:	addi    r3,r1,STACK_FRAME_OVERHEAD
+	bl	restore_math
+	ld	r8,_MSR(r1)
+	ld	r3,RESULT(r1)
+	li	r11,-MAX_ERRNO
+
+3:	cmpld	r3,r11
 	ld	r5,_CCR(r1)
 	bge-	syscall_error
 .Lsyscall_error_cont:
@@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite)
 
 	/* Check current_thread_info()->flags */
 	andi.	r0,r4,_TIF_USER_WORK_MASK
-#ifdef CONFIG_PPC_BOOK3E
 	bne	1f
+#ifdef CONFIG_PPC_BOOK3E
 	/*
 	 * Check to see if the dbcr0 register is set up to debug.
 	 * Use the internal debug mode bit to do this.
@@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite)
 	mtspr	SPRN_DBSR,r10
 	b	restore
 #else
-	beq	restore
+	addi	r3,r1,STACK_FRAME_OVERHEAD
+	bl	restore_math
+	b	restore
 #endif
 1:	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq	2f
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 2117eac..b063524 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -130,6 +130,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	or	r12,r12,r4
 	std	r12,_MSR(r1)
 #endif
+	/* Don't care if r4 overflows, this is desired behaviour */
+	lbz	r4,THREAD_LOAD_FP(r5)
+	addi	r4,r4,1
+	stb	r4,THREAD_LOAD_FP(r5)
 	addi	r10,r5,THREAD_FPSTATE
 	lfd	fr0,FPSTATE_FPSCR(r10)
 	MTFSF_L(fr0)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e0c3d2d..ec53468 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -374,6 +374,53 @@ void giveup_all(struct task_struct *tsk)
 }
 EXPORT_SYMBOL(giveup_all);
 
+void restore_math(struct pt_regs *regs)
+{
+	unsigned long msr;
+
+	if (!current->thread.load_fp
+#ifdef CONFIG_ALTIVEC
+		&& !current->thread.load_vec)
+#else
+		)
+#endif
+		return;
+
+	msr = regs->msr;
+	msr_check_and_set(msr_all_available);
+
+	/*
+	 * Only reload if the bit is not set in the user MSR, the bit BEING set
+	 * indicates that the registers are hot
+	 */
+#ifdef CONFIG_PPC_FPU
+	if (current->thread.load_fp && !(msr & MSR_FP)) {
+		load_fp_state(&current->thread.fp_state);
+		msr |= MSR_FP | current->thread.fpexc_mode;
+		current->thread.load_fp++;
+	}
+#endif
+#ifdef CONFIG_ALTIVEC
+	if (current->thread.load_vec && !(msr & MSR_VEC) &&
+			cpu_has_feature(CPU_FTR_ALTIVEC)) {
+		load_vr_state(&current->thread.vr_state);
+		current->thread.used_vr = 1;
+		msr |= MSR_VEC;
+		current->thread.load_vec++;
+	}
+#endif
+#ifdef CONFIG_VSX
+	if (!(msr & MSR_VSX) && (msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {
+		current->thread.used_vsr = 1;
+		msr |= MSR_VSX;
+	}
+#endif
+
+	msr_check_and_clear(msr_all_available);
+
+	regs->msr = msr;
+}
+
 void flush_all_to_thread(struct task_struct *tsk)
 {
 	if (tsk->thread.regs) {
@@ -832,17 +879,9 @@ void restore_tm_state(struct pt_regs *regs)
 
 	msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
 	msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
-	if (msr_diff & MSR_FP) {
-		msr_check_and_set(MSR_FP);
-		load_fp_state(&current->thread.fp_state);
-		msr_check_and_clear(MSR_FP);
-		regs->msr |= current->thread.fpexc_mode;
-	}
-	if (msr_diff & MSR_VEC) {
-		msr_check_and_set(MSR_VEC);
-		load_vr_state(&current->thread.vr_state);
-		msr_check_and_clear(MSR_VEC);
-	}
+
+	restore_math(regs);
+
 	regs->msr |= msr_diff;
 }
 
@@ -1006,6 +1045,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
 		batch = this_cpu_ptr(&ppc64_tlb_batch);
 		batch->active = 1;
 	}
+
+	/* Don't do this on a kernel thread */
+	if (current_thread_info()->task->thread.regs)
+		restore_math(current_thread_info()->task->thread.regs);
+
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 	return last;
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 162d0f7..038cff8 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
 	oris	r12,r12,MSR_VEC@h
 	std	r12,_MSR(r1)
 #endif
+	/* Don't care if r4 overflows, this is desired behaviour */
+	lbz	r4,THREAD_LOAD_VEC(r5)
+	addi	r4,r4,1
+	stb	r4,THREAD_LOAD_VEC(r5)
 	addi	r6,r5,THREAD_VRSTATE
 	li	r4,1
 	li	r10,VRSTATE_VSCR
-- 
2.7.0


* [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up
From: Cyril Bur @ 2016-01-15  5:04 UTC
  To: linuxppc-dev

This patch adds the ability to save the FPU registers to the thread struct
without giving up the facility (disabling it), so the registers remain
live for the next time the process returns to userspace.

This patch optimises the thread copy path (taken as a result of a fork()
or clone()) so that the parent thread can return to userspace with hot
registers, avoiding a possibly pointless reload of FPU register state.
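
Concretely, on the flush_all_to_thread() path below, the parent now calls
save_all()/save_fpu() instead of giveup_all(): the child still gets a
consistent copy of the state, but the parent's MSR_FP remains set, so it
can return to userspace without re-faulting on its first FP instruction.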

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/switch_to.h |  2 +-
 arch/powerpc/kernel/fpu.S            | 21 ++++------------
 arch/powerpc/kernel/process.c        | 46 +++++++++++++++++++++++++++++++++++-
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b268b6..c4d50e9 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -28,7 +28,7 @@ extern void giveup_all(struct task_struct *);
 extern void enable_kernel_fp(void);
 extern void flush_fp_to_thread(struct task_struct *);
 extern void giveup_fpu(struct task_struct *);
-extern void __giveup_fpu(struct task_struct *);
+extern void save_fpu(struct task_struct *);
 static inline void disable_kernel_fp(void)
 {
 	msr_check_and_clear(MSR_FP);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index b063524..15da2b5 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	blr
 
 /*
- * __giveup_fpu(tsk)
- * Disable FP for the task given as the argument,
- * and save the floating-point registers in its thread_struct.
+ * save_fpu(tsk)
+ * Save the floating-point registers in its thread_struct.
  * Enables the FPU for use in the kernel on return.
  */
-_GLOBAL(__giveup_fpu)
+_GLOBAL(save_fpu)
 	addi	r3,r3,THREAD	        /* want THREAD of task */
 	PPC_LL	r6,THREAD_FPSAVEAREA(r3)
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r6,0
 	bne	2f
 	addi	r6,r3,THREAD_FPSTATE
-2:	PPC_LCMPI	0,r5,0
-	SAVE_32FPVSRS(0, R4, R6)
+2:	SAVE_32FPVSRS(0, R4, R6)
 	mffs	fr0
 	stfd	fr0,FPSTATE_FPSCR(r6)
-	beq	1f
-	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	li	r3,MSR_FP|MSR_FE0|MSR_FE1
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-	oris	r3,r3,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
-	andc	r4,r4,r3		/* disable FP for previous task */
-	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
 	blr
 
 /*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ec53468..8a96e4f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
 EXPORT_SYMBOL(__msr_check_and_clear);
 
 #ifdef CONFIG_PPC_FPU
+void __giveup_fpu(struct task_struct *tsk)
+{
+	save_fpu(tsk);
+	tsk->thread.regs->msr &= ~MSR_FP;
+#ifdef CONFIG_VSX
+	if (cpu_has_feature(CPU_FTR_VSX))
+		tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_fpu(struct task_struct *tsk)
 {
 	check_if_tm_restore_required(tsk);
@@ -421,12 +431,46 @@ void restore_math(struct pt_regs *regs)
 	regs->msr = msr;
 }
 
+void save_all(struct task_struct *tsk)
+{
+	unsigned long usermsr;
+
+	if (!tsk->thread.regs)
+		return;
+
+	usermsr = tsk->thread.regs->msr;
+
+	if ((usermsr & msr_all_available) == 0)
+		return;
+
+	msr_check_and_set(msr_all_available);
+
+#ifdef CONFIG_PPC_FPU
+	if (usermsr & MSR_FP)
+		save_fpu(tsk);
+#endif
+#ifdef CONFIG_ALTIVEC
+	if (usermsr & MSR_VEC)
+		__giveup_altivec(tsk);
+#endif
+#ifdef CONFIG_VSX
+	if (usermsr & MSR_VSX)
+		__giveup_vsx(tsk);
+#endif
+#ifdef CONFIG_SPE
+	if (usermsr & MSR_SPE)
+		__giveup_spe(tsk);
+#endif
+
+	msr_check_and_clear(msr_all_available);
+}
+
 void flush_all_to_thread(struct task_struct *tsk)
 {
 	if (tsk->thread.regs) {
 		preempt_disable();
 		BUG_ON(tsk != current);
-		giveup_all(tsk);
+		save_all(tsk);
 
 #ifdef CONFIG_SPE
 		if (tsk->thread.regs->msr & MSR_SPE)
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH V2 7/8] powerpc: Add the ability to save Altivec without giving it up
  2016-01-15  5:04 [PATCH V2 0/8] FP/VEC/VSX switching optimisations Cyril Bur
                   ` (5 preceding siblings ...)
  2016-01-15  5:04 ` [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
@ 2016-01-15  5:04 ` Cyril Bur
  2016-01-15  5:04 ` [PATCH V2 8/8] powerpc: Add the ability to save VSX " Cyril Bur
  7 siblings, 0 replies; 20+ messages in thread
From: Cyril Bur @ 2016-01-15  5:04 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds the ability to save the VEC registers to the thread struct
without giving up the facility, that is, without disabling it for the next
time the process returns to userspace.

This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.
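As with the FPU change, the copy path benefit comes from save_all() (called
via flush_all_to_thread()) saving the VMX state without clearing MSR_VEC.
Roughly, per the save_all() hunk below:

	if (usermsr & MSR_VEC)
		save_altivec(tsk);	/* was __giveup_altivec(tsk); MSR_VEC stays
					 * set so the registers remain hot */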

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/switch_to.h |  2 +-
 arch/powerpc/kernel/process.c        | 12 +++++++++++-
 arch/powerpc/kernel/vector.S         | 24 ++++--------------------
 3 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index c4d50e9..29dda9d 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -41,7 +41,7 @@ static inline void flush_fp_to_thread(struct task_struct *t) { }
 extern void enable_kernel_altivec(void);
 extern void flush_altivec_to_thread(struct task_struct *);
 extern void giveup_altivec(struct task_struct *);
-extern void __giveup_altivec(struct task_struct *);
+extern void save_altivec(struct task_struct *);
 static inline void disable_kernel_altivec(void)
 {
 	msr_check_and_clear(MSR_VEC);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 8a96e4f..5566c32 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -200,6 +200,16 @@ EXPORT_SYMBOL(enable_kernel_fp);
 #endif /* CONFIG_PPC_FPU */
 
 #ifdef CONFIG_ALTIVEC
+void __giveup_altivec(struct task_struct *tsk)
+{
+	save_altivec(tsk);
+	tsk->thread.regs->msr &= ~MSR_VEC;
+#ifdef CONFIG_VSX
+	if (cpu_has_feature(CPU_FTR_VSX))
+		tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
 void giveup_altivec(struct task_struct *tsk)
 {
 	check_if_tm_restore_required(tsk);
@@ -451,7 +461,7 @@ void save_all(struct task_struct *tsk)
 #endif
 #ifdef CONFIG_ALTIVEC
 	if (usermsr & MSR_VEC)
-		__giveup_altivec(tsk);
+		save_altivec(tsk);
 #endif
 #ifdef CONFIG_VSX
 	if (usermsr & MSR_VSX)
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 038cff8..51b0c17 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec)
 	blr
 
 /*
- * __giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * save_altivec(tsk)
+ * Save the vector registers to its thread_struct
  */
-_GLOBAL(__giveup_altivec)
+_GLOBAL(save_altivec)
 	addi	r3,r3,THREAD		/* want THREAD of task */
 	PPC_LL	r7,THREAD_VRSAVEAREA(r3)
 	PPC_LL	r5,PT_REGS(r3)
 	PPC_LCMPI	0,r7,0
 	bne	2f
 	addi	r7,r3,THREAD_VRSTATE
-2:	PPC_LCMPI	0,r5,0
-	SAVE_32VRS(0,r4,r7)
+2:	SAVE_32VRS(0,r4,r7)
 	mfvscr	v0
 	li	r4,VRSTATE_VSCR
 	stvx	v0,r4,r7
-	beq	1f
-	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-	lis	r3,(MSR_VEC|MSR_VSX)@h
-FTR_SECTION_ELSE
-	lis	r3,MSR_VEC@h
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX)
-#else
-	lis	r3,MSR_VEC@h
-#endif
-	andc	r4,r4,r3		/* disable FP for previous task */
-	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
 	blr
 
 #ifdef CONFIG_VSX
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH V2 8/8] powerpc: Add the ability to save VSX without giving it up
  2016-01-15  5:04 [PATCH V2 0/8] FP/VEC/VSX switching optimisations Cyril Bur
                   ` (6 preceding siblings ...)
  2016-01-15  5:04 ` [PATCH V2 7/8] powerpc: Add the ability to save Altivec " Cyril Bur
@ 2016-01-15  5:04 ` Cyril Bur
  2016-01-15  6:25   ` Michael Neuling
  7 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2016-01-15  5:04 UTC (permalink / raw)
  To: linuxppc-dev

This patch adds the ability to save the VSX registers to the thread struct
without giving up the facility, that is, without disabling it for the next
time the process returns to userspace.

This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.
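The VSX register file overlaps the FP and VMX register files, so there is no
separate VSX state to dump: saving "VSX" is simply saving both of the others.
A sketch of the helper added below:

	/* sketch -- see the process.c hunk below for the actual code */
	void save_vsx(struct task_struct *tsk)
	{
		if (tsk->thread.regs->msr & MSR_FP)
			save_fpu(tsk);		/* VSR0-31 (the FP side) */
		if (tsk->thread.regs->msr & MSR_VEC)
			save_altivec(tsk);	/* VSR32-63 (the VRs) */
	}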

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/switch_to.h |  1 -
 arch/powerpc/kernel/ppc_ksyms.c      |  4 ----
 arch/powerpc/kernel/process.c        | 23 ++++++++++++++++++-----
 arch/powerpc/kernel/vector.S         | 17 -----------------
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 29dda9d..4dfcd3e 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -52,7 +52,6 @@ static inline void disable_kernel_altivec(void)
 extern void enable_kernel_vsx(void);
 extern void flush_vsx_to_thread(struct task_struct *);
 extern void giveup_vsx(struct task_struct *);
-extern void __giveup_vsx(struct task_struct *);
 static inline void disable_kernel_vsx(void)
 {
 	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 41e1607..ef7024da 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
 EXPORT_SYMBOL(store_vr_state);
 #endif
 
-#ifdef CONFIG_VSX
-EXPORT_SYMBOL_GPL(__giveup_vsx);
-#endif
-
 #ifdef CONFIG_EPAPR_PARAVIRT
 EXPORT_SYMBOL(epapr_hypercall_start);
 #endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 5566c32..3d907b8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -252,20 +252,33 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
 #endif /* CONFIG_ALTIVEC */
 
 #ifdef CONFIG_VSX
-void giveup_vsx(struct task_struct *tsk)
+void __giveup_vsx(struct task_struct *tsk)
 {
-	check_if_tm_restore_required(tsk);
-
-	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 	if (tsk->thread.regs->msr & MSR_FP)
 		__giveup_fpu(tsk);
 	if (tsk->thread.regs->msr & MSR_VEC)
 		__giveup_altivec(tsk);
+	tsk->thread.regs->msr &= ~MSR_VSX;
+}
+
+void giveup_vsx(struct task_struct *tsk)
+{
+	check_if_tm_restore_required(tsk);
+
+	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
 	__giveup_vsx(tsk);
 	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
 }
 EXPORT_SYMBOL(giveup_vsx);
 
+void save_vsx(struct task_struct *tsk)
+{
+	if (tsk->thread.regs->msr & MSR_FP)
+		save_fpu(tsk);
+	if (tsk->thread.regs->msr & MSR_VEC)
+		save_altivec(tsk);
+}
+
 void enable_kernel_vsx(void)
 {
 	WARN_ON(preemptible());
@@ -465,7 +478,7 @@ void save_all(struct task_struct *tsk)
 #endif
 #ifdef CONFIG_VSX
 	if (usermsr & MSR_VSX)
-		__giveup_vsx(tsk);
+		save_vsx(tsk);
 #endif
 #ifdef CONFIG_SPE
 	if (usermsr & MSR_SPE)
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 51b0c17..1c2e7a3 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
 	std	r12,_MSR(r1)
 	b	fast_exception_return
 
-/*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
- * Does NOT save vsx registers.
- */
-_GLOBAL(__giveup_vsx)
-	addi	r3,r3,THREAD		/* want THREAD of task */
-	ld	r5,PT_REGS(r3)
-	cmpdi	0,r5,0
-	beq	1f
-	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-	lis	r3,MSR_VSX@h
-	andc	r4,r4,r3		/* disable VSX for previous task */
-	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
-	blr
-
 #endif /* CONFIG_VSX */
 
 
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread
  2016-01-15  5:04 ` [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
@ 2016-01-15  5:42   ` Michael Neuling
  2016-01-15  5:54     ` Cyril Bur
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Neuling @ 2016-01-15  5:42 UTC (permalink / raw)
  To: Cyril Bur, linuxppc-dev

On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> With threads leaving the math bits enabled in their saved MSR to indicate
> that the hardware is hot and a restore is not needed, children need to turn
> it off as when they do get scheduled, there's no way their registers could
> have been hot.

Is this a bug in the current code?

Mikey

> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  arch/powerpc/kernel/process.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index dccc87e..e0c3d2d 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
> 
>  		f = ret_from_fork;
>  	}
> +	childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
>  	sp -= STACK_FRAME_OVERHEAD;
> 
>  	/*

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread
  2016-01-15  5:42   ` Michael Neuling
@ 2016-01-15  5:54     ` Cyril Bur
  2016-01-15  6:04       ` Michael Neuling
  0 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2016-01-15  5:54 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev

On Fri, 15 Jan 2016 16:42:22 +1100
Michael Neuling <mikey@neuling.org> wrote:

> On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> > With threads leaving the math bits enabled in their saved MSR to indicate
> > that the hardware is hot and a restore is not needed, children need to turn
> > it off as when they do get scheduled, there's no way their registers could
> > have been hot.  
> 
> Is this a bug in the current code?
> 

You're very consistent:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136469.html

;)

> Mikey
> 
> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > ---
> >  arch/powerpc/kernel/process.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index dccc87e..e0c3d2d 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
> >  
> >  		f = ret_from_fork;
> >  	}
> > +	childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
> >  	sp -= STACK_FRAME_OVERHEAD;
> >  
> >  	/*  

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 5/8] powerpc: Restore FPU/VEC/VSX if previously used
  2016-01-15  5:04 ` [PATCH V2 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
@ 2016-01-15  6:02   ` Michael Neuling
  2016-01-18  2:05     ` Cyril Bur
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Neuling @ 2016-01-15  6:02 UTC (permalink / raw)
  To: Cyril Bur, linuxppc-dev

> Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
> problem unless a process is using these facilities.

> Modern versions of GCC are very good at automatically vectorising code, new
> and modernised workloads make use of floating point and vector facilities,
> even the kernel makes use of vectorised memcpy.

> All this combined greatly increases the cost of a syscall since the kernel
> uses the facilities sometimes even in the syscall fast-path, making it
> increasingly common for a thread to take an *_unavailable exception soon
> after a syscall, not to mention potentially taking all three.

> The obvious overcompensation to this problem is to simply always load all
> the facilities on every exit to userspace. Loading up all FPU, VEC and VSX
> registers every time can be expensive and if a workload does avoid using
> them, it should not be forced to incur this penalty.

> An 8bit counter is used to detect if the registers have been used in the
> past and the registers are always loaded until the value wraps back to
> zero.

> Several versions of the assembly in entry_64.S were tried: 1. Always calling
> C, 2. Performing a common case check and then calling C, and 3. A complex
> check in asm. After some benchmarking it was determined that avoiding C in
> the common case is a performance benefit. The full check in asm greatly
> complicated that codepath for a negligible performance gain and the
> trade-off was deemed not worth it.

> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  arch/powerpc/include/asm/processor.h |  2 ++
>  arch/powerpc/kernel/asm-offsets.c    |  2 ++
>  arch/powerpc/kernel/entry_64.S       | 21 ++++++++++--
>  arch/powerpc/kernel/fpu.S            |  4 +++
>  arch/powerpc/kernel/process.c        | 66 ++++++++++++++++++++++++++++++------
>  arch/powerpc/kernel/vector.S         |  4 +++
>  6 files changed, 85 insertions(+), 14 deletions(-)

> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index ac23308..dcab21f 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -236,11 +236,13 @@ struct thread_struct {
>  #endif
>         struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
>         unsigned long   trap_nr;        /* last trap # on this thread */
> +       u8 load_fp;
>  #ifdef CONFIG_ALTIVEC
>         struct thread_vr_state vr_state;
>         struct thread_vr_state *vr_save_area;
>         unsigned long   vrsave;
>         int             used_vr;        /* set if process has used altivec */
> +       u8 load_vec;
>  #endif /* CONFIG_ALTIVEC */
>  #ifdef CONFIG_VSX
>         /* VSR status */
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 07cebc3..10d5eab 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -95,12 +95,14 @@ int main(void)
>         DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
>         DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
>         DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
> +       DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
>  #ifdef CONFIG_ALTIVEC
>         DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
>         DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
>         DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
>         DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
>         DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
>         DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
>  #endif /* CONFIG_ALTIVEC */
>  #ifdef CONFIG_VSX
>         DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 0d525ce..038e0a1 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -210,7 +210,20 @@ system_call:                       /* label this so stack traces look sane */
>         li      r11,-MAX_ERRNO
>         andi.   r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
>         bne-    syscall_exit_work
> -       cmpld   r3,r11
> +
> +       andi.   r0,r8,MSR_FP
> +       beq 2f
> +#ifdef CONFIG_ALTIVEC
> +       andis.  r0,r8,MSR_VEC@h
> +       bne     3f
> +#endif
> +2:     addi    r3,r1,STACK_FRAME_OVERHEAD
> +       bl      restore_math
> +       ld      r8,_MSR(r1)
> +       ld      r3,RESULT(r1)
> +       li      r11,-MAX_ERRNO
> +
> +3:     cmpld   r3,r11
>         ld      r5,_CCR(r1)
>         bge-    syscall_error
>  .Lsyscall_error_cont:
> @@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite)
> 
>         /* Check current_thread_info()->flags */
>         andi.   r0,r4,_TIF_USER_WORK_MASK
> -#ifdef CONFIG_PPC_BOOK3E
>         bne     1f
> +#ifdef CONFIG_PPC_BOOK3E
>         /*
>          * Check to see if the dbcr0 register is set up to debug.
>          * Use the internal debug mode bit to do this.
> @@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite)
>         mtspr   SPRN_DBSR,r10
>         b       restore
>  #else
> -       beq     restore
> +       addi    r3,r1,STACK_FRAME_OVERHEAD
> +       bl      restore_math
> +       b       restore
>  #endif
>  1:     andi.   r0,r4,_TIF_NEED_RESCHED
>         beq     2f
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index 2117eac..b063524 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -130,6 +130,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>         or      r12,r12,r4
>         std     r12,_MSR(r1)
>  #endif
> +       /* Don't care if r4 overflows, this is desired behaviour */
> +       lbz     r4,THREAD_LOAD_FP(r5)
> +       addi    r4,r4,1
> +       stb     r4,THREAD_LOAD_FP(r5)
>         addi    r10,r5,THREAD_FPSTATE
>         lfd     fr0,FPSTATE_FPSCR(r10)
>         MTFSF_L(fr0)
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index e0c3d2d..ec53468 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -374,6 +374,53 @@ void giveup_all(struct task_struct *tsk)
>  }
>  EXPORT_SYMBOL(giveup_all);
> 
> +void restore_math(struct pt_regs *regs)
> +{
> +       unsigned long msr;
> +
> +       if (!current->thread.load_fp
> +#ifdef CONFIG_ALTIVEC
> +               && !current->thread.load_vec)
> +#else
> +               )
> +#endif
> +               return;

Can you make the inline code easier to read?  Something like

#ifdef CONFIG_ALTIVEC
#define loadvec(thr) ((thr).load_vec)
#else
#define loadvec(thr) 0
#endif

void restore_math(struct pt_regs *regs)
{
       unsigned long msr;

       if (!current->thread.load_fp && !loadvec(current->thread))
		return;

> +
> +       msr = regs->msr;
> +       msr_check_and_set(msr_all_available);
> +
> +       /*
> +        * Only reload if the bit is not set in the user MSR, the bit BEING set
> +        * indicates that the registers are hot
> +        */
> +#ifdef CONFIG_PPC_FPU
> +       if (current->thread.load_fp && !(msr & MSR_FP)) {
> +               load_fp_state(&current->thread.fp_state);
> +               msr |= MSR_FP | current->thread.fpexc_mode;
> +               current->thread.load_fp++;
> +       }
> +#endif
> +#ifdef CONFIG_ALTIVEC
> +       if (current->thread.load_vec && !(msr & MSR_VEC) &&
> +                       cpu_has_feature(CPU_FTR_ALTIVEC)) {
> +               load_vr_state(&current->thread.vr_state);
> +               current->thread.used_vr = 1;
> +               msr |= MSR_VEC;
> +               current->thread.load_vec++;
> +       }
> +#endif
> +#ifdef CONFIG_VSX
> +       if (!(msr & MSR_VSX) && (msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {

What are you trying to hit with this if statement?

Seems you are turning on VSX if VSX is not already on but FP and VEC
are.  Why do you need the check that MSR_VSX is not used?  That seems redundant.

> +               current->thread.used_vsr = 1;
> +               msr |= MSR_VSX;
> +       }
> +#endif
> +
> +       msr_check_and_clear(msr_all_available);

Why are you doing this?  Why all, and not just the ones you've enabled above?

> +
> +       regs->msr = msr;
> +}
> +
>  void flush_all_to_thread(struct task_struct *tsk)
>  {
>         if (tsk->thread.regs) {
> @@ -832,17 +879,9 @@ void restore_tm_state(struct pt_regs *regs)
> 
>         msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
>         msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
> -       if (msr_diff & MSR_FP) {
> -               msr_check_and_set(MSR_FP);
> -               load_fp_state(&current->thread.fp_state);
> -               msr_check_and_clear(MSR_FP);
> -               regs->msr |= current->thread.fpexc_mode;
> -       }
> -       if (msr_diff & MSR_VEC) {
> -               msr_check_and_set(MSR_VEC);
> -               load_vr_state(&current->thread.vr_state);
> -               msr_check_and_clear(MSR_VEC);
> -       }
> +
> +       restore_math(regs);
> +
>         regs->msr |= msr_diff;
>  }
> 
> @@ -1006,6 +1045,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
>                 batch = this_cpu_ptr(&ppc64_tlb_batch);
>                 batch->active = 1;
>         }
> +
> +       /* Don't do this on a kernel thread */

Why not?

> +       if (current_thread_info()->task->thread.regs)
> +               restore_math(current_thread_info()->task->thread.regs);
> +
>  #endif /* CONFIG_PPC_BOOK3S_64 */
> 
>         return last;
> diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> index 162d0f7..038cff8 100644
> --- a/arch/powerpc/kernel/vector.S
> +++ b/arch/powerpc/kernel/vector.S
> @@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
>         oris    r12,r12,MSR_VEC@h
>         std     r12,_MSR(r1)
>  #endif
> +       /* Don't care if r4 overflows, this is desired behaviour */
> +       lbz     r4,THREAD_LOAD_VEC(r5)
> +       addi    r4,r4,1
> +       stb     r4,THREAD_LOAD_VEC(r5)
>         addi    r6,r5,THREAD_VRSTATE
>         li      r4,1
>         li      r10,VRSTATE_VSCR
> -- 
> 2.7.0

> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread
  2016-01-15  5:54     ` Cyril Bur
@ 2016-01-15  6:04       ` Michael Neuling
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Neuling @ 2016-01-15  6:04 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On Fri, 2016-01-15 at 16:54 +1100, Cyril Bur wrote:
> On Fri, 15 Jan 2016 16:42:22 +1100
> Michael Neuling <mikey@neuling.org> wrote:
>
> > On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> > > With threads leaving the math bits enabled in their saved MSR to indicate
> > > that the hardware is hot and a restore is not needed, children need to turn
> > > it off as when they do get scheduled, there's no way their registers could
> > > have been hot.
> > 
> > Is this a bug in the current code?
> > 
> 
> You're very consistent:
> 
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136469.html

Fix the comment to explain why it's not a bug now.

Mikey

> ;)
> 
> > Mikey
> > 
> > > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > > ---
> > >  arch/powerpc/kernel/process.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > > index dccc87e..e0c3d2d 100644
> > > --- a/arch/powerpc/kernel/process.c
> > > +++ b/arch/powerpc/kernel/process.c
> > > @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
> > > 
> > >  		f = ret_from_fork;
> > >  	}
> > > +	childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
> > >  	sp -= STACK_FRAME_OVERHEAD;
> > > 
> > >  	/*
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up
  2016-01-15  5:04 ` [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
@ 2016-01-15  6:08   ` Michael Neuling
  2016-01-15  7:38   ` Denis Kirjanov
  1 sibling, 0 replies; 20+ messages in thread
From: Michael Neuling @ 2016-01-15  6:08 UTC (permalink / raw)
  To: Cyril Bur, linuxppc-dev

On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> This patch adds the ability to save the FPU registers to the thread struct
> without giving up the facility, that is, without disabling it for the next
> time the process returns to userspace.
> 
> This patch optimises the thread copy path (as a result of a fork() or
> clone()) so that the parent thread can return to userspace with hot
> registers, avoiding a possibly pointless reload of FPU register state.

This comment seems misleading.  The patch seems to touch the VSX, VMX
and SPE state as well.

Mikey

>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |  2 +-
>  arch/powerpc/kernel/fpu.S            | 21 ++++------------
>  arch/powerpc/kernel/process.c        | 46 +++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 5b268b6..c4d50e9 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -28,7 +28,7 @@ extern void giveup_all(struct task_struct *);
>  extern void enable_kernel_fp(void);
>  extern void flush_fp_to_thread(struct task_struct *);
>  extern void giveup_fpu(struct task_struct *);
> -extern void __giveup_fpu(struct task_struct *);
> +extern void save_fpu(struct task_struct *);
>  static inline void disable_kernel_fp(void)
>  {
>  	msr_check_and_clear(MSR_FP);
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index b063524..15da2b5 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	blr
> 
>  /*
> - * __giveup_fpu(tsk)
> - * Disable FP for the task given as the argument,
> - * and save the floating-point registers in its thread_struct.
> + * save_fpu(tsk)
> + * Save the floating-point registers in its thread_struct.
>   * Enables the FPU for use in the kernel on return.
>   */
> -_GLOBAL(__giveup_fpu)
> +_GLOBAL(save_fpu)
>  	addi	r3,r3,THREAD	        /* want THREAD of task */
>  	PPC_LL	r6,THREAD_FPSAVEAREA(r3)
>  	PPC_LL	r5,PT_REGS(r3)
>  	PPC_LCMPI	0,r6,0
>  	bne	2f
>  	addi	r6,r3,THREAD_FPSTATE
> -2:	PPC_LCMPI	0,r5,0
> -	SAVE_32FPVSRS(0, R4, R6)
> +2:	SAVE_32FPVSRS(0, R4, R6)
>  	mffs	fr0
>  	stfd	fr0,FPSTATE_FPSCR(r6)
> -	beq	1f
> -	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -	li	r3,MSR_FP|MSR_FE0|MSR_FE1
> -#ifdef CONFIG_VSX
> -BEGIN_FTR_SECTION
> -	oris	r3,r3,MSR_VSX@h
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> -#endif
> -	andc	r4,r4,r3		/* disable FP for previous task */
> -	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -1:
>  	blr
> 
>  /*
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index ec53468..8a96e4f 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
>  EXPORT_SYMBOL(__msr_check_and_clear);
> 
>  #ifdef CONFIG_PPC_FPU
> +void __giveup_fpu(struct task_struct *tsk)
> +{
> +	save_fpu(tsk);
> +	tsk->thread.regs->msr &= ~MSR_FP;
> +#ifdef CONFIG_VSX
> +	if (cpu_has_feature(CPU_FTR_VSX))
> +		tsk->thread.regs->msr &= ~MSR_VSX;
> +#endif
> +}
> +
>  void giveup_fpu(struct task_struct *tsk)
>  {
>  	check_if_tm_restore_required(tsk);
> @@ -421,12 +431,46 @@ void restore_math(struct pt_regs *regs)
>  	regs->msr = msr;
>  }
> 
> +void save_all(struct task_struct *tsk)
> +{
> +	unsigned long usermsr;
> +
> +	if (!tsk->thread.regs)
> +		return;
> +
> +	usermsr = tsk->thread.regs->msr;
> +
> +	if ((usermsr & msr_all_available) == 0)
> +		return;
> +
> +	msr_check_and_set(msr_all_available);
> +
> +#ifdef CONFIG_PPC_FPU
> +	if (usermsr & MSR_FP)
> +		save_fpu(tsk);
> +#endif
> +#ifdef CONFIG_ALTIVEC
> +	if (usermsr & MSR_VEC)
> +		__giveup_altivec(tsk);
> +#endif
> +#ifdef CONFIG_VSX
> +	if (usermsr & MSR_VSX)
> +		__giveup_vsx(tsk);
> +#endif
> +#ifdef CONFIG_SPE
> +	if (usermsr & MSR_SPE)
> +		__giveup_spe(tsk);
> +#endif
> +
> +	msr_check_and_clear(msr_all_available);
> +}
> +
>  void flush_all_to_thread(struct task_struct *tsk)
>  {
>  	if (tsk->thread.regs) {
>  		preempt_disable();
>  		BUG_ON(tsk != current);
> -		giveup_all(tsk);
> +		save_all(tsk);
> 
>  #ifdef CONFIG_SPE
>  		if (tsk->thread.regs->msr & MSR_SPE)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 8/8] powerpc: Add the ability to save VSX without giving it up
  2016-01-15  5:04 ` [PATCH V2 8/8] powerpc: Add the ability to save VSX " Cyril Bur
@ 2016-01-15  6:25   ` Michael Neuling
  2016-01-18  2:10     ` Cyril Bur
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Neuling @ 2016-01-15  6:25 UTC (permalink / raw)
  To: Cyril Bur, linuxppc-dev

On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> This patch adds the ability to save the VSX registers to the thread struct
> without giving up the facility, that is, without disabling it for the next
> time the process returns to userspace.
> 
> This patch builds on a previous optimisation for the FPU and VEC registers
> in the thread copy path to avoid a possibly pointless reload of VSX state.
> 
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |  1 -
>  arch/powerpc/kernel/ppc_ksyms.c      |  4 ----
>  arch/powerpc/kernel/process.c        | 23 ++++++++++++++++++-----
>  arch/powerpc/kernel/vector.S         | 17 -----------------
>  4 files changed, 18 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 29dda9d..4dfcd3e 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -52,7 +52,6 @@ static inline void disable_kernel_altivec(void)
>  extern void enable_kernel_vsx(void);
>  extern void flush_vsx_to_thread(struct task_struct *);
>  extern void giveup_vsx(struct task_struct *);
> -extern void __giveup_vsx(struct task_struct *);
>  static inline void disable_kernel_vsx(void)
>  {
>  	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
> diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
> index 41e1607..ef7024da 100644
> --- a/arch/powerpc/kernel/ppc_ksyms.c
> +++ b/arch/powerpc/kernel/ppc_ksyms.c
> @@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
>  EXPORT_SYMBOL(store_vr_state);
>  #endif
> 
> -#ifdef CONFIG_VSX
> -EXPORT_SYMBOL_GPL(__giveup_vsx);
> -#endif
> -
>  #ifdef CONFIG_EPAPR_PARAVIRT
>  EXPORT_SYMBOL(epapr_hypercall_start);
>  #endif
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 5566c32..3d907b8 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -252,20 +252,33 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
>  #endif /* CONFIG_ALTIVEC */
> 
>  #ifdef CONFIG_VSX
> -void giveup_vsx(struct task_struct *tsk)
> +void __giveup_vsx(struct task_struct *tsk)
>  {
> -	check_if_tm_restore_required(tsk);
> -
> -	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
>  	if (tsk->thread.regs->msr & MSR_FP)
>  		__giveup_fpu(tsk);
>  	if (tsk->thread.regs->msr & MSR_VEC)
>  		__giveup_altivec(tsk);
> +	tsk->thread.regs->msr &= ~MSR_VSX;
> +}
> +
> +void giveup_vsx(struct task_struct *tsk)
> +{
> +	check_if_tm_restore_required(tsk);
> +
> +	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
>  	__giveup_vsx(tsk);
>  	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
>  }
>  EXPORT_SYMBOL(giveup_vsx);
> 
> +void save_vsx(struct task_struct *tsk)
> +{
> +	if (tsk->thread.regs->msr & MSR_FP)
> +		save_fpu(tsk);
> +	if (tsk->thread.regs->msr & MSR_VEC)
> +		save_altivec(tsk);
> +}
> +
>  void enable_kernel_vsx(void)
>  {
>  	WARN_ON(preemptible());
> @@ -465,7 +478,7 @@ void save_all(struct task_struct *tsk)
>  #endif
>  #ifdef CONFIG_VSX
>  	if (usermsr & MSR_VSX)
> -		__giveup_vsx(tsk);
> +		save_vsx(tsk);

This seems suboptimal.  save_vsx() will call save_fpu() and
save_altivec() again, which you just called earlier in save_all().

save_vsx() is only used here, so could be static.

Also, put the #ifdef junk as part of the function so that the caller
doesn't have to deal with it.

Mikey

>  #endif
>  #ifdef CONFIG_SPE
>  	if (usermsr & MSR_SPE)
> diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> index 51b0c17..1c2e7a3 100644
> --- a/arch/powerpc/kernel/vector.S
> +++ b/arch/powerpc/kernel/vector.S
> @@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
>  	std	r12,_MSR(r1)
>  	b	fast_exception_return
> 
> -/*
> - * __giveup_vsx(tsk)
> - * Disable VSX for the task given as the argument.
> - * Does NOT save vsx registers.
> - */
> -_GLOBAL(__giveup_vsx)
> -	addi	r3,r3,THREAD		/* want THREAD of task */
> -	ld	r5,PT_REGS(r3)
> -	cmpdi	0,r5,0
> -	beq	1f
> -	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -	lis	r3,MSR_VSX@h
> -	andc	r4,r4,r3		/* disable VSX for previous task */
> -	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -1:
> -	blr
> -
>  #endif /* CONFIG_VSX */
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up
  2016-01-15  5:04 ` [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
  2016-01-15  6:08   ` Michael Neuling
@ 2016-01-15  7:38   ` Denis Kirjanov
  2016-01-15  7:42     ` Denis Kirjanov
  1 sibling, 1 reply; 20+ messages in thread
From: Denis Kirjanov @ 2016-01-15  7:38 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On 1/15/16, Cyril Bur <cyrilbur@gmail.com> wrote:
> This patch adds the ability to save the FPU registers to the thread struct
> without giving up the facility, that is, without disabling it for the next
> time the process returns to userspace.
>
> This patch optimises the thread copy path (as a result of a fork() or
> clone()) so that the parent thread can return to userspace with hot
> registers, avoiding a possibly pointless reload of FPU register state.

Ok, but if the patch optimizes the copy path then show the performance numbers.

Thanks!
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
>  arch/powerpc/include/asm/switch_to.h |  2 +-
>  arch/powerpc/kernel/fpu.S            | 21 ++++------------
>  arch/powerpc/kernel/process.c        | 46 +++++++++++++++++++++++++++++++++++-
>  3 files changed, 50 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> index 5b268b6..c4d50e9 100644
> --- a/arch/powerpc/include/asm/switch_to.h
> +++ b/arch/powerpc/include/asm/switch_to.h
> @@ -28,7 +28,7 @@ extern void giveup_all(struct task_struct *);
>  extern void enable_kernel_fp(void);
>  extern void flush_fp_to_thread(struct task_struct *);
>  extern void giveup_fpu(struct task_struct *);
> -extern void __giveup_fpu(struct task_struct *);
> +extern void save_fpu(struct task_struct *);
>  static inline void disable_kernel_fp(void)
>  {
>  	msr_check_and_clear(MSR_FP);
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index b063524..15da2b5 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>  	blr
>
>  /*
> - * __giveup_fpu(tsk)
> - * Disable FP for the task given as the argument,
> - * and save the floating-point registers in its thread_struct.
> + * save_fpu(tsk)
> + * Save the floating-point registers in its thread_struct.
>   * Enables the FPU for use in the kernel on return.
>   */
> -_GLOBAL(__giveup_fpu)
> +_GLOBAL(save_fpu)
>  	addi	r3,r3,THREAD	        /* want THREAD of task */
>  	PPC_LL	r6,THREAD_FPSAVEAREA(r3)
>  	PPC_LL	r5,PT_REGS(r3)
>  	PPC_LCMPI	0,r6,0
>  	bne	2f
>  	addi	r6,r3,THREAD_FPSTATE
> -2:	PPC_LCMPI	0,r5,0
> -	SAVE_32FPVSRS(0, R4, R6)
> +2:	SAVE_32FPVSRS(0, R4, R6)
>  	mffs	fr0
>  	stfd	fr0,FPSTATE_FPSCR(r6)
> -	beq	1f
> -	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -	li	r3,MSR_FP|MSR_FE0|MSR_FE1
> -#ifdef CONFIG_VSX
> -BEGIN_FTR_SECTION
> -	oris	r3,r3,MSR_VSX@h
> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> -#endif
> -	andc	r4,r4,r3		/* disable FP for previous task */
> -	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> -1:
>  	blr
>
>  /*
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index ec53468..8a96e4f 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
>  EXPORT_SYMBOL(__msr_check_and_clear);
>
>  #ifdef CONFIG_PPC_FPU
> +void __giveup_fpu(struct task_struct *tsk)
> +{
> +	save_fpu(tsk);
> +	tsk->thread.regs->msr &= ~MSR_FP;
> +#ifdef CONFIG_VSX
> +	if (cpu_has_feature(CPU_FTR_VSX))
> +		tsk->thread.regs->msr &= ~MSR_VSX;
> +#endif
> +}
> +
>  void giveup_fpu(struct task_struct *tsk)
>  {
>  	check_if_tm_restore_required(tsk);
> @@ -421,12 +431,46 @@ void restore_math(struct pt_regs *regs)
>  	regs->msr = msr;
>  }
>
> +void save_all(struct task_struct *tsk)
> +{
> +	unsigned long usermsr;
> +
> +	if (!tsk->thread.regs)
> +		return;
> +
> +	usermsr = tsk->thread.regs->msr;
> +
> +	if ((usermsr & msr_all_available) == 0)
> +		return;
> +
> +	msr_check_and_set(msr_all_available);
> +
> +#ifdef CONFIG_PPC_FPU
> +	if (usermsr & MSR_FP)
> +		save_fpu(tsk);
> +#endif
> +#ifdef CONFIG_ALTIVEC
> +	if (usermsr & MSR_VEC)
> +		__giveup_altivec(tsk);
> +#endif
> +#ifdef CONFIG_VSX
> +	if (usermsr & MSR_VSX)
> +		__giveup_vsx(tsk);
> +#endif
> +#ifdef CONFIG_SPE
> +	if (usermsr & MSR_SPE)
> +		__giveup_spe(tsk);
> +#endif
> +
> +	msr_check_and_clear(msr_all_available);
> +}
> +
>  void flush_all_to_thread(struct task_struct *tsk)
>  {
>  	if (tsk->thread.regs) {
>  		preempt_disable();
>  		BUG_ON(tsk != current);
> -		giveup_all(tsk);
> +		save_all(tsk);
>
>  #ifdef CONFIG_SPE
>  		if (tsk->thread.regs->msr & MSR_SPE)
> --
> 2.7.0
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up
  2016-01-15  7:38   ` Denis Kirjanov
@ 2016-01-15  7:42     ` Denis Kirjanov
  0 siblings, 0 replies; 20+ messages in thread
From: Denis Kirjanov @ 2016-01-15  7:42 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev

On 1/15/16, Denis Kirjanov <kda@linux-powerpc.org> wrote:
> On 1/15/16, Cyril Bur <cyrilbur@gmail.com> wrote:
>> This patch adds the ability to save the FPU registers to the thread struct
>> without giving up the facility, that is, without disabling it for the next
>> time the process returns to userspace.
>>
>> This patch optimises the thread copy path (as a result of a fork() or
>> clone()) so that the parent thread can return to userspace with hot
>> registers, avoiding a possibly pointless reload of FPU register state.
>
> Ok, but if the patch optimizes the copy path then show the performance
> numbers.
>
> Thanks!

Oh, I've missed your cover letter.

>>
>> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
>> ---
>>  arch/powerpc/include/asm/switch_to.h |  2 +-
>>  arch/powerpc/kernel/fpu.S            | 21 ++++------------
>>  arch/powerpc/kernel/process.c        | 46 +++++++++++++++++++++++++++++++++++-
>>  3 files changed, 50 insertions(+), 19 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
>> index 5b268b6..c4d50e9 100644
>> --- a/arch/powerpc/include/asm/switch_to.h
>> +++ b/arch/powerpc/include/asm/switch_to.h
>> @@ -28,7 +28,7 @@ extern void giveup_all(struct task_struct *);
>>  extern void enable_kernel_fp(void);
>>  extern void flush_fp_to_thread(struct task_struct *);
>>  extern void giveup_fpu(struct task_struct *);
>> -extern void __giveup_fpu(struct task_struct *);
>> +extern void save_fpu(struct task_struct *);
>>  static inline void disable_kernel_fp(void)
>>  {
>>  	msr_check_and_clear(MSR_FP);
>> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
>> index b063524..15da2b5 100644
>> --- a/arch/powerpc/kernel/fpu.S
>> +++ b/arch/powerpc/kernel/fpu.S
>> @@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>>  	blr
>>
>>  /*
>> - * __giveup_fpu(tsk)
>> - * Disable FP for the task given as the argument,
>> - * and save the floating-point registers in its thread_struct.
>> + * save_fpu(tsk)
>> + * Save the floating-point registers in its thread_struct.
>>   * Enables the FPU for use in the kernel on return.
>>   */
>> -_GLOBAL(__giveup_fpu)
>> +_GLOBAL(save_fpu)
>>  	addi	r3,r3,THREAD	        /* want THREAD of task */
>>  	PPC_LL	r6,THREAD_FPSAVEAREA(r3)
>>  	PPC_LL	r5,PT_REGS(r3)
>>  	PPC_LCMPI	0,r6,0
>>  	bne	2f
>>  	addi	r6,r3,THREAD_FPSTATE
>> -2:	PPC_LCMPI	0,r5,0
>> -	SAVE_32FPVSRS(0, R4, R6)
>> +2:	SAVE_32FPVSRS(0, R4, R6)
>>  	mffs	fr0
>>  	stfd	fr0,FPSTATE_FPSCR(r6)
>> -	beq	1f
>> -	PPC_LL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>> -	li	r3,MSR_FP|MSR_FE0|MSR_FE1
>> -#ifdef CONFIG_VSX
>> -BEGIN_FTR_SECTION
>> -	oris	r3,r3,MSR_VSX@h
>> -END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>> -#endif
>> -	andc	r4,r4,r3		/* disable FP for previous task */
>> -	PPC_STL	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>> -1:
>>  	blr
>>
>>  /*
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index ec53468..8a96e4f 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
>>  EXPORT_SYMBOL(__msr_check_and_clear);
>>
>>  #ifdef CONFIG_PPC_FPU
>> +void __giveup_fpu(struct task_struct *tsk)
>> +{
>> +	save_fpu(tsk);
>> +	tsk->thread.regs->msr &= ~MSR_FP;
>> +#ifdef CONFIG_VSX
>> +	if (cpu_has_feature(CPU_FTR_VSX))
>> +		tsk->thread.regs->msr &= ~MSR_VSX;
>> +#endif
>> +}
>> +
>>  void giveup_fpu(struct task_struct *tsk)
>>  {
>>  	check_if_tm_restore_required(tsk);
>> @@ -421,12 +431,46 @@ void restore_math(struct pt_regs *regs)
>>  	regs->msr = msr;
>>  }
>>
>> +void save_all(struct task_struct *tsk)
>> +{
>> +	unsigned long usermsr;
>> +
>> +	if (!tsk->thread.regs)
>> +		return;
>> +
>> +	usermsr = tsk->thread.regs->msr;
>> +
>> +	if ((usermsr & msr_all_available) == 0)
>> +		return;
>> +
>> +	msr_check_and_set(msr_all_available);
>> +
>> +#ifdef CONFIG_PPC_FPU
>> +	if (usermsr & MSR_FP)
>> +		save_fpu(tsk);
>> +#endif
>> +#ifdef CONFIG_ALTIVEC
>> +	if (usermsr & MSR_VEC)
>> +		__giveup_altivec(tsk);
>> +#endif
>> +#ifdef CONFIG_VSX
>> +	if (usermsr & MSR_VSX)
>> +		__giveup_vsx(tsk);
>> +#endif
>> +#ifdef CONFIG_SPE
>> +	if (usermsr & MSR_SPE)
>> +		__giveup_spe(tsk);
>> +#endif
>> +
>> +	msr_check_and_clear(msr_all_available);
>> +}
>> +
>>  void flush_all_to_thread(struct task_struct *tsk)
>>  {
>>  	if (tsk->thread.regs) {
>>  		preempt_disable();
>>  		BUG_ON(tsk != current);
>> -		giveup_all(tsk);
>> +		save_all(tsk);
>>
>>  #ifdef CONFIG_SPE
>>  		if (tsk->thread.regs->msr & MSR_SPE)
>> --
>> 2.7.0
>>
>> _______________________________________________
>> Linuxppc-dev mailing list
>> Linuxppc-dev@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 5/8] powerpc: Restore FPU/VEC/VSX if previously used
  2016-01-15  6:02   ` Michael Neuling
@ 2016-01-18  2:05     ` Cyril Bur
  0 siblings, 0 replies; 20+ messages in thread
From: Cyril Bur @ 2016-01-18  2:05 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev

On Fri, 15 Jan 2016 17:02:41 +1100
Michael Neuling <mikey@neuling.org> wrote:

Hey Mikey,

Thanks for the review, as always you're correct :).

> 
> Can you make the inline code easier to read?  Something like
> 
> #ifdef CONFIG_ALTIVEC
> #define loadvec(thr) ((thr).load_vec)
> #else
> #define loadvec(thr) 0
> #endif
> 
> void restore_math(struct pt_regs *regs)
> {
>        unsigned long msr;
> 
>        if (!current->thread.load_fp && !loadvec(current->thread))
> 		return;
> 
> > +
> > +       msr = regs->msr;
> > +       msr_check_and_set(msr_all_available);
> > +
> > +       /*
> > +        * Only reload if the bit is not set in the user MSR, the bit BEING set
> > +        * indicates that the registers are hot
> > +        */
> > +#ifdef CONFIG_PPC_FPU
> > +       if (current->thread.load_fp && !(msr & MSR_FP)) {
> > +               load_fp_state(&current->thread.fp_state);
> > +               msr |= MSR_FP | current->thread.fpexc_mode;
> > +               current->thread.load_fp++;
> > +       }
> > +#endif
> > +#ifdef CONFIG_ALTIVEC
> > +       if (current->thread.load_vec && !(msr & MSR_VEC) &&
> > +                       cpu_has_feature(CPU_FTR_ALTIVEC)) {
> > +               load_vr_state(&current->thread.vr_state);
> > +               current->thread.used_vr = 1;
> > +               msr |= MSR_VEC;
> > +               current->thread.load_vec++;
> > +       }
> > +#endif
> > +#ifdef CONFIG_VSX
> > +       if (!(msr & MSR_VSX) && (msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC)) {  
> 
> What are you trying to hit with this if statement?
> 
> Seems you are turning on VSX if VSX is not already on but FP and VEC
> is.  Why do you need the check MSR_VSX is not used?  That seems redundant.
> 
> > +               current->thread.used_vsr = 1;
> > +               msr |= MSR_VSX;
> > +       }
> > +#endif
> > +
> > +       msr_check_and_clear(msr_all_available);  
> 
> Why are you doing this?  Why all, and not just the ones you've enabled above?
> 

This is part of the batching of MSR reads and writes. We turned everything on
at the start of restore_math() because it means only one write; the MSR
reads/writes are where the performance hit is, not the number of bits changed.
Obviously we subsequently turn everything off again because that also means
only one write (and we had unconditionally turned everything on).

The check at the start of restore_math() and in entry_64.S should mean that we
don't reach the msr_check_and_set()/msr_check_and_clear() block with nothing
to do.
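
In other words the shape is one MSR write to turn everything on, the loads,
then one MSR write to turn everything off, however many facilities actually
get restored. A rough sketch of the pattern (simplified from the
restore_math() hunk quoted above):

	msr_check_and_set(msr_all_available);	/* one mtmsrd: all facilities on */
	if (current->thread.load_fp && !(msr & MSR_FP))
		load_fp_state(&current->thread.fp_state);
	if (current->thread.load_vec && !(msr & MSR_VEC))
		load_vr_state(&current->thread.vr_state);
	msr_check_and_clear(msr_all_available);	/* one mtmsrd: all off again */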

> > +
> > +       regs->msr = msr;
> > +}
> > +
> >  void flush_all_to_thread(struct task_struct *tsk)
> >  {
> >         if (tsk->thread.regs) {
> > @@ -832,17 +879,9 @@ void restore_tm_state(struct pt_regs *regs)  
>  
> >         msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
> >         msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
> > -       if (msr_diff & MSR_FP) {
> > -               msr_check_and_set(MSR_FP);
> > -               load_fp_state(&current->thread.fp_state);
> > -               msr_check_and_clear(MSR_FP);
> > -               regs->msr |= current->thread.fpexc_mode;
> > -       }
> > -       if (msr_diff & MSR_VEC) {
> > -               msr_check_and_set(MSR_VEC);
> > -               load_vr_state(&current->thread.vr_state);
> > -               msr_check_and_clear(MSR_VEC);
> > -       }
> > +
> > +       restore_math(regs);
> > +
> >         regs->msr |= msr_diff;
> >  }  
>  
> > @@ -1006,6 +1045,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
> >                 batch = this_cpu_ptr(&ppc64_tlb_batch);
> >                 batch->active = 1;
> >         }
> > +
> > +       /* Don't do this on a kernel thread */  
> 
> Why not?
> 
> > +       if (current_thread_info()->task->thread.regs)
> > +               restore_math(current_thread_info()->task->thread.regs);
> > +
> >  #endif /* CONFIG_PPC_BOOK3S_64 */  
>  
> >         return last;
> > diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> > index 162d0f7..038cff8 100644
> > --- a/arch/powerpc/kernel/vector.S
> > +++ b/arch/powerpc/kernel/vector.S
> > @@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
> >         oris    r12,r12,MSR_VEC@h
> >         std     r12,_MSR(r1)
> >  #endif
> > +       /* Don't care if r4 overflows, this is desired behaviour */
> > +       lbz     r4,THREAD_LOAD_VEC(r5)
> > +       addi    r4,r4,1
> > +       stb     r4,THREAD_LOAD_VEC(r5)
> >         addi    r6,r5,THREAD_VRSTATE
> >         li      r4,1
> >         li      r10,VRSTATE_VSCR
> > -- 
> > 2.7.0  
> 
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 8/8] powerpc: Add the ability to save VSX without giving it up
  2016-01-15  6:25   ` Michael Neuling
@ 2016-01-18  2:10     ` Cyril Bur
  2016-01-18  4:29       ` Michael Neuling
  0 siblings, 1 reply; 20+ messages in thread
From: Cyril Bur @ 2016-01-18  2:10 UTC (permalink / raw)
  To: Michael Neuling; +Cc: linuxppc-dev

On Fri, 15 Jan 2016 17:25:26 +1100
Michael Neuling <mikey@neuling.org> wrote:

> On Fri, 2016-01-15 at 16:04 +1100, Cyril Bur wrote:
> > This patch adds the ability to save the VSX registers to the thread struct
> > without giving up the facility, that is, without disabling it for the next
> > time the process returns to userspace.
> > 
> > This patch builds on a previous optimisation for the FPU and VEC registers
> > in the thread copy path to avoid a possibly pointless reload of VSX state.
> > 
> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > ---
> >  arch/powerpc/include/asm/switch_to.h |  1 -
> >  arch/powerpc/kernel/ppc_ksyms.c      |  4 ----
> >  arch/powerpc/kernel/process.c        | 23 ++++++++++++++++++-----
> >  arch/powerpc/kernel/vector.S         | 17 -----------------
> >  4 files changed, 18 insertions(+), 27 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
> > index 29dda9d..4dfcd3e 100644
> > --- a/arch/powerpc/include/asm/switch_to.h
> > +++ b/arch/powerpc/include/asm/switch_to.h
> > @@ -52,7 +52,6 @@ static inline void disable_kernel_altivec(void)
> >  extern void enable_kernel_vsx(void);
> >  extern void flush_vsx_to_thread(struct task_struct *);
> >  extern void giveup_vsx(struct task_struct *);
> > -extern void __giveup_vsx(struct task_struct *);
> >  static inline void disable_kernel_vsx(void)
> >  {
> >  	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
> > diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
> > index 41e1607..ef7024da 100644
> > --- a/arch/powerpc/kernel/ppc_ksyms.c
> > +++ b/arch/powerpc/kernel/ppc_ksyms.c
> > @@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
> >  EXPORT_SYMBOL(store_vr_state);
> >  #endif
> >  
> > -#ifdef CONFIG_VSX
> > -EXPORT_SYMBOL_GPL(__giveup_vsx);
> > -#endif
> > -
> >  #ifdef CONFIG_EPAPR_PARAVIRT
> >  EXPORT_SYMBOL(epapr_hypercall_start);
> >  #endif
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index 5566c32..3d907b8 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -252,20 +252,33 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
> >  #endif /* CONFIG_ALTIVEC */
> >  
> >  #ifdef CONFIG_VSX
> > -void giveup_vsx(struct task_struct *tsk)
> > +void __giveup_vsx(struct task_struct *tsk)
> >  {
> > -	check_if_tm_restore_required(tsk);
> > -
> > -	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
> >  	if (tsk->thread.regs->msr & MSR_FP)
> >  		__giveup_fpu(tsk);
> >  	if (tsk->thread.regs->msr & MSR_VEC)
> >  		__giveup_altivec(tsk);
> > +	tsk->thread.regs->msr &= ~MSR_VSX;
> > +}
> > +
> > +void giveup_vsx(struct task_struct *tsk)
> > +{
> > +	check_if_tm_restore_required(tsk);
> > +
> > +	msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
> >  	__giveup_vsx(tsk);
> >  	msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
> >  }
> >  EXPORT_SYMBOL(giveup_vsx);
> >  
> > +void save_vsx(struct task_struct *tsk)
> > +{
> > +	if (tsk->thread.regs->msr & MSR_FP)
> > +		save_fpu(tsk);
> > +	if (tsk->thread.regs->msr & MSR_VEC)
> > +		save_altivec(tsk);
> > +}
> > +
> >  void enable_kernel_vsx(void)
> >  {
> >  	WARN_ON(preemptible());
> > @@ -465,7 +478,7 @@ void save_all(struct task_struct *tsk)
> >  #endif
> >  #ifdef CONFIG_VSX
> >  	if (usermsr & MSR_VSX)
> > -		__giveup_vsx(tsk);
> > +		save_vsx(tsk);  
> 
> This seems suboptimal.  save_vsx() will call save_fpu() and
> save_altivec() again, which you just called earlier in save_all().
> 

Ah yes, will fix

> save_vsx() is only used here, so could be static. 
> 

Thanks.

> Also, put the #ifdef junk as part of the function so that the caller
> doesn't have to deal with it. 
> 

Can do absolutely; however, this means that in save_all I can't check if the
function needs to be called or not. For example, without CONFIG_VSX, MSR_VSX
won't exist which means we might end up calling save_vsx THEN checking MSR_VSX
and returning early.

I'm happy to defer to you and mpe on what's nicer; I would side with avoiding
the function call at the cost of ugly #ifdefs, but I can always see the merits
of clean code.
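
(Roughly, the two shapes being weighed up -- a sketch, not code from the
series:

	/* A: caller checks; needs MSR_VSX and an #ifdef at the call site */
#ifdef CONFIG_VSX
	if (usermsr & MSR_VSX)
		save_vsx(tsk);
#endif

	/* B: callee checks; save_vsx() becomes an empty static inline stub
	 * when CONFIG_VSX is not set, so the caller stays clean */
	save_vsx(tsk);

B pays a possibly pointless call when the VSX bit is clear; A keeps the
check, and the #ifdef, in the caller.)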

Thanks for the review,

Cyril

> Mikey
> 
> >  #endif
> >  #ifdef CONFIG_SPE
> >  	if (usermsr & MSR_SPE)
> > diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> > index 51b0c17..1c2e7a3 100644
> > --- a/arch/powerpc/kernel/vector.S
> > +++ b/arch/powerpc/kernel/vector.S
> > @@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
> >  	std	r12,_MSR(r1)
> >  	b	fast_exception_return
> >  
> > -/*
> > - * __giveup_vsx(tsk)
> > - * Disable VSX for the task given as the argument.
> > - * Does NOT save vsx registers.
> > - */
> > -_GLOBAL(__giveup_vsx)
> > > -	addi	r3,r3,THREAD		/* want THREAD of task */
> > -	ld	r5,PT_REGS(r3)
> > -	cmpdi	0,r5,0
> > -	beq	1f
> > -	ld	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> > -	lis	r3,MSR_VSX@h
> > > -	andc	r4,r4,r3		/* disable VSX for previous task */
> > -	std	r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> > -1:
> > -	blr
> > -
> >  #endif /* CONFIG_VSX */
> >  
> >    

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 8/8] powerpc: Add the ability to save VSX without giving it up
  2016-01-18  2:10     ` Cyril Bur
@ 2016-01-18  4:29       ` Michael Neuling
  0 siblings, 0 replies; 20+ messages in thread
From: Michael Neuling @ 2016-01-18  4:29 UTC (permalink / raw)
  To: Cyril Bur; +Cc: linuxppc-dev


> > Also, put the #ifdef junk as part of the function so that the caller
> > doesn't have to deal with it.
> > 
> 
> Can do absolutely; however, this means that in save_all I can't check if the
> function needs to be called or not. For example, without CONFIG_VSX, MSR_VSX
> won't exist which means we might end up calling save_vsx THEN checking MSR_VSX
> and returning early.

MSR_VSX should exist even without CONFIG_VSX, so you should be able to
do it.

> I'm happy to defer to you and mpe on what's nicer; I would side with avoiding
> the function call at the cost of ugly #ifdefs, but I can always see the merits
> of clean code.

I'd prefer the cleanup.  This code has been pretty bad in the past for
being a bit of an #ifdef mess, so it'd be nice to clean it up a bit if
possible.

Mikey

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-01-18  4:29 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-15  5:04 [PATCH V2 0/8] FP/VEC/VSX switching optimisations Cyril Bur
2016-01-15  5:04 ` [PATCH V2 1/8] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
2016-01-15  5:04 ` [PATCH V2 2/8] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
2016-01-15  5:04 ` [PATCH V2 3/8] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
2016-01-15  5:04 ` [PATCH V2 4/8] powerpc: Explicitly disable math features when copying thread Cyril Bur
2016-01-15  5:42   ` Michael Neuling
2016-01-15  5:54     ` Cyril Bur
2016-01-15  6:04       ` Michael Neuling
2016-01-15  5:04 ` [PATCH V2 5/8] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
2016-01-15  6:02   ` Michael Neuling
2016-01-18  2:05     ` Cyril Bur
2016-01-15  5:04 ` [PATCH V2 6/8] powerpc: Add the ability to save FPU without giving it up Cyril Bur
2016-01-15  6:08   ` Michael Neuling
2016-01-15  7:38   ` Denis Kirjanov
2016-01-15  7:42     ` Denis Kirjanov
2016-01-15  5:04 ` [PATCH V2 7/8] powerpc: Add the ability to save Altivec " Cyril Bur
2016-01-15  5:04 ` [PATCH V2 8/8] powerpc: Add the ability to save VSX " Cyril Bur
2016-01-15  6:25   ` Michael Neuling
2016-01-18  2:10     ` Cyril Bur
2016-01-18  4:29       ` Michael Neuling
