* [PATCH v3 0/9] FP/VEC/VSX switching optimisations
@ 2016-01-21 0:55 Cyril Bur
2016-01-21 0:55 ` [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
` (8 more replies)
0 siblings, 9 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Cover-letter for V1 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-November/136350.html
Cover-letter for V2 of the series is at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-January/138054.html
Changes in V3:
Addressed review comments from Michael Neuling
- Made commit message in 4/9 better reflect the patch
- Removed overuse of #ifdef blocks and redundant condition in 5/9
- Split 6/8 in two to better prepare for 7,8,9
- Removed #ifdefs in 6/9
Cyril Bur (9):
selftests/powerpc: Test the preservation of FPU and VMX regs across
syscall
selftests/powerpc: Test preservation of FPU and VMX regs across
preemption
selftests/powerpc: Test FPU and VMX regs in signal ucontext
powerpc: Explicitly disable math features when copying thread
powerpc: Restore FPU/VEC/VSX if previously used
powerpc: Prepare for splitting giveup_{fpu,altivec,vsx} in two
powerpc: Add the ability to save FPU without giving it up
powerpc: Add the ability to save Altivec without giving it up
powerpc: Add the ability to save VSX without giving it up
arch/powerpc/include/asm/processor.h | 2 +
arch/powerpc/include/asm/switch_to.h | 13 +-
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/entry_64.S | 21 +-
arch/powerpc/kernel/fpu.S | 25 +--
arch/powerpc/kernel/ppc_ksyms.c | 4 -
arch/powerpc/kernel/process.c | 172 ++++++++++++++--
arch/powerpc/kernel/vector.S | 45 +---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/basic_asm.h | 26 +++
tools/testing/selftests/powerpc/math/.gitignore | 6 +
tools/testing/selftests/powerpc/math/Makefile | 19 ++
tools/testing/selftests/powerpc/math/fpu_asm.S | 195 ++++++++++++++++++
tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 ++++++++++
tools/testing/selftests/powerpc/math/fpu_signal.c | 135 ++++++++++++
tools/testing/selftests/powerpc/math/fpu_syscall.c | 90 ++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 229 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 ++++++++++
tools/testing/selftests/powerpc/math/vmx_signal.c | 138 +++++++++++++
tools/testing/selftests/powerpc/math/vmx_syscall.c | 92 +++++++++
20 files changed, 1360 insertions(+), 83 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
create mode 100644 tools/testing/selftests/powerpc/math/Makefile
create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
--
2.7.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-02-10 8:56 ` [v3, " Michael Ellerman
2016-01-21 0:55 ` [PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
` (7 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Test that the non-volatile floating point and Altivec registers are
correctly preserved across the fork() syscall.
fork() works nicely for this purpose: the registers should be the same in
both parent and child.
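The parent/child comparison the C side of the test drives can be sketched in portable C as follows. This is an illustrative sketch only: a plain array stands in for the FP register file (portable C cannot pin f14-f31), and the function names are invented here, not taken from the patch.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reference values; the real test loads these into f14-f31. */
static const double expected[] = { 0.1, 0.2, 0.3, 0.4 };

static int check(const double *copy)
{
	unsigned int i;

	for (i = 0; i < sizeof(expected) / sizeof(expected[0]); i++)
		if (copy[i] != expected[i])
			return 1;
	return 0;
}

/* Returns 0 if both parent and child see unchanged values after fork(). */
int run_fork_check(void)
{
	double copy[4] = { 0.1, 0.2, 0.3, 0.4 };
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)			/* child: report its result via exit status */
		_exit(check(copy));
	if (check(copy))		/* parent checks its own copy too */
		return 1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```

The assembly version does the same dance, but issues the `sc` for fork directly so the compiler cannot spill the registers under test.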
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/Makefile | 3 +-
tools/testing/selftests/powerpc/basic_asm.h | 26 +++
tools/testing/selftests/powerpc/math/.gitignore | 2 +
tools/testing/selftests/powerpc/math/Makefile | 14 ++
tools/testing/selftests/powerpc/math/fpu_asm.S | 161 +++++++++++++++++
tools/testing/selftests/powerpc/math/fpu_syscall.c | 90 ++++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 193 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_syscall.c | 92 ++++++++++
8 files changed, 580 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/basic_asm.h
create mode 100644 tools/testing/selftests/powerpc/math/.gitignore
create mode 100644 tools/testing/selftests/powerpc/math/Makefile
create mode 100644 tools/testing/selftests/powerpc/math/fpu_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/fpu_syscall.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_asm.S
create mode 100644 tools/testing/selftests/powerpc/math/vmx_syscall.c
diff --git a/tools/testing/selftests/powerpc/Makefile b/tools/testing/selftests/powerpc/Makefile
index 0c2706b..19e8191 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -22,7 +22,8 @@ SUB_DIRS = benchmarks \
switch_endian \
syscalls \
tm \
- vphn
+ vphn \
+ math
endif
diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h
new file mode 100644
index 0000000..27aca79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -0,0 +1,26 @@
+#include <ppc-asm.h>
+#include <asm/unistd.h>
+
+#define LOAD_REG_IMMEDIATE(reg,expr) \
+ lis reg,(expr)@highest; \
+ ori reg,reg,(expr)@higher; \
+ rldicr reg,reg,32,31; \
+ oris reg,reg,(expr)@high; \
+ ori reg,reg,(expr)@l;
+
+#define PUSH_BASIC_STACK(size) \
+ std 2,24(sp); \
+ mflr r0; \
+ std r0,16(sp); \
+ mfcr r0; \
+ stw r0,8(sp); \
+ stdu sp,-size(sp);
+
+#define POP_BASIC_STACK(size) \
+ addi sp,sp,size; \
+ ld 2,24(sp); \
+ ld r0,16(sp); \
+ mtlr r0; \
+ lwz r0,8(sp); \
+ mtcr r0;
+
diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
new file mode 100644
index 0000000..b19b269
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -0,0 +1,2 @@
+fpu_syscall
+vmx_syscall
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
new file mode 100644
index 0000000..418bef1
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -0,0 +1,14 @@
+TEST_PROGS := fpu_syscall vmx_syscall
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c
+$(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
+
+fpu_syscall: fpu_asm.S
+vmx_syscall: vmx_asm.S
+
+include ../../lib.mk
+
+clean:
+ rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
new file mode 100644
index 0000000..8733874
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -0,0 +1,161 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+#define PUSH_FPU(pos) \
+ stfd f14,pos(sp); \
+ stfd f15,pos+8(sp); \
+ stfd f16,pos+16(sp); \
+ stfd f17,pos+24(sp); \
+ stfd f18,pos+32(sp); \
+ stfd f19,pos+40(sp); \
+ stfd f20,pos+48(sp); \
+ stfd f21,pos+56(sp); \
+ stfd f22,pos+64(sp); \
+ stfd f23,pos+72(sp); \
+ stfd f24,pos+80(sp); \
+ stfd f25,pos+88(sp); \
+ stfd f26,pos+96(sp); \
+ stfd f27,pos+104(sp); \
+ stfd f28,pos+112(sp); \
+ stfd f29,pos+120(sp); \
+ stfd f30,pos+128(sp); \
+ stfd f31,pos+136(sp);
+
+#define POP_FPU(pos) \
+ lfd f14,pos(sp); \
+ lfd f15,pos+8(sp); \
+ lfd f16,pos+16(sp); \
+ lfd f17,pos+24(sp); \
+ lfd f18,pos+32(sp); \
+ lfd f19,pos+40(sp); \
+ lfd f20,pos+48(sp); \
+ lfd f21,pos+56(sp); \
+ lfd f22,pos+64(sp); \
+ lfd f23,pos+72(sp); \
+ lfd f24,pos+80(sp); \
+ lfd f25,pos+88(sp); \
+ lfd f26,pos+96(sp); \
+ lfd f27,pos+104(sp); \
+ lfd f28,pos+112(sp); \
+ lfd f29,pos+120(sp); \
+ lfd f30,pos+128(sp); \
+ lfd f31,pos+136(sp);
+
+#Careful calling this, it will 'clobber' fpu (by design)
+#Don't call this from C
+FUNC_START(load_fpu)
+ lfd f14,0(r3)
+ lfd f15,8(r3)
+ lfd f16,16(r3)
+ lfd f17,24(r3)
+ lfd f18,32(r3)
+ lfd f19,40(r3)
+ lfd f20,48(r3)
+ lfd f21,56(r3)
+ lfd f22,64(r3)
+ lfd f23,72(r3)
+ lfd f24,80(r3)
+ lfd f25,88(r3)
+ lfd f26,96(r3)
+ lfd f27,104(r3)
+ lfd f28,112(r3)
+ lfd f29,120(r3)
+ lfd f30,128(r3)
+ lfd f31,136(r3)
+ blr
+FUNC_END(load_fpu)
+
+FUNC_START(check_fpu)
+ mr r4,r3
+ li r3,1 #assume a bad result
+ lfd f0,0(r4)
+ fcmpu cr1,f0,f14
+ bne cr1,1f
+ lfd f0,8(r4)
+ fcmpu cr1,f0,f15
+ bne cr1,1f
+ lfd f0,16(r4)
+ fcmpu cr1,f0,f16
+ bne cr1,1f
+ lfd f0,24(r4)
+ fcmpu cr1,f0,f17
+ bne cr1,1f
+ lfd f0,32(r4)
+ fcmpu cr1,f0,f18
+ bne cr1,1f
+ lfd f0,40(r4)
+ fcmpu cr1,f0,f19
+ bne cr1,1f
+ lfd f0,48(r4)
+ fcmpu cr1,f0,f20
+ bne cr1,1f
+ lfd f0,56(r4)
+ fcmpu cr1,f0,f21
+ bne cr1,1f
+ lfd f0,64(r4)
+ fcmpu cr1,f0,f22
+ bne cr1,1f
+ lfd f0,72(r4)
+ fcmpu cr1,f0,f23
+ bne cr1,1f
+ lfd f0,80(r4)
+ fcmpu cr1,f0,f24
+ bne cr1,1f
+ lfd f0,88(r4)
+ fcmpu cr1,f0,f25
+ bne cr1,1f
+ lfd f0,96(r4)
+ fcmpu cr1,f0,f26
+ bne cr1,1f
+ lfd f0,104(r4)
+ fcmpu cr1,f0,f27
+ bne cr1,1f
+ lfd f0,112(r4)
+ fcmpu cr1,f0,f28
+ bne cr1,1f
+ lfd f0,120(r4)
+ fcmpu cr1,f0,f29
+ bne cr1,1f
+ lfd f0,128(r4)
+ fcmpu cr1,f0,f30
+ bne cr1,1f
+ lfd f0,136(r4)
+ fcmpu cr1,f0,f31
+ bne cr1,1f
+ li r3,0 #Success
+1: blr
+
+FUNC_START(test_fpu)
+ #r3 holds pointer to where to put the result of fork
+ #r4 holds pointer to the pid
+ #f14-f31 are non-volatile
+ PUSH_BASIC_STACK(256)
+ std r3,40(sp) #Address of darray
+ std r4,48(sp) #Address of pid
+ PUSH_FPU(56)
+
+ bl load_fpu
+ nop
+ li r0,__NR_fork
+ sc
+
+ #pass the result of the fork to the caller
+ ld r9,48(sp)
+ std r3,0(r9)
+
+ ld r3,40(sp)
+ bl check_fpu
+ nop
+
+ POP_FPU(56)
+ POP_BASIC_STACK(256)
+ blr
+FUNC_END(test_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_syscall.c b/tools/testing/selftests/powerpc/math/fpu_syscall.c
new file mode 100644
index 0000000..949e672
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_syscall.c
@@ -0,0 +1,90 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across a syscall (fork).
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+
+#include "utils.h"
+
+extern int test_fpu(double *darray, pid_t *pid);
+
+double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+int syscall_fpu(void)
+{
+ pid_t fork_pid;
+ int i;
+ int ret;
+ int child_ret;
+ for (i = 0; i < 1000; i++) {
+ /* test_fpu will fork() */
+ ret = test_fpu(darray, &fork_pid);
+ if (fork_pid == -1)
+ return -1;
+ if (fork_pid == 0)
+ exit(ret);
+ waitpid(fork_pid, &child_ret, 0);
+ if (ret || child_ret)
+ return 1;
+ }
+
+ return 0;
+}
+
+int test_syscall_fpu(void)
+{
+ /*
+ * Setup an environment with much context switching
+ */
+ pid_t pid2;
+ pid_t pid = fork();
+ int ret;
+ int child_ret;
+ FAIL_IF(pid == -1);
+
+ pid2 = fork();
+ /* Can't FAIL_IF(pid2 == -1); because already forked once */
+ if (pid2 == -1) {
+ /*
+ * Couldn't fork, ensure test is a fail
+ */
+ child_ret = ret = 1;
+ } else {
+ ret = syscall_fpu();
+ if (pid2)
+ waitpid(pid2, &child_ret, 0);
+ else
+ exit(ret);
+ }
+
+ ret |= child_ret;
+
+ if (pid)
+ waitpid(pid, &child_ret, 0);
+ else
+ exit(ret);
+
+ FAIL_IF(ret || child_ret);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_syscall_fpu, "syscall_fpu");
+
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
new file mode 100644
index 0000000..9ed32e7
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -0,0 +1,193 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+
+#define PUSH_VMX(pos,reg) \
+ li reg,pos; \
+ stvx v20,reg,sp; \
+ addi reg,reg,16; \
+ stvx v21,reg,sp; \
+ addi reg,reg,16; \
+ stvx v22,reg,sp; \
+ addi reg,reg,16; \
+ stvx v23,reg,sp; \
+ addi reg,reg,16; \
+ stvx v24,reg,sp; \
+ addi reg,reg,16; \
+ stvx v25,reg,sp; \
+ addi reg,reg,16; \
+ stvx v26,reg,sp; \
+ addi reg,reg,16; \
+ stvx v27,reg,sp; \
+ addi reg,reg,16; \
+ stvx v28,reg,sp; \
+ addi reg,reg,16; \
+ stvx v29,reg,sp; \
+ addi reg,reg,16; \
+ stvx v30,reg,sp; \
+ addi reg,reg,16; \
+ stvx v31,reg,sp;
+
+#define POP_VMX(pos,reg) \
+ li reg,pos; \
+ lvx v20,reg,sp; \
+ addi reg,reg,16; \
+ lvx v21,reg,sp; \
+ addi reg,reg,16; \
+ lvx v22,reg,sp; \
+ addi reg,reg,16; \
+ lvx v23,reg,sp; \
+ addi reg,reg,16; \
+ lvx v24,reg,sp; \
+ addi reg,reg,16; \
+ lvx v25,reg,sp; \
+ addi reg,reg,16; \
+ lvx v26,reg,sp; \
+ addi reg,reg,16; \
+ lvx v27,reg,sp; \
+ addi reg,reg,16; \
+ lvx v28,reg,sp; \
+ addi reg,reg,16; \
+ lvx v29,reg,sp; \
+ addi reg,reg,16; \
+ lvx v30,reg,sp; \
+ addi reg,reg,16; \
+ lvx v31,reg,sp;
+
+#Careful calling this, it will 'clobber' vmx (by design)
+#Don't call this from C
+FUNC_START(load_vmx)
+ li r5,0
+ lvx v20,r5,r3
+ addi r5,r5,16
+ lvx v21,r5,r3
+ addi r5,r5,16
+ lvx v22,r5,r3
+ addi r5,r5,16
+ lvx v23,r5,r3
+ addi r5,r5,16
+ lvx v24,r5,r3
+ addi r5,r5,16
+ lvx v25,r5,r3
+ addi r5,r5,16
+ lvx v26,r5,r3
+ addi r5,r5,16
+ lvx v27,r5,r3
+ addi r5,r5,16
+ lvx v28,r5,r3
+ addi r5,r5,16
+ lvx v29,r5,r3
+ addi r5,r5,16
+ lvx v30,r5,r3
+ addi r5,r5,16
+ lvx v31,r5,r3
+ blr
+FUNC_END(load_vmx)
+
+#Should be safe from C, only touches r4, r5 and v0,v1,v2
+FUNC_START(check_vmx)
+ PUSH_BASIC_STACK(16)
+ mr r4,r3
+ li r3,1 #assume a bad result
+ li r5,0
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v20
+ vmr v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v21
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v22
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v23
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v24
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v25
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v26
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v27
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v28
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v29
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v30
+ vand v2,v2,v1
+
+ addi r5,r5,16
+ lvx v0,r5,r4
+ vcmpequd. v1,v0,v31
+ vand v2,v2,v1
+
+ li r5,0
+ stvx v2,r5,sp
+ ldx r0,r5,sp
+ cmpdi r0,-1
+ bne 1f
+ li r3,0
+1: POP_BASIC_STACK(16)
+ blr
+FUNC_END(check_vmx)
+
+#Safe from C
+FUNC_START(test_vmx)
+ #r3 holds pointer to where to put the result of fork
+ #r4 holds pointer to the pid
+ #v20-v31 are non-volatile
+ PUSH_BASIC_STACK(512)
+ std r3,40(sp) #Address of varray
+ std r4,48(sp) #address of pid
+ PUSH_VMX(56, r4)
+
+ bl load_vmx
+
+ li r0,__NR_fork
+ sc
+ #Pass the result of fork back to the caller
+ ld r9,48(sp)
+ std r3,0(r9)
+
+ ld r3,40(sp)
+ bl check_vmx
+
+ POP_VMX(56,r4)
+ POP_BASIC_STACK(512)
+ blr
+FUNC_END(test_vmx)
diff --git a/tools/testing/selftests/powerpc/math/vmx_syscall.c b/tools/testing/selftests/powerpc/math/vmx_syscall.c
new file mode 100644
index 0000000..6989d96
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_syscall.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers change across a syscall (fork).
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "utils.h"
+
+typedef int v4si __attribute__ ((vector_size (16)));
+v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+extern int test_vmx(v4si *varray, pid_t *pid);
+
+int vmx_syscall(void)
+{
+ pid_t fork_pid;
+ int i;
+ int ret;
+ int child_ret;
+ for (i = 0; i < 1000; i++) {
+ /* test_vmx will fork() */
+ ret = test_vmx(varray, &fork_pid);
+ if (fork_pid == -1)
+ return -1;
+ if (fork_pid == 0)
+ exit(ret);
+ waitpid(fork_pid, &child_ret, 0);
+ if (ret || child_ret)
+ return 1;
+ }
+
+ return 0;
+}
+
+int test_vmx_syscall(void)
+{
+ /*
+ * Set up an environment with plenty of context switching
+ */
+ pid_t pid2;
+ pid_t pid = fork();
+ int ret;
+ int child_ret;
+ FAIL_IF(pid == -1);
+
+ pid2 = fork();
+ /* Can't FAIL_IF(pid2 == -1); because we've already forked */
+ if (pid2 == -1) {
+ /*
+ * Couldn't fork, ensure child_ret is set and is a fail
+ */
+ ret = child_ret = 1;
+ } else {
+ ret = vmx_syscall();
+ if (pid2)
+ waitpid(pid2, &child_ret, 0);
+ else
+ exit(ret);
+ }
+
+ ret |= child_ret;
+
+ if (pid)
+ waitpid(pid, &child_ret, 0);
+ else
+ exit(ret);
+
+ FAIL_IF(ret || child_ret);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_vmx_syscall, "vmx_syscall");
+
+}
--
2.7.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
2016-01-21 0:55 ` [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
` (6 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Loop in assembly checking the registers with many threads.
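The thread orchestration wrapped around the assembly workers can be sketched in C as below. Names and the atomics used are illustrative assumptions; the real test hands threads_starting and running to the assembly routine, which decrements one and spins on the other itself.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static int threads_starting;	/* workers count down as they come online */
static int running;		/* workers spin while this is non-zero */

static void *worker(void *p)
{
	(void)p;
	/* Signal that this worker has loaded its registers (checks elided). */
	__atomic_sub_fetch(&threads_starting, 1, __ATOMIC_SEQ_CST);
	while (__atomic_load_n(&running, __ATOMIC_SEQ_CST))
		;			/* keep state live so preemption can corrupt it */
	return NULL;
}

/* Returns 0 on success; spawns workers, lets them run, then stops them. */
int run_workers(int nthreads, unsigned int seconds)
{
	pthread_t *tids = malloc(nthreads * sizeof(*tids));
	int i;

	if (!tids)
		return 1;
	running = 1;
	threads_starting = nthreads;
	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[i], NULL, worker, NULL))
			return 1;
	while (__atomic_load_n(&threads_starting, __ATOMIC_SEQ_CST))
		;			/* wait for every worker to start */
	sleep(seconds);			/* give the scheduler time to preempt them */
	__atomic_store_n(&running, 0, __ATOMIC_SEQ_CST);
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	return 0;
}
```

Oversubscribing the CPUs (THREAD_FACTOR workers per online CPU in the real test) is what makes preemption likely during the spin.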
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/math/.gitignore | 2 +
tools/testing/selftests/powerpc/math/Makefile | 5 +-
tools/testing/selftests/powerpc/math/fpu_asm.S | 34 +++++++
tools/testing/selftests/powerpc/math/fpu_preempt.c | 113 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_asm.S | 44 +++++++-
tools/testing/selftests/powerpc/math/vmx_preempt.c | 113 +++++++++++++++++++++
6 files changed, 306 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/math/fpu_preempt.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_preempt.c
diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
index b19b269..1a6f09e 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -1,2 +1,4 @@
fpu_syscall
vmx_syscall
+fpu_preempt
+vmx_preempt
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 418bef1..b6f4158 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall vmx_syscall
+TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
all: $(TEST_PROGS)
@@ -6,7 +6,10 @@ $(TEST_PROGS): ../harness.c
$(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
fpu_syscall: fpu_asm.S
+fpu_preempt: fpu_asm.S
+
vmx_syscall: vmx_asm.S
+vmx_preempt: vmx_asm.S
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index 8733874..46bbe99 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -159,3 +159,37 @@ FUNC_START(test_fpu)
POP_BASIC_STACK(256)
blr
FUNC_END(test_fpu)
+
+#int preempt_fpu(double *darray, int *threads_starting, int *running)
+#On starting will (atomically) decrement threads_starting as a signal that
+#the FPU has been loaded with darray. Will proceed to check the validity of
+#the FPU registers while *running is not zero.
+FUNC_START(preempt_fpu)
+ PUSH_BASIC_STACK(256)
+ std r3,32(sp) #double *darray
+ std r4,40(sp) #int *threads_starting
+ std r5,48(sp) #int *running
+ PUSH_FPU(56)
+
+ bl load_fpu
+
+ #Atomic DEC
+ ld r3,40(sp)
+1: lwarx r4,0,r3
+ addi r4,r4,-1
+ stwcx. r4,0,r3
+ bne- 1b
+
+2: ld r3, 32(sp)
+ bl check_fpu
+ cmpdi r3,0
+ bne 3f
+ ld r4, 48(sp)
+ ld r5, 0(r4)
+ cmpwi r5,0
+ bne 2b
+
+3: POP_FPU(56)
+ POP_BASIC_STACK(256)
+ blr
+FUNC_END(preempt_fpu)
diff --git a/tools/testing/selftests/powerpc/math/fpu_preempt.c b/tools/testing/selftests/powerpc/math/fpu_preempt.c
new file mode 100644
index 0000000..0f85b79
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers change across preemption.
+ * Two things should be noted here: a) the check_fpu function in asm only checks
+ * the non-volatile registers, as it is reused from the syscall test; b) there is
+ * no way to be sure preemption happened, so this test just uses many threads
+ * and a long wait. As such, a successful test doesn't mean much but a failure
+ * is bad.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+int threads_starting;
+int running;
+
+extern void preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void *preempt_fpu_c(void *p)
+{
+ int i;
+ srand(pthread_self());
+ for (i = 0; i < 21; i++)
+ darray[i] = rand();
+
+ /* Test fails if it ever returns */
+ preempt_fpu(darray, &threads_starting, &running);
+
+ return p;
+}
+
+int test_preempt_fpu(void)
+{
+ int i, rc, threads;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc((threads) * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ threads_starting = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, preempt_fpu_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ /* Not really necessary but nice to wait for every thread to start */
+ printf("\tWaiting for all workers to start...");
+ while(threads_starting)
+ asm volatile("": : :"memory");
+ printf("done\n");
+
+ printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+ sleep(PREEMPT_TIME);
+ printf("done\n");
+
+ printf("\tStopping workers...");
+ /*
+ * Workers are checking this value every loop. In preempt_fpu 'cmpwi r5,0; bne 2b'.
+ * r5 will have loaded the value of running.
+ */
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ void *rc_p;
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why preempt_fpu
+ * returned
+ */
+ if ((long) rc_p)
+ printf("oops\n");
+ FAIL_IF((long) rc_p);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_preempt_fpu, "fpu_preempt");
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
index 9ed32e7..416b5b0 100644
--- a/tools/testing/selftests/powerpc/math/vmx_asm.S
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -9,6 +9,7 @@
#include "../basic_asm.h"
+#POS MUST BE 16 ALIGNED!
#define PUSH_VMX(pos,reg) \
li reg,pos; \
stvx v20,reg,sp; \
@@ -35,6 +36,7 @@
addi reg,reg,16; \
stvx v31,reg,sp;
+#POS MUST BE 16 ALIGNED!
#define POP_VMX(pos,reg) \
li reg,pos; \
lvx v20,reg,sp; \
@@ -93,7 +95,7 @@ FUNC_END(load_vmx)
#Should be safe from C, only touches r4, r5 and v0,v1,v2
FUNC_START(check_vmx)
- PUSH_BASIC_STACK(16)
+ PUSH_BASIC_STACK(32)
mr r4,r3
li r3,1 #assume a bad result
li r5,0
@@ -162,7 +164,7 @@ FUNC_START(check_vmx)
cmpdi r0,-1
bne 1f
li r3,0
-1: POP_BASIC_STACK(16)
+1: POP_BASIC_STACK(32)
blr
FUNC_END(check_vmx)
@@ -174,7 +176,7 @@ FUNC_START(test_vmx)
PUSH_BASIC_STACK(512)
std r3,40(sp) #Address of varray
std r4,48(sp) #address of pid
- PUSH_VMX(56, r4)
+ PUSH_VMX(64, r4)
bl load_vmx
@@ -187,7 +189,41 @@ FUNC_START(test_vmx)
ld r3,40(sp)
bl check_vmx
- POP_VMX(56,r4)
+ POP_VMX(64,r4)
POP_BASIC_STACK(512)
blr
FUNC_END(test_vmx)
+
+#int preempt_vmx(v4si *varray, int *threads_starting, int *running)
+#On starting will (atomically) decrement threads_starting as a signal that
+#the VMX registers have been loaded with varray. Will proceed to check the
+#validity of the VMX registers while *running is not zero.
+FUNC_START(preempt_vmx)
+ PUSH_BASIC_STACK(512)
+ std r3,32(sp) #v4si *varray
+ std r4,40(sp) #int *threads_starting
+ std r5,48(sp) #int *running
+ PUSH_VMX(64,r4)
+
+ bl load_vmx
+
+ #Atomic DEC
+ ld r3,40(sp)
+1: lwarx r4,0,r3
+ addi r4,r4,-1
+ stwcx. r4,0,r3
+ bne- 1b
+
+2: ld r3,32(sp)
+ bl check_vmx
+ cmpdi r3,0
+ bne 3f
+ ld r4,48(sp)
+ ld r5,0(r4)
+ cmpwi r5,0
+ bne 2b
+
+3: POP_VMX(64,r4)
+ POP_BASIC_STACK(512)
+ blr
+FUNC_END(preempt_vmx)
diff --git a/tools/testing/selftests/powerpc/math/vmx_preempt.c b/tools/testing/selftests/powerpc/math/vmx_preempt.c
new file mode 100644
index 0000000..9474e78
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_preempt.c
@@ -0,0 +1,113 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers change across preemption.
+ * Two things should be noted here: a) the check_vmx function in asm only checks
+ * the non-volatile registers, as it is reused from the syscall test; b) there is
+ * no way to be sure preemption happened, so this test just uses many threads
+ * and a long wait. As such, a successful test doesn't mean much but a failure
+ * is bad.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+int threads_starting;
+int running;
+
+extern void preempt_vmx(v4si *varray, int *threads_starting, int *running);
+
+void *preempt_vmx_c(void *p)
+{
+ int i, j;
+ srand(pthread_self());
+ for (i = 0; i < 12; i++)
+ for (j = 0; j < 4; j++)
+ varray[i][j] = rand();
+
+ /* Test fails if it ever returns */
+ preempt_vmx(varray, &threads_starting, &running);
+ return p;
+}
+
+int test_preempt_vmx(void)
+{
+ int i, rc, threads;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ threads_starting = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, preempt_vmx_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ /* Not really necessary but nice to wait for every thread to start */
+ printf("\tWaiting for all workers to start...");
+ while(threads_starting)
+ asm volatile("": : :"memory");
+ printf("done\n");
+
+ printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+ sleep(PREEMPT_TIME);
+ printf("done\n");
+
+ printf("\tStopping workers...");
+ /*
+ * Workers are checking this value every loop. In preempt_vmx 'cmpwi r5,0; bne 2b'.
+ * r5 will have loaded the value of running.
+ */
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ void *rc_p;
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here, look at why preempt_vmx
+ * returned
+ */
+ if ((long) rc_p)
+ printf("oops\n");
+ FAIL_IF((long) rc_p);
+ }
+ printf("done\n");
+
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_preempt_vmx, "vmx_preempt");
+}
--
2.7.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
2016-01-21 0:55 ` [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
2016-01-21 0:55 ` [PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread Cyril Bur
` (5 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Load up the non-volatile FPU and VMX regs and ensure that they are the
expected values in a signal handler.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
tools/testing/selftests/powerpc/math/.gitignore | 2 +
tools/testing/selftests/powerpc/math/Makefile | 4 +-
tools/testing/selftests/powerpc/math/fpu_signal.c | 135 +++++++++++++++++++++
tools/testing/selftests/powerpc/math/vmx_signal.c | 138 ++++++++++++++++++++++
4 files changed, 278 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/math/fpu_signal.c
create mode 100644 tools/testing/selftests/powerpc/math/vmx_signal.c
diff --git a/tools/testing/selftests/powerpc/math/.gitignore b/tools/testing/selftests/powerpc/math/.gitignore
index 1a6f09e..4fe13a4 100644
--- a/tools/testing/selftests/powerpc/math/.gitignore
+++ b/tools/testing/selftests/powerpc/math/.gitignore
@@ -2,3 +2,5 @@ fpu_syscall
vmx_syscall
fpu_preempt
vmx_preempt
+fpu_signal
+vmx_signal
diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index b6f4158..5b88875 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt vmx_syscall vmx_preempt
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal
all: $(TEST_PROGS)
@@ -7,9 +7,11 @@ $(TEST_PROGS): CFLAGS += -O2 -g -pthread -m64 -maltivec
fpu_syscall: fpu_asm.S
fpu_preempt: fpu_asm.S
+fpu_signal: fpu_asm.S
vmx_syscall: vmx_asm.S
vmx_preempt: vmx_asm.S
+vmx_signal: vmx_asm.S
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/math/fpu_signal.c b/tools/testing/selftests/powerpc/math/fpu_signal.c
new file mode 100644
index 0000000..888aa51
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/fpu_signal.c
@@ -0,0 +1,135 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the FPU registers are correctly reported in a
+ * signal context. Each worker spins checking its FPU registers; at some
+ * point a signal interrupts it, and C code checks that the signal context
+ * holds the same values.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread double darray[] = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
+ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0,
+ 2.1};
+
+bool bad_context;
+int threads_starting;
+int running;
+
+extern long preempt_fpu(double *darray, int *threads_starting, int *running);
+
+void signal_fpu_sig(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ ucontext_t *uc = context;
+ mcontext_t *mc = &uc->uc_mcontext;
+
+ /* Only the non-volatile registers were loaded up */
+ for (i = 14; i < 32; i++) {
+ if (mc->fp_regs[i] != darray[i - 14]) {
+ bad_context = true;
+ break;
+ }
+ }
+}
+
+void *signal_fpu_c(void *p)
+{
+ int i;
+ long rc;
+ struct sigaction act;
+ act.sa_sigaction = signal_fpu_sig;
+ act.sa_flags = SA_SIGINFO;
+ rc = sigaction(SIGUSR1, &act, NULL);
+ if (rc)
+ return p;
+
+ srand(pthread_self());
+ for (i = 0; i < 21; i++)
+ darray[i] = rand();
+
+ rc = preempt_fpu(darray, &threads_starting, &running);
+
+ return (void *) rc;
+}
+
+int test_signal_fpu(void)
+{
+ int i, j, rc, threads;
+ void *rc_p;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ threads_starting = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, signal_fpu_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ printf("\tWaiting for all workers to start...");
+ while (threads_starting)
+ asm volatile("": : :"memory");
+ printf("done\n");
+
+ printf("\tSending signals to all threads %d times...", ITERATIONS);
+ for (i = 0; i < ITERATIONS; i++) {
+ for (j = 0; j < threads; j++) {
+ pthread_kill(tids[j], SIGUSR1);
+ }
+ sleep(1);
+ }
+ printf("done\n");
+
+ printf("\tStopping workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here; look at why signal_fpu
+ * returned
+ */
+ if ((long) rc_p || bad_context)
+ printf("oops\n");
+ if (bad_context)
+ fprintf(stderr, "\t!! bad_context is true\n");
+ FAIL_IF((long) rc_p || bad_context);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_signal_fpu, "fpu_signal");
+}
diff --git a/tools/testing/selftests/powerpc/math/vmx_signal.c b/tools/testing/selftests/powerpc/math/vmx_signal.c
new file mode 100644
index 0000000..093af5a
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vmx_signal.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VMX registers are correctly reported in a
+ * signal context. Each worker spins checking its VMX registers; at some
+ * point a signal interrupts it, and C code checks that the signal context
+ * holds the same values.
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Number of times each thread should receive the signal */
+#define ITERATIONS 10
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+typedef int v4si __attribute__ ((vector_size (16)));
+__thread v4si varray[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+ {13,14,15,16},{17,18,19,20},{21,22,23,24},
+ {25,26,27,28},{29,30,31,32},{33,34,35,36},
+ {37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+bool bad_context;
+int running;
+int threads_starting;
+
+extern long preempt_vmx(v4si *varray, int *threads_starting, int *running);
+
+void signal_vmx_sig(int sig, siginfo_t *info, void *context)
+{
+ int i;
+ ucontext_t *uc = context;
+ mcontext_t *mc = &uc->uc_mcontext;
+
+ /* Only the non-volatile registers were loaded up */
+ for (i = 20; i < 32; i++) {
+ if (memcmp(mc->v_regs->vrregs[i], &varray[i - 20], 16)) {
+ bad_context = true;
+ break;
+ }
+ }
+}
+
+void *signal_vmx_c(void *p)
+{
+ int i, j;
+ long rc;
+ struct sigaction act;
+ act.sa_sigaction = signal_vmx_sig;
+ act.sa_flags = SA_SIGINFO;
+ rc = sigaction(SIGUSR1, &act, NULL);
+ if (rc)
+ return p;
+
+ srand(pthread_self());
+ for (i = 0; i < 12; i++)
+ for (j = 0; j < 4; j++)
+ varray[i][j] = rand();
+
+ rc = preempt_vmx(varray, &threads_starting, &running);
+
+ return (void *) rc;
+}
+
+int test_signal_vmx(void)
+{
+ int i, j, rc, threads;
+ void *rc_p;
+ pthread_t *tids;
+
+ threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+ tids = malloc(threads * sizeof(pthread_t));
+ FAIL_IF(!tids);
+
+ running = true;
+ threads_starting = threads;
+ for (i = 0; i < threads; i++) {
+ rc = pthread_create(&tids[i], NULL, signal_vmx_c, NULL);
+ FAIL_IF(rc);
+ }
+
+ setbuf(stdout, NULL);
+ printf("\tWaiting for all workers to start...");
+ while (threads_starting)
+ asm volatile("": : :"memory");
+ printf("done\n");
+
+ printf("\tSending signals to all threads %d times...", ITERATIONS);
+ for (i = 0; i < ITERATIONS; i++) {
+ for (j = 0; j < threads; j++) {
+ pthread_kill(tids[j], SIGUSR1);
+ }
+ sleep(1);
+ }
+ printf("done\n");
+
+ printf("\tStopping workers...");
+ running = 0;
+ for (i = 0; i < threads; i++) {
+ pthread_join(tids[i], &rc_p);
+
+ /*
+ * Harness will say the fail was here; look at why signal_vmx
+ * returned
+ */
+ if ((long) rc_p || bad_context)
+ printf("oops\n");
+ if (bad_context)
+ fprintf(stderr, "\t!! bad_context is true\n");
+ FAIL_IF((long) rc_p || bad_context);
+ }
+ printf("done\n");
+
+ free(tids);
+ return 0;
+}
+
+int main(int argc, char *argv[])
+{
+ return test_harness(test_signal_vmx, "vmx_signal");
+}
--
2.7.0
* [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (2 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-25 0:04 ` Balbir Singh
2016-01-21 0:55 ` [PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
` (4 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Currently when threads are scheduled off they always give up the FPU,
Altivec (VMX) and Vector (VSX) units if they were using them. When they are
scheduled back on, a fault is taken to enable each facility and load the
registers. As a result, explicitly disabling FPU/VMX/VSX has not been
necessary.
Future changes and optimisations remove this mandatory giveup and fault
which could cause calls such as clone() and fork() to copy threads and run
them later with FPU/VMX/VSX enabled but no registers loaded.
This patch starts the process of having MSR_{FP,VEC,VSX} set mean that a
thread's registers are hot, while MSR_{FP,VEC,VSX} clear means that the
registers must be loaded. This allows for a smarter return to userspace.
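The intended invariant can be sketched in plain C. This is only an
illustrative model of what the one-line change to copy_thread() does, not
kernel code; the MSR bit values below are stand-ins for the real
definitions in arch/powerpc/include/asm/reg.h:

```c
#include <assert.h>

/* Illustrative stand-ins for the real powerpc MSR bit definitions. */
#define MSR_FP  (1UL << 13)
#define MSR_VEC (1UL << 25)
#define MSR_VSX (1UL << 23)

/*
 * Model of what copy_thread() now does to the child's MSR: clear the
 * math bits so the child's registers are marked "not hot" and must be
 * faulted (or restored) back in before first use.
 */
static unsigned long copy_thread_msr(unsigned long parent_msr)
{
	return parent_msr & ~(MSR_FP | MSR_VEC | MSR_VSX);
}
```

Non-math MSR bits are untouched, so only the FP/VEC/VSX lazy-load state is
reset for the child.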
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/kernel/process.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dccc87e..e0c3d2d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
f = ret_from_fork;
}
+ childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
sp -= STACK_FRAME_OVERHEAD;
/*
--
2.7.0
* [PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (3 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two Cyril Bur
` (3 subsequent siblings)
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not a
problem unless a process is using these facilities.
Modern versions of GCC are very good at automatically vectorising code, new
and modernised workloads make use of floating point and vector facilities,
and even the kernel makes use of vectorised memcpy.
All this combined greatly increases the cost of a syscall, since the kernel
sometimes uses these facilities even in the syscall fast path, making it
increasingly common for a thread to take an *_unavailable exception soon
after a syscall, not to mention potentially taking all three.
The obvious overcompensation to this problem is to simply always load all
the facilities on every exit to userspace. Loading up all FPU, VEC and VSX
registers every time can be expensive and if a workload does avoid using
them, it should not be forced to incur this penalty.
An 8-bit counter is used to detect if the registers have been used in the
past, and the registers are always loaded until the value wraps back to
zero.
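The wrap-to-zero policy relies on ordinary u8 arithmetic; a minimal C
sketch of the counter behaviour (load_fp mirrors the new thread_struct
field, the helper name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirrors the new u8 thread_struct.load_fp counter: nonzero means the FP
 * registers were used in the past, so keep restoring them eagerly on
 * return to userspace. Each restore increments the counter; after 255
 * increments it overflows to zero and the thread falls back to lazy
 * (fault-driven) loading.
 */
static int should_restore_fp(uint8_t *load_fp)
{
	if (*load_fp == 0)
		return 0;	/* registers cold: lazy load via fault */
	(*load_fp)++;		/* overflow to 0 is the desired behaviour */
	return 1;
}
```

A counter that starts at 1 therefore grants exactly 255 eager restores
before reverting to the lazy path.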
Several versions of the assembly in entry_64.S were tried: 1. always
calling C, 2. performing a common-case check and then calling C, and 3. a
complex check entirely in asm. After some benchmarking it was determined
that avoiding C in the common case is a performance benefit. The full check
in asm greatly complicated that codepath for a negligible performance gain
and the trade-off was deemed not worth it.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/processor.h | 2 +
arch/powerpc/kernel/asm-offsets.c | 2 +
arch/powerpc/kernel/entry_64.S | 21 +++++++--
arch/powerpc/kernel/fpu.S | 4 ++
arch/powerpc/kernel/process.c | 84 +++++++++++++++++++++++++++++++-----
arch/powerpc/kernel/vector.S | 4 ++
6 files changed, 103 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index ac23308..dcab21f 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -236,11 +236,13 @@ struct thread_struct {
#endif
struct arch_hw_breakpoint hw_brk; /* info on the hardware breakpoint */
unsigned long trap_nr; /* last trap # on this thread */
+ u8 load_fp;
#ifdef CONFIG_ALTIVEC
struct thread_vr_state vr_state;
struct thread_vr_state *vr_save_area;
unsigned long vrsave;
int used_vr; /* set if process has used altivec */
+ u8 load_vec;
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
/* VSR status */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 07cebc3..10d5eab 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -95,12 +95,14 @@ int main(void)
DEFINE(THREAD_FPSTATE, offsetof(struct thread_struct, fp_state));
DEFINE(THREAD_FPSAVEAREA, offsetof(struct thread_struct, fp_save_area));
DEFINE(FPSTATE_FPSCR, offsetof(struct thread_fp_state, fpscr));
+ DEFINE(THREAD_LOAD_FP, offsetof(struct thread_struct, load_fp));
#ifdef CONFIG_ALTIVEC
DEFINE(THREAD_VRSTATE, offsetof(struct thread_struct, vr_state));
DEFINE(THREAD_VRSAVEAREA, offsetof(struct thread_struct, vr_save_area));
DEFINE(THREAD_VRSAVE, offsetof(struct thread_struct, vrsave));
DEFINE(THREAD_USED_VR, offsetof(struct thread_struct, used_vr));
DEFINE(VRSTATE_VSCR, offsetof(struct thread_vr_state, vscr));
+ DEFINE(THREAD_LOAD_VEC, offsetof(struct thread_struct, load_vec));
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
DEFINE(THREAD_USED_VSR, offsetof(struct thread_struct, used_vsr));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 0d525ce..038e0a1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -210,7 +210,20 @@ system_call: /* label this so stack traces look sane */
li r11,-MAX_ERRNO
andi. r0,r9,(_TIF_SYSCALL_DOTRACE|_TIF_SINGLESTEP|_TIF_USER_WORK_MASK|_TIF_PERSYSCALL_MASK)
bne- syscall_exit_work
- cmpld r3,r11
+
+ andi. r0,r8,MSR_FP
+ beq 2f
+#ifdef CONFIG_ALTIVEC
+ andis. r0,r8,MSR_VEC@h
+ bne 3f
+#endif
+2: addi r3,r1,STACK_FRAME_OVERHEAD
+ bl restore_math
+ ld r8,_MSR(r1)
+ ld r3,RESULT(r1)
+ li r11,-MAX_ERRNO
+
+3: cmpld r3,r11
ld r5,_CCR(r1)
bge- syscall_error
.Lsyscall_error_cont:
@@ -602,8 +615,8 @@ _GLOBAL(ret_from_except_lite)
/* Check current_thread_info()->flags */
andi. r0,r4,_TIF_USER_WORK_MASK
-#ifdef CONFIG_PPC_BOOK3E
bne 1f
+#ifdef CONFIG_PPC_BOOK3E
/*
* Check to see if the dbcr0 register is set up to debug.
* Use the internal debug mode bit to do this.
@@ -618,7 +631,9 @@ _GLOBAL(ret_from_except_lite)
mtspr SPRN_DBSR,r10
b restore
#else
- beq restore
+ addi r3,r1,STACK_FRAME_OVERHEAD
+ bl restore_math
+ b restore
#endif
1: andi. r0,r4,_TIF_NEED_RESCHED
beq 2f
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 2117eac..b063524 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -130,6 +130,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
or r12,r12,r4
std r12,_MSR(r1)
#endif
+ /* Don't care if r4 overflows, this is desired behaviour */
+ lbz r4,THREAD_LOAD_FP(r5)
+ addi r4,r4,1
+ stb r4,THREAD_LOAD_FP(r5)
addi r10,r5,THREAD_FPSTATE
lfd fr0,FPSTATE_FPSCR(r10)
MTFSF_L(fr0)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e0c3d2d..0955b7c 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -187,9 +187,22 @@ void enable_kernel_fp(void)
}
}
EXPORT_SYMBOL(enable_kernel_fp);
+
+static int restore_fp(struct task_struct *tsk) {
+ if (tsk->thread.load_fp) {
+ load_fp_state(&current->thread.fp_state);
+ current->thread.load_fp++;
+ return 1;
+ }
+ return 0;
+}
+#else
+static int restore_fp(struct task_struct *tsk) { return 0; }
#endif /* CONFIG_PPC_FPU */
#ifdef CONFIG_ALTIVEC
+#define loadvec(thr) ((thr).load_vec)
+
void giveup_altivec(struct task_struct *tsk)
{
check_if_tm_restore_required(tsk);
@@ -229,6 +242,21 @@ void flush_altivec_to_thread(struct task_struct *tsk)
}
}
EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
+
+static int restore_altivec(struct task_struct *tsk)
+{
+ if (tsk->thread.load_vec && cpu_has_feature(CPU_FTR_ALTIVEC)) {
+ load_vr_state(&tsk->thread.vr_state);
+ tsk->thread.used_vr = 1;
+ tsk->thread.load_vec++;
+
+ return 1;
+ }
+ return 0;
+}
+#else
+#define loadvec(thr) 0
+static inline int restore_altivec(struct task_struct *tsk) { return 0; }
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
@@ -275,6 +303,14 @@ void flush_vsx_to_thread(struct task_struct *tsk)
}
}
EXPORT_SYMBOL_GPL(flush_vsx_to_thread);
+
+static int restore_vsx(struct task_struct *tsk)
+{
+ tsk->thread.used_vsr = 1;
+ return 1;
+}
+#else
+static inline int restore_vsx(struct task_struct *tsk) { return 0; }
#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
@@ -374,6 +410,36 @@ void giveup_all(struct task_struct *tsk)
}
EXPORT_SYMBOL(giveup_all);
+void restore_math(struct pt_regs *regs)
+{
+ unsigned long msr;
+
+ if (!current->thread.load_fp && !loadvec(current->thread))
+ return;
+
+ msr = regs->msr;
+ msr_check_and_set(msr_all_available);
+
+ /*
+ * Only reload if the bit is not set in the user MSR; the bit being set
+ * indicates that the registers are hot
+ */
+ if ((!(msr & MSR_FP)) && restore_fp(current))
+ msr |= MSR_FP | current->thread.fpexc_mode;
+
+ if ((!(msr & MSR_VEC)) && restore_altivec(current))
+ msr |= MSR_VEC;
+
+ if ((msr & (MSR_FP | MSR_VEC)) == (MSR_FP | MSR_VEC) &&
+ restore_vsx(current)) {
+ msr |= MSR_VSX;
+ }
+
+ msr_check_and_clear(msr_all_available);
+
+ regs->msr = msr;
+}
+
void flush_all_to_thread(struct task_struct *tsk)
{
if (tsk->thread.regs) {
@@ -832,17 +898,9 @@ void restore_tm_state(struct pt_regs *regs)
msr_diff = current->thread.ckpt_regs.msr & ~regs->msr;
msr_diff &= MSR_FP | MSR_VEC | MSR_VSX;
- if (msr_diff & MSR_FP) {
- msr_check_and_set(MSR_FP);
- load_fp_state(&current->thread.fp_state);
- msr_check_and_clear(MSR_FP);
- regs->msr |= current->thread.fpexc_mode;
- }
- if (msr_diff & MSR_VEC) {
- msr_check_and_set(MSR_VEC);
- load_vr_state(&current->thread.vr_state);
- msr_check_and_clear(MSR_VEC);
- }
+
+ restore_math(regs);
+
regs->msr |= msr_diff;
}
@@ -1006,6 +1064,10 @@ struct task_struct *__switch_to(struct task_struct *prev,
batch = this_cpu_ptr(&ppc64_tlb_batch);
batch->active = 1;
}
+
+ if (current_thread_info()->task->thread.regs)
+ restore_math(current_thread_info()->task->thread.regs);
+
#endif /* CONFIG_PPC_BOOK3S_64 */
return last;
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 162d0f7..038cff8 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -91,6 +91,10 @@ _GLOBAL(load_up_altivec)
oris r12,r12,MSR_VEC@h
std r12,_MSR(r1)
#endif
+ /* Don't care if r4 overflows, this is desired behaviour */
+ lbz r4,THREAD_LOAD_VEC(r5)
+ addi r4,r4,1
+ stb r4,THREAD_LOAD_VEC(r5)
addi r6,r5,THREAD_VRSTATE
li r4,1
li r10,VRSTATE_VSCR
--
2.7.0
* [PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (4 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-02-10 7:51 ` [v3, " Michael Ellerman
2016-01-21 0:55 ` [PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up Cyril Bur
` (2 subsequent siblings)
8 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
marking {fpu,altivec,vsx} as being unused by a thread.
Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
task switching can be made if these two operations are decoupled.
save_all() will permit the saving of registers to thread structs while
leaving the thread's MSR bits enabled.
This patch introduces no functional change.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 7 +++++++
arch/powerpc/kernel/process.c | 39 +++++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 5b268b6..3690041 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -34,6 +34,7 @@ static inline void disable_kernel_fp(void)
msr_check_and_clear(MSR_FP);
}
#else
+static inline void __giveup_fpu(struct task_struct *t) { }
static inline void flush_fp_to_thread(struct task_struct *t) { }
#endif
@@ -46,6 +47,8 @@ static inline void disable_kernel_altivec(void)
{
msr_check_and_clear(MSR_VEC);
}
+#else
+static inline void __giveup_altivec(struct task_struct *t) { }
#endif
#ifdef CONFIG_VSX
@@ -57,6 +60,8 @@ static inline void disable_kernel_vsx(void)
{
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
}
+#else
+static inline void __giveup_vsx(struct task_struct *t) { }
#endif
#ifdef CONFIG_SPE
@@ -68,6 +73,8 @@ static inline void disable_kernel_spe(void)
{
msr_check_and_clear(MSR_SPE);
}
+#else
+static inline void __giveup_spe(struct task_struct *t) { }
#endif
static inline void clear_task_ebb(struct task_struct *t)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0955b7c..45e37c0 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -349,6 +349,14 @@ void flush_spe_to_thread(struct task_struct *tsk)
preempt_enable();
}
}
+#else
+/*
+ * save_all() is going to test MSR_SPE; rather than pulling in all the
+ * BookE definitions all the time on a Book3S kernel, just ensure the
+ * macro exists but acts as a nop.
+ */
+
+#define MSR_SPE 0
#endif /* CONFIG_SPE */
static unsigned long msr_all_available;
@@ -440,12 +448,41 @@ void restore_math(struct pt_regs *regs)
regs->msr = msr;
}
+void save_all(struct task_struct *tsk)
+{
+ unsigned long usermsr;
+
+ if (!tsk->thread.regs)
+ return;
+
+ usermsr = tsk->thread.regs->msr;
+
+ if ((usermsr & msr_all_available) == 0)
+ return;
+
+ msr_check_and_set(msr_all_available);
+
+ if (usermsr & MSR_FP)
+ __giveup_fpu(tsk);
+
+ if (usermsr & MSR_VEC)
+ __giveup_altivec(tsk);
+
+ if (usermsr & MSR_VSX)
+ __giveup_vsx(tsk);
+
+ if (usermsr & MSR_SPE)
+ __giveup_spe(tsk);
+
+ msr_check_and_clear(msr_all_available);
+}
+
void flush_all_to_thread(struct task_struct *tsk)
{
if (tsk->thread.regs) {
preempt_disable();
BUG_ON(tsk != current);
- giveup_all(tsk);
+ save_all(tsk);
#ifdef CONFIG_SPE
if (tsk->thread.regs->msr & MSR_SPE)
--
2.7.0
* [PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (5 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 8/9] powerpc: Add the ability to save Altivec " Cyril Bur
2016-01-21 0:55 ` [PATCH v3 9/9] powerpc: Add the ability to save VSX " Cyril Bur
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
This patch adds the ability to save the FPU registers to the thread struct
without giving them up (disabling the facility) next time the process
returns to userspace.
This patch optimises the thread copy path (taken as a result of a fork() or
clone()) so that the parent thread can return to userspace with hot
registers, avoiding a possibly pointless reload of FPU register state.
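The difference between the old giveup and the new save can be modelled in a
few lines of C. The struct, helper names, and MSR bit value below are
illustrative stand-ins for thread_struct and the real kernel definitions,
not the actual implementation:

```c
#include <assert.h>
#include <string.h>

#define MSR_FP (1UL << 13)	/* illustrative; see asm/reg.h */

struct toy_thread {
	double fp_regs[32];	/* stand-in for thread_fp_state */
	unsigned long msr;
};

/*
 * Models save_fpu(): copy the live registers out but leave MSR_FP set,
 * so the thread can return to userspace with its registers still hot.
 */
static void toy_save_fpu(struct toy_thread *t, const double *live)
{
	memcpy(t->fp_regs, live, sizeof(t->fp_regs));
}

/*
 * Models __giveup_fpu(): save, and additionally clear MSR_FP so the
 * facility must be faulted back in before the next use.
 */
static void toy_giveup_fpu(struct toy_thread *t, const double *live)
{
	toy_save_fpu(t, live);
	t->msr &= ~MSR_FP;
}
```

save becomes the common building block; giveup is save plus the MSR clear,
which is exactly the split the series makes for FPU, Altivec, and VSX.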
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 3 ++-
arch/powerpc/kernel/fpu.S | 21 ++++-----------------
arch/powerpc/kernel/process.c | 12 +++++++++++-
3 files changed, 17 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 3690041..6a201e8 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -28,13 +28,14 @@ extern void giveup_all(struct task_struct *);
extern void enable_kernel_fp(void);
extern void flush_fp_to_thread(struct task_struct *);
extern void giveup_fpu(struct task_struct *);
-extern void __giveup_fpu(struct task_struct *);
+extern void save_fpu(struct task_struct *);
static inline void disable_kernel_fp(void)
{
msr_check_and_clear(MSR_FP);
}
#else
static inline void __giveup_fpu(struct task_struct *t) { }
+static inline void save_fpu(struct task_struct *t) { }
static inline void flush_fp_to_thread(struct task_struct *t) { }
#endif
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index b063524..15da2b5 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -143,33 +143,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
blr
/*
- * __giveup_fpu(tsk)
- * Disable FP for the task given as the argument,
- * and save the floating-point registers in its thread_struct.
+ * save_fpu(tsk)
+ * Save the floating-point registers in its thread_struct.
* Enables the FPU for use in the kernel on return.
*/
-_GLOBAL(__giveup_fpu)
+_GLOBAL(save_fpu)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r6,THREAD_FPSAVEAREA(r3)
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r6,0
bne 2f
addi r6,r3,THREAD_FPSTATE
-2: PPC_LCMPI 0,r5,0
- SAVE_32FPVSRS(0, R4, R6)
+2: SAVE_32FPVSRS(0, R4, R6)
mffs fr0
stfd fr0,FPSTATE_FPSCR(r6)
- beq 1f
- PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
- li r3,MSR_FP|MSR_FE0|MSR_FE1
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
- oris r3,r3,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
- andc r4,r4,r3 /* disable FP for previous task */
- PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
/*
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 45e37c0..a3411ce 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -133,6 +133,16 @@ void __msr_check_and_clear(unsigned long bits)
EXPORT_SYMBOL(__msr_check_and_clear);
#ifdef CONFIG_PPC_FPU
+void __giveup_fpu(struct task_struct *tsk)
+{
+ save_fpu(tsk);
+ tsk->thread.regs->msr &= ~MSR_FP;
+#ifdef CONFIG_VSX
+ if (cpu_has_feature(CPU_FTR_VSX))
+ tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
void giveup_fpu(struct task_struct *tsk)
{
check_if_tm_restore_required(tsk);
@@ -463,7 +473,7 @@ void save_all(struct task_struct *tsk)
msr_check_and_set(msr_all_available);
if (usermsr & MSR_FP)
- __giveup_fpu(tsk);
+ save_fpu(tsk);
if (usermsr & MSR_VEC)
__giveup_altivec(tsk);
--
2.7.0
* [PATCH v3 8/9] powerpc: Add the ability to save Altivec without giving it up
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (6 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 9/9] powerpc: Add the ability to save VSX " Cyril Bur
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
This patch adds the ability to save the VEC registers to the thread struct
without giving them up (disabling the facility) next time the process
returns to userspace.
This patch builds on a previous optimisation for the FPU registers in the
thread copy path to avoid a possibly pointless reload of VEC state.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 3 ++-
arch/powerpc/kernel/process.c | 12 +++++++++++-
arch/powerpc/kernel/vector.S | 24 ++++--------------------
3 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 6a201e8..9028822 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -43,12 +43,13 @@ static inline void flush_fp_to_thread(struct task_struct *t) { }
extern void enable_kernel_altivec(void);
extern void flush_altivec_to_thread(struct task_struct *);
extern void giveup_altivec(struct task_struct *);
-extern void __giveup_altivec(struct task_struct *);
+extern void save_altivec(struct task_struct *);
static inline void disable_kernel_altivec(void)
{
msr_check_and_clear(MSR_VEC);
}
#else
+static inline void save_altivec(struct task_struct *t) { }
static inline void __giveup_altivec(struct task_struct *t) { }
#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a3411ce..fef5b7d 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -213,6 +213,16 @@ static int restore_fp(struct task_struct *tsk) { return 0; }
#ifdef CONFIG_ALTIVEC
#define loadvec(thr) ((thr).load_vec)
+static void __giveup_altivec(struct task_struct *tsk)
+{
+ save_altivec(tsk);
+ tsk->thread.regs->msr &= ~MSR_VEC;
+#ifdef CONFIG_VSX
+ if (cpu_has_feature(CPU_FTR_VSX))
+ tsk->thread.regs->msr &= ~MSR_VSX;
+#endif
+}
+
void giveup_altivec(struct task_struct *tsk)
{
check_if_tm_restore_required(tsk);
@@ -476,7 +486,7 @@ void save_all(struct task_struct *tsk)
save_fpu(tsk);
if (usermsr & MSR_VEC)
- __giveup_altivec(tsk);
+ save_altivec(tsk);
if (usermsr & MSR_VSX)
__giveup_vsx(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 038cff8..51b0c17 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -106,36 +106,20 @@ _GLOBAL(load_up_altivec)
blr
/*
- * __giveup_altivec(tsk)
- * Disable VMX for the task given as the argument,
- * and save the vector registers in its thread_struct.
+ * save_altivec(tsk)
+ * Save the vector registers to its thread_struct
*/
-_GLOBAL(__giveup_altivec)
+_GLOBAL(save_altivec)
addi r3,r3,THREAD /* want THREAD of task */
PPC_LL r7,THREAD_VRSAVEAREA(r3)
PPC_LL r5,PT_REGS(r3)
PPC_LCMPI 0,r7,0
bne 2f
addi r7,r3,THREAD_VRSTATE
-2: PPC_LCMPI 0,r5,0
- SAVE_32VRS(0,r4,r7)
+2: SAVE_32VRS(0,r4,r7)
mfvscr v0
li r4,VRSTATE_VSCR
stvx v0,r4,r7
- beq 1f
- PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
- lis r3,(MSR_VEC|MSR_VSX)@h
-FTR_SECTION_ELSE
- lis r3,MSR_VEC@h
-ALT_FTR_SECTION_END_IFSET(CPU_FTR_VSX)
-#else
- lis r3,MSR_VEC@h
-#endif
- andc r4,r4,r3 /* disable FP for previous task */
- PPC_STL r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
blr
#ifdef CONFIG_VSX
--
2.7.0
* [PATCH v3 9/9] powerpc: Add the ability to save VSX without giving it up
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
` (7 preceding siblings ...)
2016-01-21 0:55 ` [PATCH v3 8/9] powerpc: Add the ability to save Altivec " Cyril Bur
@ 2016-01-21 0:55 ` Cyril Bur
8 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-01-21 0:55 UTC (permalink / raw)
To: linuxppc-dev; +Cc: mikey, anton
This patch adds the ability to save the VSX registers to the thread struct
without giving them up (disabling the facility) next time the process
returns to userspace.
This patch builds on a previous optimisation for the FPU and VEC registers
in the thread copy path to avoid a possibly pointless reload of VSX state.
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
arch/powerpc/include/asm/switch_to.h | 4 ----
arch/powerpc/kernel/ppc_ksyms.c | 4 ----
arch/powerpc/kernel/process.c | 42 +++++++++++++++++++++++++-----------
arch/powerpc/kernel/vector.S | 17 ---------------
4 files changed, 30 insertions(+), 37 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index 9028822..17c8380 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -56,14 +56,10 @@ static inline void __giveup_altivec(struct task_struct *t) { }
#ifdef CONFIG_VSX
extern void enable_kernel_vsx(void);
extern void flush_vsx_to_thread(struct task_struct *);
-extern void giveup_vsx(struct task_struct *);
-extern void __giveup_vsx(struct task_struct *);
static inline void disable_kernel_vsx(void)
{
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
}
-#else
-static inline void __giveup_vsx(struct task_struct *t) { }
#endif
#ifdef CONFIG_SPE
diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
index 41e1607..ef7024da 100644
--- a/arch/powerpc/kernel/ppc_ksyms.c
+++ b/arch/powerpc/kernel/ppc_ksyms.c
@@ -28,10 +28,6 @@ EXPORT_SYMBOL(load_vr_state);
EXPORT_SYMBOL(store_vr_state);
#endif
-#ifdef CONFIG_VSX
-EXPORT_SYMBOL_GPL(__giveup_vsx);
-#endif
-
#ifdef CONFIG_EPAPR_PARAVIRT
EXPORT_SYMBOL(epapr_hypercall_start);
#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fef5b7d..7c3dd30 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -280,19 +280,31 @@ static inline int restore_altivec(struct task_struct *tsk) { return 0; }
#endif /* CONFIG_ALTIVEC */
#ifdef CONFIG_VSX
-void giveup_vsx(struct task_struct *tsk)
+static void __giveup_vsx(struct task_struct *tsk)
{
- check_if_tm_restore_required(tsk);
-
- msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
if (tsk->thread.regs->msr & MSR_FP)
__giveup_fpu(tsk);
if (tsk->thread.regs->msr & MSR_VEC)
__giveup_altivec(tsk);
+ tsk->thread.regs->msr &= ~MSR_VSX;
+}
+
+static void giveup_vsx(struct task_struct *tsk)
+{
+ check_if_tm_restore_required(tsk);
+
+ msr_check_and_set(MSR_FP|MSR_VEC|MSR_VSX);
__giveup_vsx(tsk);
msr_check_and_clear(MSR_FP|MSR_VEC|MSR_VSX);
}
-EXPORT_SYMBOL(giveup_vsx);
+
+static void save_vsx(struct task_struct *tsk)
+{
+ if (tsk->thread.regs->msr & MSR_FP)
+ save_fpu(tsk);
+ if (tsk->thread.regs->msr & MSR_VEC)
+ save_altivec(tsk);
+}
void enable_kernel_vsx(void)
{
@@ -331,6 +343,7 @@ static int restore_vsx(struct task_struct *tsk)
}
#else
static inline int restore_vsx(struct task_struct *tsk) { return 0; }
+static inline void save_vsx(struct task_struct *tsk) { }
#endif /* CONFIG_VSX */
#ifdef CONFIG_SPE
@@ -482,14 +495,19 @@ void save_all(struct task_struct *tsk)
msr_check_and_set(msr_all_available);
- if (usermsr & MSR_FP)
- save_fpu(tsk);
-
- if (usermsr & MSR_VEC)
- save_altivec(tsk);
+ /*
+ * Because the VSX register space overlaps the FPU and Altivec register
+ * space in hardware, save_vsx() boils down to a save_fpu() plus a save_altivec()
+ */
+ if (usermsr & MSR_VSX) {
+ save_vsx(tsk);
+ } else {
+ if (usermsr & MSR_FP)
+ save_fpu(tsk);
- if (usermsr & MSR_VSX)
- __giveup_vsx(tsk);
+ if (usermsr & MSR_VEC)
+ save_altivec(tsk);
+ }
if (usermsr & MSR_SPE)
__giveup_spe(tsk);
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 51b0c17..1c2e7a3 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -151,23 +151,6 @@ _GLOBAL(load_up_vsx)
std r12,_MSR(r1)
b fast_exception_return
-/*
- * __giveup_vsx(tsk)
- * Disable VSX for the task given as the argument.
- * Does NOT save vsx registers.
- */
-_GLOBAL(__giveup_vsx)
- addi r3,r3,THREAD /* want THREAD of task */
- ld r5,PT_REGS(r3)
- cmpdi 0,r5,0
- beq 1f
- ld r4,_MSR-STACK_FRAME_OVERHEAD(r5)
- lis r3,MSR_VSX@h
- andc r4,r4,r3 /* disable VSX for previous task */
- std r4,_MSR-STACK_FRAME_OVERHEAD(r5)
-1:
- blr
-
#endif /* CONFIG_VSX */
--
2.7.0
* Re: [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
2016-01-21 0:55 ` [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread Cyril Bur
@ 2016-01-25 0:04 ` Balbir Singh
2016-01-26 23:50 ` Cyril Bur
0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2016-01-25 0:04 UTC (permalink / raw)
To: Cyril Bur; +Cc: linuxppc-dev, mikey, anton
On Thu, 21 Jan 2016 11:55:44 +1100
Cyril Bur <cyrilbur@gmail.com> wrote:
> Currently when threads get scheduled off they always give up the FPU,
> Altivec (VMX) and Vector (VSX) units if they were using them. When they are
> scheduled back on a fault is then taken to enable each facility and load
> registers. As a result explicitly disabling FPU/VMX/VSX has not been
> necessary.
>
> Future changes and optimisations remove this mandatory giveup and fault
> which could cause calls such as clone() and fork() to copy threads and run
> them later with FPU/VMX/VSX enabled but no registers loaded.
>
> This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
> thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
> registers must be loaded. This allows for a smarter return to userspace.
>
> Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> ---
> arch/powerpc/kernel/process.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index dccc87e..e0c3d2d 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>
> f = ret_from_fork;
> }
> + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
Ideally you want to use __msr_check_and_clear()
Basically we start with these bits off and then take an exception on use?
> sp -= STACK_FRAME_OVERHEAD;
>
> /*
* Re: [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
2016-01-25 0:04 ` Balbir Singh
@ 2016-01-26 23:50 ` Cyril Bur
2016-01-27 12:01 ` Balbir Singh
0 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-01-26 23:50 UTC (permalink / raw)
To: Balbir Singh; +Cc: linuxppc-dev, mikey, anton
On Mon, 25 Jan 2016 11:04:23 +1100
Balbir Singh <bsingharora@gmail.com> wrote:
> On Thu, 21 Jan 2016 11:55:44 +1100
> Cyril Bur <cyrilbur@gmail.com> wrote:
>
> > Currently when threads get scheduled off they always give up the FPU,
> > Altivec (VMX) and Vector (VSX) units if they were using them. When they are
> > scheduled back on a fault is then taken to enable each facility and load
> > registers. As a result explicitly disabling FPU/VMX/VSX has not been
> > necessary.
> >
> > Future changes and optimisations remove this mandatory giveup and fault
> > which could cause calls such as clone() and fork() to copy threads and run
> > them later with FPU/VMX/VSX enabled but no registers loaded.
> >
> > This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
> > thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
> > registers must be loaded. This allows for a smarter return to userspace.
> >
> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> > ---
> > arch/powerpc/kernel/process.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> > index dccc87e..e0c3d2d 100644
> > --- a/arch/powerpc/kernel/process.c
> > +++ b/arch/powerpc/kernel/process.c
> > @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
> >
> > f = ret_from_fork;
> > }
> > + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
>
Hi Balbir,
Perhaps I'm missing something, are you saying
> Ideally you want to use __msr_check_and_clear()
>
instead of childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); ? I don't see how that
can work...
__msr_check_and_clear() operates on the currently active MSR, that is, the msr
for the current kernel context. childregs->msr is the value that will be used
for that userspace context when the kernel returns. Here we must ensure
that children are created with the bits disabled.
> Basically we start with these bits off and then take an exception on use?
>
Currently yes, this is what I'm trying to change. This patch hasn't been
necessary until now, as any thread which saves its FPU/VMX/VSX data ALSO
disables those bits in regs->msr, so there's no way a clone() or fork() can
create a child with MSR_FP or MSR_VEC or MSR_VSX set. In a subsequent patch
I give 'having a regs->msr FP,VEC,VSX bit set' the meaning 'the regs are
hot', which means this assumption no longer holds, so now we must explicitly
disable these bits (to signal that the FPU/VMX/VSX regs are not hot) for
child threads.
Sounds like I still haven't got that commit message quite right yet.
> > sp -= STACK_FRAME_OVERHEAD;
> >
> > /*
>
* Re: [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
2016-01-26 23:50 ` Cyril Bur
@ 2016-01-27 12:01 ` Balbir Singh
2016-02-09 0:45 ` Cyril Bur
0 siblings, 1 reply; 16+ messages in thread
From: Balbir Singh @ 2016-01-27 12:01 UTC (permalink / raw)
To: Cyril Bur; +Cc: linuxppc-dev, Michael Neuling, Anton Blanchard
On Wed, Jan 27, 2016 at 10:50 AM, Cyril Bur <cyrilbur@gmail.com> wrote:
> On Mon, 25 Jan 2016 11:04:23 +1100
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> On Thu, 21 Jan 2016 11:55:44 +1100
>> Cyril Bur <cyrilbur@gmail.com> wrote:
>>
>> > Currently when threads get scheduled off they always give up the FPU,
>> > Altivec (VMX) and Vector (VSX) units if they were using them. When they are
>> > scheduled back on a fault is then taken to enable each facility and load
>> > registers. As a result explicitly disabling FPU/VMX/VSX has not been
>> > necessary.
>> >
>> > Future changes and optimisations remove this mandatory giveup and fault
>> > which could cause calls such as clone() and fork() to copy threads and run
>> > them later with FPU/VMX/VSX enabled but no registers loaded.
>> >
>> > This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
>> > thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
>> > registers must be loaded. This allows for a smarter return to userspace.
>> >
>> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
>> > ---
>> > arch/powerpc/kernel/process.c | 1 +
>> > 1 file changed, 1 insertion(+)
>> >
>> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> > index dccc87e..e0c3d2d 100644
>> > --- a/arch/powerpc/kernel/process.c
>> > +++ b/arch/powerpc/kernel/process.c
>> > @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
>> >
>> > f = ret_from_fork;
>> > }
>> > + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
>>
>
> Hi Balbir,
>
> Perhaps I'm missing something, are you saying
>
>> Ideally you want to use __msr_check_and_clear()
>>
>
> instead of childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); ? I don't see how that
> can work...
>
> __msr_check_and_clear() operates on the currently active MSR, that is, the msr
> for the current kernel context. childregs->msr is the value that will be used
> for that userspace context when the kernel returns. Here we must ensure
> that children are created with the bits disabled.
>
Yes, my bad! I thought the routine took generic bits, hoping to reuse
the CONFIG_VSX bits. I don't think it helps much, what you have is
correct.
>> Basically we start with these bits off and then take an exception on use?
>>
>
> Currently yes, this is what I'm trying to change. This patch hasn't been
> necessary until now, as any thread which saves its FPU/VMX/VSX data ALSO
> disables those bits in regs->msr, so there's no way a clone() or fork() can
> create a child with MSR_FP or MSR_VEC or MSR_VSX set. In a subsequent patch
> I give 'having a regs->msr FP,VEC,VSX bit set' the meaning 'the regs are
> hot', which means this assumption no longer holds, so now we must explicitly
> disable these bits (to signal that the FPU/VMX/VSX regs are not hot) for
> child threads.
>
> Sounds like I still haven't got that commit message quite right yet.
I think the older series had more data to help understand the patch.
It would help to move some of it into the current series.
Balbir
* Re: [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread
2016-01-27 12:01 ` Balbir Singh
@ 2016-02-09 0:45 ` Cyril Bur
0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-02-09 0:45 UTC (permalink / raw)
To: Balbir Singh; +Cc: linuxppc-dev, Michael Neuling, Anton Blanchard
On Wed, 27 Jan 2016 23:01:59 +1100
Balbir Singh <bsingharora@gmail.com> wrote:
> On Wed, Jan 27, 2016 at 10:50 AM, Cyril Bur <cyrilbur@gmail.com> wrote:
> > On Mon, 25 Jan 2016 11:04:23 +1100
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >
> >> On Thu, 21 Jan 2016 11:55:44 +1100
> >> Cyril Bur <cyrilbur@gmail.com> wrote:
> >>
> >> > Currently when threads get scheduled off they always give up the FPU,
> >> > Altivec (VMX) and Vector (VSX) units if they were using them. When they are
> >> > scheduled back on a fault is then taken to enable each facility and load
> >> > registers. As a result explicitly disabling FPU/VMX/VSX has not been
> >> > necessary.
> >> >
> >> > Future changes and optimisations remove this mandatory giveup and fault
> >> > which could cause calls such as clone() and fork() to copy threads and run
> >> > them later with FPU/VMX/VSX enabled but no registers loaded.
> >> >
> >> > This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
> >> > thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
> >> > registers must be loaded. This allows for a smarter return to userspace.
> >> >
> >> > Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
> >> > ---
> >> > arch/powerpc/kernel/process.c | 1 +
> >> > 1 file changed, 1 insertion(+)
> >> >
> >> > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> >> > index dccc87e..e0c3d2d 100644
> >> > --- a/arch/powerpc/kernel/process.c
> >> > +++ b/arch/powerpc/kernel/process.c
> >> > @@ -1307,6 +1307,7 @@ int copy_thread(unsigned long clone_flags, unsigned long usp,
> >> >
> >> > f = ret_from_fork;
> >> > }
> >> > + childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX);
> >>
> >
> > Hi Balbir,
> >
> > Perhaps I'm missing something, are you saying
> >
> >> Ideally you want to use __msr_check_and_clear()
> >>
> >
> > instead of childregs->msr &= ~(MSR_FP|MSR_VEC|MSR_VSX); ? I don't see how that
> > can work...
> >
> > __msr_check_and_clear() operates on the currently active MSR, that is, the msr
> > for the current kernel context. childregs->msr is the value that will be used
> > for that userspace context when the kernel returns. Here we must ensure
> > that children are created with the bits disabled.
> >
>
> Yes, my bad! I thought the routine took generic bits, hoping to reuse
> the CONFIG_VSX bits. I don't think it helps much, what you have is
> correct.
>
> >> Basically we start with these bits off and then take an exception on use?
> >>
> >
> > Currently yes, this is what I'm trying to change. This patch hasn't been
> > necessary until now, as any thread which saves its FPU/VMX/VSX data ALSO
> > disables those bits in regs->msr, so there's no way a clone() or fork() can
> > create a child with MSR_FP or MSR_VEC or MSR_VSX set. In a subsequent patch
> > I give 'having a regs->msr FP,VEC,VSX bit set' the meaning 'the regs are
> > hot', which means this assumption no longer holds, so now we must explicitly
> > disable these bits (to signal that the FPU/VMX/VSX regs are not hot) for
> > child threads.
> >
> > Sounds like I still haven't got that commit message quite right yet.
>
> I think the older series had more data to help understand the patch.
> It would help to move some of them to the current series
>
The previous commit message to this patch was:
With threads leaving the math bits enabled in their saved MSR to indicate
that the hardware is hot and a restore is not needed, children need to turn
them off, as when they do get scheduled there's no way their registers could
have been hot.
Mikey pointed out that this was misleading, as it wasn't clear that I was preparing
for future work and that no bug currently exists.
Are you talking about another patch?
I have tried to incorporate more from the old commit message into the new one,
see below:
Currently when threads get scheduled off they always give up the FPU,
Altivec (VMX) and Vector (VSX) units if they were using them. When they are
scheduled back on a fault is then taken to enable each facility and load
registers. As a result explicitly disabling FPU/VMX/VSX has not been
necessary.
Future changes and optimisations remove this mandatory giveup and fault
which could cause calls such as clone() and fork() to copy threads and run
them later with FPU/VMX/VSX enabled but no registers loaded.
This patch starts the process of having MSR_{FP,VEC,VSX} mean that a
thread's registers are hot, while not having MSR_{FP,VEC,VSX} means that the
registers must be loaded. This allows for a smarter return to userspace. In the
case of fork() and clone(), children must have their FPU/VMX/VSX facilities
disabled as there is no way they could have been hot.
Mpe: Let me know if you want me to send this as a new series or if you're
happy to incorporate that from here.
Cyril
* Re: [v3, 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two
2016-01-21 0:55 ` [PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two Cyril Bur
@ 2016-02-10 7:51 ` Michael Ellerman
0 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2016-02-10 7:51 UTC (permalink / raw)
To: Cyril Bur, linuxppc-dev; +Cc: mikey, anton
On Thu, 2016-21-01 at 00:55:46 UTC, Cyril Bur wrote:
> This prepares for the decoupling of saving {fpu,altivec,vsx} registers and
> marking {fpu,altivec,vsx} as being unused by a thread.
>
> Currently giveup_{fpu,altivec,vsx}() does both; however, optimisations to
> task switching can be made if these two operations are decoupled.
> save_all() will permit the saving of registers to thread structs and leave
> the threads' MSR bits enabled.
>
> This patch introduces no functional change.
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 0955b7c..45e37c0 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -349,6 +349,14 @@ void flush_spe_to_thread(struct task_struct *tsk)
> preempt_enable();
> }
> }
> +#else
> +/*
> + * save_all() is going to test MSR_SPE; rather than pull in all the
> + * booke definitions all the time on a book3s kernel, just ensure it
> + * exists but acts as a nop.
> + */
> +
> +#define MSR_SPE 0
> #endif /* CONFIG_SPE */
Building corenet32/64_defconfig this gives me:
arch/powerpc/kernel/process.c:392:0: error: "MSR_SPE" redefined [-Werror]
#define MSR_SPE 0
^
In file included from ./arch/powerpc/include/asm/reg.h:18:0,
from ./arch/powerpc/include/asm/processor.h:13,
from ./arch/powerpc/include/asm/thread_info.h:33,
from include/linux/thread_info.h:54,
from include/asm-generic/preempt.h:4,
from arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:59,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/uapi/linux/timex.h:56,
from include/linux/timex.h:56,
from include/linux/sched.h:19,
from arch/powerpc/kernel/process.c:18:
./arch/powerpc/include/asm/reg_booke.h:33:0: note: this is the location of the previous definition
#define MSR_SPE __MASK(MSR_SPE_LG)
^
cc1: all warnings being treated as errors
scripts/Makefile.build:258: recipe for target 'arch/powerpc/kernel/process.o' failed
make[1]: *** [arch/powerpc/kernel/process.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:950: recipe for target 'arch/powerpc/kernel' failed
cheers
* Re: [v3, 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall
2016-01-21 0:55 ` [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
@ 2016-02-10 8:56 ` Michael Ellerman
0 siblings, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2016-02-10 8:56 UTC (permalink / raw)
To: Cyril Bur, linuxppc-dev
[offlist]
On Thu, 2016-21-01 at 00:55:41 UTC, Cyril Bur wrote:
> Test that the non volatile floating point and Altivec registers get
> correctly preserved across the fork() syscall.
>
> fork() works nicely for this purpose; the registers should be the same for
> both parent and child.
...
> diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h
> new file mode 100644
> index 0000000..27aca79
> --- /dev/null
> +++ b/tools/testing/selftests/powerpc/basic_asm.h
> @@ -0,0 +1,26 @@
> +#include <ppc-asm.h>
> +#include <asm/unistd.h>
> +
> +#define LOAD_REG_IMMEDIATE(reg,expr) \
> + lis reg,(expr)@highest; \
> + ori reg,reg,(expr)@higher; \
> + rldicr reg,reg,32,31; \
> + oris reg,reg,(expr)@high; \
> + ori reg,reg,(expr)@l;
> +
> +#define PUSH_BASIC_STACK(size) \
> + std 2,24(sp); \
> + mflr r0; \
> + std r0,16(sp); \
> + mfcr r0; \
> + stw r0,8(sp); \
> + stdu sp,-size(sp);
Unless I'm completely misreading this, that's not ABI compliant.
You are supposed to save LR into your caller's stack frame, but the TOC and CR
go into *your* stack frame.
I know that's not super obvious from the ABI doc.
See for example this code generated by GCC:
extern int global_thing;
int foo(int x, int (*fp)(int x))
{
int a, b, c;
a = x;
b = a + global_thing;
c = b + 2;
a = fp(x);
printf("%d %d %d\n", a, b, c);
return a;
}
0000000000000000 <foo>:
0: 00 00 4c 3c addis r2,r12,0
0: R_PPC64_REL16_HA .TOC.
4: 00 00 42 38 addi r2,r2,0
4: R_PPC64_REL16_LO .TOC.+0x4
8: a6 02 08 7c mflr r0
c: f0 ff c1 fb std r30,-16(r1) <- save r30 into (what will be) our stack frame
10: f8 ff e1 fb std r31,-8(r1) <- save r31 into (what will be) our stack frame
14: 78 23 8c 7c mr r12,r4 <- put fp into r12 because we'll be calling its global entry point
18: a6 03 89 7c mtctr r4 <- put fp into CTR
1c: 00 00 22 3d addis r9,r2,0
1c: R_PPC64_TOC16_HA .toc
20: 00 00 29 e9 ld r9,0(r9)
20: R_PPC64_TOC16_LO_DS .toc
24: 10 00 01 f8 std r0,16(r1) <- save r0 (LR) into *our caller's* stack frame
28: 91 ff 21 f8 stdu r1,-112(r1) <- create our stack frame & store the backchain pointer
2c: 18 00 41 f8 std r2,24(r1) <- save r2 (TOC) into *our* stack frame
30: 00 00 e9 83 lwz r31,0(r9)
34: 14 1a ff 7f add r31,r31,r3
 38: 21 04 80 4e bctrl <- call fp, it will get its TOC pointer using r12
3c: 18 00 41 e8 ld r2,24(r1) <- restore our r2 (TOC) from *our* stack frame
40: 02 00 ff 38 addi r7,r31,2
44: b4 07 e6 7f extsw r6,r31
48: b4 07 e7 7c extsw r7,r7
4c: 78 1b 7e 7c mr r30,r3
50: 00 00 82 3c addis r4,r2,0
50: R_PPC64_TOC16_HA .rodata.str1.8
54: 78 f3 c5 7f mr r5,r30
58: 00 00 84 38 addi r4,r4,0
58: R_PPC64_TOC16_LO .rodata.str1.8
5c: 01 00 60 38 li r3,1
60: 01 00 00 48 bl 60 <foo+0x60>
60: R_PPC64_REL24 __printf_chk
64: 00 00 00 60 nop
68: 70 00 21 38 addi r1,r1,112 <- pop our stack frame
6c: 78 f3 c3 7f mr r3,r30
70: 10 00 01 e8 ld r0,16(r1) <- restore r0 (LR) from our caller's stack frame
74: f0 ff c1 eb ld r30,-16(r1) <- restore r30 from (what was) our stack frame
78: f8 ff e1 eb ld r31,-8(r1) <- restore r31 from (what was) our stack frame
7c: a6 03 08 7c mtlr r0 <- restore LR
80: 20 00 80 4e blr <- return
So your macro wants to be more like:
#define PUSH_BASIC_STACK(_size) \
mflr r0; \
std r0,16(r1); \
stdu r1,-_size(r1); \
mfcr r0; \
stw r0,8(r1); \
std r2,24(r1);
Though untested.
Also you're relying on the caller to ensure size is big enough. So maybe it'd
be better if you ensured that, making the size parameter the *extra* size above
the minimum stack frame. eg:
#define PUSH_BASIC_STACK(_extra) \
mflr r0; \
std r0,16(r1); \
stdu r1,-(_extra + 32)(r1); \
mfcr r0; \
stw r0,8(r1); \
std r2,24(r1);
cheers
end of thread, other threads:[~2016-02-10 8:56 UTC | newest]
Thread overview: 16+ messages
2016-01-21 0:55 [PATCH v3 0/9] FP/VEC/VSX switching optimisations Cyril Bur
2016-01-21 0:55 ` [PATCH v3 1/9] selftests/powerpc: Test the preservation of FPU and VMX regs across syscall Cyril Bur
2016-02-10 8:56 ` [v3, " Michael Ellerman
2016-01-21 0:55 ` [PATCH v3 2/9] selftests/powerpc: Test preservation of FPU and VMX regs across preemption Cyril Bur
2016-01-21 0:55 ` [PATCH v3 3/9] selftests/powerpc: Test FPU and VMX regs in signal ucontext Cyril Bur
2016-01-21 0:55 ` [PATCH v3 4/9] powerpc: Explicitly disable math features when copying thread Cyril Bur
2016-01-25 0:04 ` Balbir Singh
2016-01-26 23:50 ` Cyril Bur
2016-01-27 12:01 ` Balbir Singh
2016-02-09 0:45 ` Cyril Bur
2016-01-21 0:55 ` [PATCH v3 5/9] powerpc: Restore FPU/VEC/VSX if previously used Cyril Bur
2016-01-21 0:55 ` [PATCH v3 6/9] powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in two Cyril Bur
2016-02-10 7:51 ` [v3, " Michael Ellerman
2016-01-21 0:55 ` [PATCH v3 7/9] powerpc: Add the ability to save FPU without giving it up Cyril Bur
2016-01-21 0:55 ` [PATCH v3 8/9] powerpc: Add the ability to save Altivec " Cyril Bur
2016-01-21 0:55 ` [PATCH v3 9/9] powerpc: Add the ability to save VSX " Cyril Bur