All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Consistent TM structures
@ 2016-06-08  4:00 Cyril Bur
  2016-06-08  4:00 ` [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption Cyril Bur
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

Hi,

The reason for this series is outlined in 3/5. I'll reexplain here
quickly.

If userspace doesn't use TM at all then pt_regs, fp_state and vr_state
hold (almost) all the register state of the CPU.

If userspace uses TM then pt_regs is ALWAYS the live state. This may
be a transactional speculative state or if the thread is between
transactions it is just the regular live state. The checkpointed state
(if needed) always exists in ckpt_regs.
This is not true of fp_state and vr_state which MAY hold a live state
when the thread has not entered a transaction but will then contain
checkpointed values once a thread enters a transaction.
transact_fp and transact_vr are used only when a thread is in a
transaction (active or suspended) to keep the live (but speculative)
state.

Here I aim to remove this disconnect and have everything behave like
pt_regs.

For ease of review I've left patches 3, 4 and 5 separate. It probably
makes sense for them to be squashed into one, the naming inconsistency
between 3 and 4 can't be a good idea.

A few apologies for this series:
 - I had to write tests to have an idea what I've done is correct,
they're still a bit rough around the edges.
 - In the process I made more the asm helpers shared as the powerpc/math
selftests had quite a few things I found useful.
 - This pretty much means the 2/5 monster should be a few patches. I'll
split them up.

I didn't want this series held up from initial review while I cleaned
up tests.

Thanks,

Cyril

Cyril Bur (5):
  selftests/powerpc: Check for VSX preservation across userspace
    preemption
  selftests/powerpc: Add test to check TM ucontext creation
  powerpc: tm: Always use fp_state and vr_state to store live registers
  powerpc: tm: Rename transct_(*) to ck(\1)_state
  powerpc: Remove do_load_up_transact_{fpu,altivec}

 arch/powerpc/include/asm/processor.h               |  20 +--
 arch/powerpc/include/asm/tm.h                      |   5 -
 arch/powerpc/kernel/asm-offsets.c                  |  12 +-
 arch/powerpc/kernel/fpu.S                          |  26 ----
 arch/powerpc/kernel/process.c                      |  48 ++-----
 arch/powerpc/kernel/signal.h                       |   8 +-
 arch/powerpc/kernel/signal_32.c                    |  84 ++++++-------
 arch/powerpc/kernel/signal_64.c                    |  59 ++++-----
 arch/powerpc/kernel/tm.S                           |  95 +++++++-------
 arch/powerpc/kernel/traps.c                        |  12 +-
 arch/powerpc/kernel/vector.S                       |  25 ----
 tools/testing/selftests/powerpc/basic_asm.h        |   4 +
 tools/testing/selftests/powerpc/fpu_asm.h          |  72 +++++++++++
 tools/testing/selftests/powerpc/gpr_asm.h          |  96 ++++++++++++++
 tools/testing/selftests/powerpc/math/Makefile      |   4 +-
 tools/testing/selftests/powerpc/math/fpu_asm.S     |  73 +----------
 tools/testing/selftests/powerpc/math/vmx_asm.S     |  85 +------------
 tools/testing/selftests/powerpc/math/vsx_asm.S     |  57 +++++++++
 tools/testing/selftests/powerpc/math/vsx_preempt.c | 140 +++++++++++++++++++++
 tools/testing/selftests/powerpc/tm/Makefile        |   9 +-
 .../powerpc/tm/tm-signal-context-chk-fpu.c         |  94 ++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-gpr.c         |  96 ++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-vmx.c         | 112 +++++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-vsx.c         | 127 +++++++++++++++++++
 .../selftests/powerpc/tm/tm-signal-context-chk.c   | 102 +++++++++++++++
 tools/testing/selftests/powerpc/tm/tm-signal.S     | 105 ++++++++++++++++
 tools/testing/selftests/powerpc/vmx_asm.h          |  98 +++++++++++++++
 tools/testing/selftests/powerpc/vsx_asm.h          |  71 +++++++++++
 28 files changed, 1343 insertions(+), 396 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h
 create mode 100644 tools/testing/selftests/powerpc/gpr_asm.h
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal.S
 create mode 100644 tools/testing/selftests/powerpc/vmx_asm.h
 create mode 100644 tools/testing/selftests/powerpc/vsx_asm.h

-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption
  2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
@ 2016-06-08  4:00 ` Cyril Bur
  2016-06-09  1:35   ` Daniel Axtens
  2016-06-08  4:00 ` [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation Cyril Bur
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

Ensure the kernel correctly switches VSX registers correctly. VSX
registers are all volatile, and despite the kernel preserving VSX
across syscalls, it doesn't have to. Test that during interrupts and
timeslices ending the VSX regs remain the same.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 tools/testing/selftests/powerpc/math/Makefile      |   4 +-
 tools/testing/selftests/powerpc/math/vsx_asm.S     |  57 +++++++++
 tools/testing/selftests/powerpc/math/vsx_preempt.c | 140 +++++++++++++++++++++
 tools/testing/selftests/powerpc/vsx_asm.h          |  71 +++++++++++
 4 files changed, 271 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_asm.S
 create mode 100644 tools/testing/selftests/powerpc/math/vsx_preempt.c
 create mode 100644 tools/testing/selftests/powerpc/vsx_asm.h

diff --git a/tools/testing/selftests/powerpc/math/Makefile b/tools/testing/selftests/powerpc/math/Makefile
index 5b88875..aa6598b 100644
--- a/tools/testing/selftests/powerpc/math/Makefile
+++ b/tools/testing/selftests/powerpc/math/Makefile
@@ -1,4 +1,4 @@
-TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal
+TEST_PROGS := fpu_syscall fpu_preempt fpu_signal vmx_syscall vmx_preempt vmx_signal vsx_preempt
 
 all: $(TEST_PROGS)
 
@@ -13,6 +13,8 @@ vmx_syscall: vmx_asm.S
 vmx_preempt: vmx_asm.S
 vmx_signal: vmx_asm.S
 
+vsx_preempt: vsx_asm.S
+
 include ../../lib.mk
 
 clean:
diff --git a/tools/testing/selftests/powerpc/math/vsx_asm.S b/tools/testing/selftests/powerpc/math/vsx_asm.S
new file mode 100644
index 0000000..4ceaf37
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vsx_asm.S
@@ -0,0 +1,57 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+#include "../vsx_asm.h"
+
+FUNC_START(check_vsx)
+	PUSH_BASIC_STACK(32)
+	std	r3,STACK_FRAME_PARAM(0)(sp)
+	addi r3, r3, 16 * 12 #Second half of array
+	bl store_vsx
+	ld r3,STACK_FRAME_PARAM(0)(sp)
+	bl vsx_memcmp
+	POP_BASIC_STACK(32)
+	blr
+FUNC_END(check_vsx)
+
+# int preempt_vmx(vector int *varray, int *threads_starting, int *running)
+# On starting will (atomically) decrement threads_starting as a signal that
+# the VMX have been loaded with varray. Will proceed to check the validity of
+# the VMX registers while running is not zero.
+FUNC_START(preempt_vsx)
+	PUSH_BASIC_STACK(512)
+	std r3,STACK_FRAME_PARAM(0)(sp) # vector int *varray
+	std r4,STACK_FRAME_PARAM(1)(sp) # int *threads_starting
+	std r5,STACK_FRAME_PARAM(2)(sp) # int *running
+
+	bl load_vsx
+	nop
+
+	sync
+	# Atomic DEC
+	ld r3,STACK_FRAME_PARAM(1)(sp)
+1:	lwarx r4,0,r3
+	addi r4,r4,-1
+	stwcx. r4,0,r3
+	bne- 1b
+
+2:	ld r3,STACK_FRAME_PARAM(0)(sp)
+	bl check_vsx
+	nop
+	cmpdi r3,0
+	bne 3f
+	ld r4,STACK_FRAME_PARAM(2)(sp)
+	ld r5,0(r4)
+	cmpwi r5,0
+	bne 2b
+
+3:	POP_BASIC_STACK(512)
+	blr
+FUNC_END(preempt_vsx)
diff --git a/tools/testing/selftests/powerpc/math/vsx_preempt.c b/tools/testing/selftests/powerpc/math/vsx_preempt.c
new file mode 100644
index 0000000..706dbaa
--- /dev/null
+++ b/tools/testing/selftests/powerpc/math/vsx_preempt.c
@@ -0,0 +1,140 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test attempts to see if the VSX registers change across preemption.
+ * There is no way to be sure preemption happened so this test just
+ * uses many threads and a long wait. As such, a successful test
+ * doesn't mean much but a failure is bad.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <stdlib.h>
+#include <pthread.h>
+
+#include "utils.h"
+
+/* Time to wait for workers to get preempted (seconds) */
+#define PREEMPT_TIME 20
+/*
+ * Factor by which to multiply number of online CPUs for total number of
+ * worker threads
+ */
+#define THREAD_FACTOR 8
+
+__thread vector int varray[24] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48}};
+
+int threads_starting;
+int running;
+
+extern long preempt_vsx(vector int *varray, int *threads_starting, int *running);
+
+long vsx_memcmp(vector int *a) {
+	vector int zero = {0,0,0,0};
+	int i;
+
+	FAIL_IF(a != varray);
+
+	for(i = 0; i < 12; i++) {
+		if (memcmp(&a[i + 12], &zero, 16) == 0) {
+			fprintf(stderr, "Detected zero from the VSX reg %d\n", i + 12);
+			return 1;
+		}
+	}
+
+	if (memcmp(a, &a[12], 12 * 16)) {
+		long *p = (long *)a;
+		fprintf(stderr, "VSX mismatch\n");
+		for (i = 0; i < 24; i=i+2)
+			fprintf(stderr, "%d: 0x%08lx%08lx | 0x%08lx%08lx\n",
+					i/2 + i%2 + 20, p[i], p[i + 1], p[i + 24], p[i + 25]);
+		return 1;
+	}
+	return 0;
+}
+
+void *preempt_vsx_c(void *p)
+{
+	int i, j;
+	long rc;
+	srand(pthread_self());
+	for (i = 0; i < 12; i++)
+		for (j = 0; j < 4; j++) {
+			varray[i][j] = rand();
+			/* Don't want zero because it hides kernel problems */
+			if (varray[i][j] == 0)
+				j--;
+		}
+	rc = preempt_vsx(varray, &threads_starting, &running);
+	if (rc == 2)
+		fprintf(stderr, "Caught zeros in VSX compares\n");
+	return (void *)rc;
+}
+
+int test_preempt_vsx(void)
+{
+	int i, rc, threads;
+	pthread_t *tids;
+
+	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
+	tids = malloc(threads * sizeof(pthread_t));
+	FAIL_IF(!tids);
+
+	running = true;
+	threads_starting = threads;
+	for (i = 0; i < threads; i++) {
+		rc = pthread_create(&tids[i], NULL, preempt_vsx_c, NULL);
+		FAIL_IF(rc);
+	}
+
+	setbuf(stdout, NULL);
+	/* Not really nessesary but nice to wait for every thread to start */
+	printf("\tWaiting for %d workers to start...", threads_starting);
+	while(threads_starting)
+		asm volatile("": : :"memory");
+	printf("done\n");
+
+	printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
+	sleep(PREEMPT_TIME);
+	printf("done\n");
+
+	printf("\tStopping workers...");
+	/*
+	 * Working are checking this value every loop. In preempt_vsx 'cmpwi r5,0; bne 2b'.
+	 * r5 will have loaded the value of running.
+	 */
+	running = 0;
+	for (i = 0; i < threads; i++) {
+		void *rc_p;
+		pthread_join(tids[i], &rc_p);
+
+		/*
+		 * Harness will say the fail was here, look at why preempt_vsx
+		 * returned
+		 */
+		if ((long) rc_p)
+			printf("oops\n");
+		FAIL_IF((long) rc_p);
+	}
+	printf("done\n");
+
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	return test_harness(test_preempt_vsx, "vsx_preempt");
+}
diff --git a/tools/testing/selftests/powerpc/vsx_asm.h b/tools/testing/selftests/powerpc/vsx_asm.h
new file mode 100644
index 0000000..d828bfb
--- /dev/null
+++ b/tools/testing/selftests/powerpc/vsx_asm.h
@@ -0,0 +1,71 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "basic_asm.h"
+
+/*
+ * Careful this will 'clobber' vsx (by design), VSX are always
+ * volatile though so unlike vmx this isn't so much of an issue
+ * Still should avoid calling from C
+ */
+FUNC_START(load_vsx)
+	li	r5,0
+	lxvx	vs20,r5,r3
+	addi	r5,r5,16
+	lxvx	vs21,r5,r3
+	addi	r5,r5,16
+	lxvx	vs22,r5,r3
+	addi	r5,r5,16
+	lxvx	vs23,r5,r3
+	addi	r5,r5,16
+	lxvx	vs24,r5,r3
+	addi	r5,r5,16
+	lxvx	vs25,r5,r3
+	addi	r5,r5,16
+	lxvx	vs26,r5,r3
+	addi	r5,r5,16
+	lxvx	vs27,r5,r3
+	addi	r5,r5,16
+	lxvx	vs28,r5,r3
+	addi	r5,r5,16
+	lxvx	vs29,r5,r3
+	addi	r5,r5,16
+	lxvx	vs30,r5,r3
+	addi	r5,r5,16
+	lxvx	vs31,r5,r3
+	blr
+FUNC_END(load_vsx)
+
+FUNC_START(store_vsx)
+	li	r5,0
+	stxvx	vs20,r5,r3
+	addi	r5,r5,16
+	stxvx	vs21,r5,r3
+	addi	r5,r5,16
+	stxvx	vs22,r5,r3
+	addi	r5,r5,16
+	stxvx	vs23,r5,r3
+	addi	r5,r5,16
+	stxvx	vs24,r5,r3
+	addi	r5,r5,16
+	stxvx	vs25,r5,r3
+	addi	r5,r5,16
+	stxvx	vs26,r5,r3
+	addi	r5,r5,16
+	stxvx	vs27,r5,r3
+	addi	r5,r5,16
+	stxvx	vs28,r5,r3
+	addi	r5,r5,16
+	stxvx	vs29,r5,r3
+	addi	r5,r5,16
+	stxvx	vs30,r5,r3
+	addi	r5,r5,16
+	stxvx	vs31,r5,r3
+	blr
+FUNC_END(store_vsx)
-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation
  2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
  2016-06-08  4:00 ` [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption Cyril Bur
@ 2016-06-08  4:00 ` Cyril Bur
  2016-06-09  5:12   ` Daniel Axtens
  2016-06-08  4:00 ` [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers Cyril Bur
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 tools/testing/selftests/powerpc/basic_asm.h        |   4 +
 tools/testing/selftests/powerpc/fpu_asm.h          |  72 ++++++++++++
 tools/testing/selftests/powerpc/gpr_asm.h          |  96 ++++++++++++++++
 tools/testing/selftests/powerpc/math/fpu_asm.S     |  73 +-----------
 tools/testing/selftests/powerpc/math/vmx_asm.S     |  85 +-------------
 tools/testing/selftests/powerpc/tm/Makefile        |   9 +-
 .../powerpc/tm/tm-signal-context-chk-fpu.c         |  94 +++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-gpr.c         |  96 ++++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-vmx.c         | 112 ++++++++++++++++++
 .../powerpc/tm/tm-signal-context-chk-vsx.c         | 127 +++++++++++++++++++++
 .../selftests/powerpc/tm/tm-signal-context-chk.c   | 102 +++++++++++++++++
 tools/testing/selftests/powerpc/tm/tm-signal.S     | 105 +++++++++++++++++
 tools/testing/selftests/powerpc/vmx_asm.h          |  98 ++++++++++++++++
 13 files changed, 920 insertions(+), 153 deletions(-)
 create mode 100644 tools/testing/selftests/powerpc/fpu_asm.h
 create mode 100644 tools/testing/selftests/powerpc/gpr_asm.h
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
 create mode 100644 tools/testing/selftests/powerpc/tm/tm-signal.S
 create mode 100644 tools/testing/selftests/powerpc/vmx_asm.h

diff --git a/tools/testing/selftests/powerpc/basic_asm.h b/tools/testing/selftests/powerpc/basic_asm.h
index 3349a07..5131059 100644
--- a/tools/testing/selftests/powerpc/basic_asm.h
+++ b/tools/testing/selftests/powerpc/basic_asm.h
@@ -4,6 +4,10 @@
 #include <ppc-asm.h>
 #include <asm/unistd.h>
 
+#define TBEGIN .long 0x7C00051D
+#define TSUSPEND .long 0x7C0005DD
+#define TRESUME .long 0x7C2005DD
+
 #define LOAD_REG_IMMEDIATE(reg,expr) \
 	lis	reg,(expr)@highest;	\
 	ori	reg,reg,(expr)@higher;	\
diff --git a/tools/testing/selftests/powerpc/fpu_asm.h b/tools/testing/selftests/powerpc/fpu_asm.h
new file mode 100644
index 0000000..a73a7a9
--- /dev/null
+++ b/tools/testing/selftests/powerpc/fpu_asm.h
@@ -0,0 +1,72 @@
+#ifndef _SELFTESTS_POWERPC_FPU_ASM_H
+#define _SELFTESTS_POWERPC_FPU_ASM_H
+#include "basic_asm.h"
+
+#define PUSH_FPU(stack_size) \
+	stfd	f31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \
+	stfd	f30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \
+	stfd	f29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \
+	stfd	f28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \
+	stfd	f27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \
+	stfd	f26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \
+	stfd	f25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \
+	stfd	f24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \
+	stfd	f23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \
+	stfd	f22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \
+	stfd	f21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \
+	stfd	f20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \
+	stfd	f19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \
+	stfd	f18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \
+	stfd	f17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \
+	stfd	f16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \
+	stfd	f15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \
+	stfd	f14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1);
+
+#define POP_FPU(stack_size) \
+	lfd	f31,(stack_size + STACK_FRAME_MIN_SIZE)(%r1); \
+	lfd	f30,(stack_size + STACK_FRAME_MIN_SIZE - 8)(%r1); \
+	lfd	f29,(stack_size + STACK_FRAME_MIN_SIZE - 16)(%r1); \
+	lfd	f28,(stack_size + STACK_FRAME_MIN_SIZE - 24)(%r1); \
+	lfd	f27,(stack_size + STACK_FRAME_MIN_SIZE - 32)(%r1); \
+	lfd	f26,(stack_size + STACK_FRAME_MIN_SIZE - 40)(%r1); \
+	lfd	f25,(stack_size + STACK_FRAME_MIN_SIZE - 48)(%r1); \
+	lfd	f24,(stack_size + STACK_FRAME_MIN_SIZE - 56)(%r1); \
+	lfd	f23,(stack_size + STACK_FRAME_MIN_SIZE - 64)(%r1); \
+	lfd	f22,(stack_size + STACK_FRAME_MIN_SIZE - 72)(%r1); \
+	lfd	f21,(stack_size + STACK_FRAME_MIN_SIZE - 80)(%r1); \
+	lfd	f20,(stack_size + STACK_FRAME_MIN_SIZE - 88)(%r1); \
+	lfd	f19,(stack_size + STACK_FRAME_MIN_SIZE - 96)(%r1); \
+	lfd	f18,(stack_size + STACK_FRAME_MIN_SIZE - 104)(%r1); \
+	lfd	f17,(stack_size + STACK_FRAME_MIN_SIZE - 112)(%r1); \
+	lfd	f16,(stack_size + STACK_FRAME_MIN_SIZE - 120)(%r1); \
+	lfd	f15,(stack_size + STACK_FRAME_MIN_SIZE - 128)(%r1); \
+	lfd	f14,(stack_size + STACK_FRAME_MIN_SIZE - 136)(%r1);
+
+/*
+ * Careful calling this, it will 'clobber' fpu (by design)
+ * Don't call this from C
+ */
+FUNC_START(load_fpu)
+	lfd	f14,0(r3)
+	lfd	f15,8(r3)
+	lfd	f16,16(r3)
+	lfd	f17,24(r3)
+	lfd	f18,32(r3)
+	lfd	f19,40(r3)
+	lfd	f20,48(r3)
+	lfd	f21,56(r3)
+	lfd	f22,64(r3)
+	lfd	f23,72(r3)
+	lfd	f24,80(r3)
+	lfd	f25,88(r3)
+	lfd	f26,96(r3)
+	lfd	f27,104(r3)
+	lfd	f28,112(r3)
+	lfd	f29,120(r3)
+	lfd	f30,128(r3)
+	lfd	f31,136(r3)
+	blr
+FUNC_END(load_fpu)
+
+#endif /* _SELFTESTS_POWERPC_FPU_ASM_H */
+
diff --git a/tools/testing/selftests/powerpc/gpr_asm.h b/tools/testing/selftests/powerpc/gpr_asm.h
new file mode 100644
index 0000000..475fde9
--- /dev/null
+++ b/tools/testing/selftests/powerpc/gpr_asm.h
@@ -0,0 +1,96 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _SELFTESTS_POWERPC_GPR_ASM_H
+#define _SELFTESTS_POWERPC_GPR_ASM_H
+
+#include "basic_asm.h"
+
+#define __PUSH_NVREGS(top_pos); \
+	std r31,(top_pos)(%r1); \
+	std r30,(top_pos - 8)(%r1); \
+	std r29,(top_pos - 16)(%r1); \
+	std r28,(top_pos - 24)(%r1); \
+	std r27,(top_pos - 32)(%r1); \
+	std r26,(top_pos - 40)(%r1); \
+	std r25,(top_pos - 48)(%r1); \
+	std r24,(top_pos - 56)(%r1); \
+	std r23,(top_pos - 64)(%r1); \
+	std r22,(top_pos - 72)(%r1); \
+	std r21,(top_pos - 80)(%r1); \
+	std r20,(top_pos - 88)(%r1); \
+	std r19,(top_pos - 96)(%r1); \
+	std r18,(top_pos - 104)(%r1); \
+	std r17,(top_pos - 112)(%r1); \
+	std r16,(top_pos - 120)(%r1); \
+	std r15,(top_pos - 128)(%r1); \
+	std r14,(top_pos - 136)(%r1)
+
+#define __POP_NVREGS(top_pos); \
+	ld r31,(top_pos)(%r1); \
+	ld r30,(top_pos - 8)(%r1); \
+	ld r29,(top_pos - 16)(%r1); \
+	ld r28,(top_pos - 24)(%r1); \
+	ld r27,(top_pos - 32)(%r1); \
+	ld r26,(top_pos - 40)(%r1); \
+	ld r25,(top_pos - 48)(%r1); \
+	ld r24,(top_pos - 56)(%r1); \
+	ld r23,(top_pos - 64)(%r1); \
+	ld r22,(top_pos - 72)(%r1); \
+	ld r21,(top_pos - 80)(%r1); \
+	ld r20,(top_pos - 88)(%r1); \
+	ld r19,(top_pos - 96)(%r1); \
+	ld r18,(top_pos - 104)(%r1); \
+	ld r17,(top_pos - 112)(%r1); \
+	ld r16,(top_pos - 120)(%r1); \
+	ld r15,(top_pos - 128)(%r1); \
+	ld r14,(top_pos - 136)(%r1)
+
+#define PUSH_NVREGS(stack_size) \
+	__PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE)
+
+/* 18 NV FPU REGS */
+#define PUSH_NVREGS_BELOW_FPU(stack_size) \
+	__PUSH_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8))
+
+#define POP_NVREGS(stack_size) \
+	__POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE)
+
+/* 18 NV FPU REGS */
+#define POP_NVREGS_BELOW_FPU(stack_size) \
+	__POP_NVREGS(stack_size + STACK_FRAME_MIN_SIZE - (18 * 8))
+
+/*
+ * Careful calling this, it will 'clobber' NVGPRs (by design)
+ * Don't call this from C
+ */
+FUNC_START(load_gpr)
+	ld	r14,0(r3)
+	ld	r15,8(r3)
+	ld	r16,16(r3)
+	ld	r17,24(r3)
+	ld	r18,32(r3)
+	ld	r19,40(r3)
+	ld	r20,48(r3)
+	ld	r21,56(r3)
+	ld	r22,64(r3)
+	ld	r23,72(r3)
+	ld	r24,80(r3)
+	ld	r25,88(r3)
+	ld	r26,96(r3)
+	ld	r27,104(r3)
+	ld	r28,112(r3)
+	ld	r29,120(r3)
+	ld	r30,128(r3)
+	ld	r31,136(r3)
+	blr
+FUNC_END(load_gpr)
+
+
+#endif /* _SELFTESTS_POWERPC_GPR_ASM_H */
diff --git a/tools/testing/selftests/powerpc/math/fpu_asm.S b/tools/testing/selftests/powerpc/math/fpu_asm.S
index f3711d8..241f067 100644
--- a/tools/testing/selftests/powerpc/math/fpu_asm.S
+++ b/tools/testing/selftests/powerpc/math/fpu_asm.S
@@ -8,70 +8,7 @@
  */
 
 #include "../basic_asm.h"
-
-#define PUSH_FPU(pos) \
-	stfd	f14,pos(sp); \
-	stfd	f15,pos+8(sp); \
-	stfd	f16,pos+16(sp); \
-	stfd	f17,pos+24(sp); \
-	stfd	f18,pos+32(sp); \
-	stfd	f19,pos+40(sp); \
-	stfd	f20,pos+48(sp); \
-	stfd	f21,pos+56(sp); \
-	stfd	f22,pos+64(sp); \
-	stfd	f23,pos+72(sp); \
-	stfd	f24,pos+80(sp); \
-	stfd	f25,pos+88(sp); \
-	stfd	f26,pos+96(sp); \
-	stfd	f27,pos+104(sp); \
-	stfd	f28,pos+112(sp); \
-	stfd	f29,pos+120(sp); \
-	stfd	f30,pos+128(sp); \
-	stfd	f31,pos+136(sp);
-
-#define POP_FPU(pos) \
-	lfd	f14,pos(sp); \
-	lfd	f15,pos+8(sp); \
-	lfd	f16,pos+16(sp); \
-	lfd	f17,pos+24(sp); \
-	lfd	f18,pos+32(sp); \
-	lfd	f19,pos+40(sp); \
-	lfd	f20,pos+48(sp); \
-	lfd	f21,pos+56(sp); \
-	lfd	f22,pos+64(sp); \
-	lfd	f23,pos+72(sp); \
-	lfd	f24,pos+80(sp); \
-	lfd	f25,pos+88(sp); \
-	lfd	f26,pos+96(sp); \
-	lfd	f27,pos+104(sp); \
-	lfd	f28,pos+112(sp); \
-	lfd	f29,pos+120(sp); \
-	lfd	f30,pos+128(sp); \
-	lfd	f31,pos+136(sp);
-
-# Careful calling this, it will 'clobber' fpu (by design)
-# Don't call this from C
-FUNC_START(load_fpu)
-	lfd	f14,0(r3)
-	lfd	f15,8(r3)
-	lfd	f16,16(r3)
-	lfd	f17,24(r3)
-	lfd	f18,32(r3)
-	lfd	f19,40(r3)
-	lfd	f20,48(r3)
-	lfd	f21,56(r3)
-	lfd	f22,64(r3)
-	lfd	f23,72(r3)
-	lfd	f24,80(r3)
-	lfd	f25,88(r3)
-	lfd	f26,96(r3)
-	lfd	f27,104(r3)
-	lfd	f28,112(r3)
-	lfd	f29,120(r3)
-	lfd	f30,128(r3)
-	lfd	f31,136(r3)
-	blr
-FUNC_END(load_fpu)
+#include "../fpu_asm.h"
 
 FUNC_START(check_fpu)
 	mr r4,r3
@@ -138,9 +75,9 @@ FUNC_START(test_fpu)
 	# r4 holds pointer to the pid
 	# f14-f31 are non volatiles
 	PUSH_BASIC_STACK(256)
+	PUSH_FPU(256)
 	std	r3,STACK_FRAME_PARAM(0)(sp) # Address of darray
 	std r4,STACK_FRAME_PARAM(1)(sp) # Address of pid
-	PUSH_FPU(STACK_FRAME_LOCAL(2,0))
 
 	bl load_fpu
 	nop
@@ -155,7 +92,7 @@ FUNC_START(test_fpu)
 	bl check_fpu
 	nop
 
-	POP_FPU(STACK_FRAME_LOCAL(2,0))
+	POP_FPU(256)
 	POP_BASIC_STACK(256)
 	blr
 FUNC_END(test_fpu)
@@ -166,10 +103,10 @@ FUNC_END(test_fpu)
 # registers while running is not zero.
 FUNC_START(preempt_fpu)
 	PUSH_BASIC_STACK(256)
+	PUSH_FPU(256)
 	std r3,STACK_FRAME_PARAM(0)(sp) # double *darray
 	std r4,STACK_FRAME_PARAM(1)(sp) # int *threads_starting
 	std r5,STACK_FRAME_PARAM(2)(sp) # int *running
-	PUSH_FPU(STACK_FRAME_LOCAL(3,0))
 
 	bl load_fpu
 	nop
@@ -192,7 +129,7 @@ FUNC_START(preempt_fpu)
 	cmpwi r5,0
 	bne 2b
 
-3:	POP_FPU(STACK_FRAME_LOCAL(3,0))
+3:	POP_FPU(256)
 	POP_BASIC_STACK(256)
 	blr
 FUNC_END(preempt_fpu)
diff --git a/tools/testing/selftests/powerpc/math/vmx_asm.S b/tools/testing/selftests/powerpc/math/vmx_asm.S
index 1b8c248..fd74da4 100644
--- a/tools/testing/selftests/powerpc/math/vmx_asm.S
+++ b/tools/testing/selftests/powerpc/math/vmx_asm.S
@@ -8,90 +8,7 @@
  */
 
 #include "../basic_asm.h"
-
-# POS MUST BE 16 ALIGNED!
-#define PUSH_VMX(pos,reg) \
-	li	reg,pos; \
-	stvx	v20,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v21,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v22,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v23,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v24,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v25,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v26,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v27,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v28,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v29,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v30,reg,sp; \
-	addi	reg,reg,16; \
-	stvx	v31,reg,sp;
-
-# POS MUST BE 16 ALIGNED!
-#define POP_VMX(pos,reg) \
-	li	reg,pos; \
-	lvx	v20,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v21,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v22,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v23,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v24,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v25,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v26,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v27,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v28,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v29,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v30,reg,sp; \
-	addi	reg,reg,16; \
-	lvx	v31,reg,sp;
-
-# Carefull this will 'clobber' vmx (by design)
-# Don't call this from C
-FUNC_START(load_vmx)
-	li	r5,0
-	lvx	v20,r5,r3
-	addi	r5,r5,16
-	lvx	v21,r5,r3
-	addi	r5,r5,16
-	lvx	v22,r5,r3
-	addi	r5,r5,16
-	lvx	v23,r5,r3
-	addi	r5,r5,16
-	lvx	v24,r5,r3
-	addi	r5,r5,16
-	lvx	v25,r5,r3
-	addi	r5,r5,16
-	lvx	v26,r5,r3
-	addi	r5,r5,16
-	lvx	v27,r5,r3
-	addi	r5,r5,16
-	lvx	v28,r5,r3
-	addi	r5,r5,16
-	lvx	v29,r5,r3
-	addi	r5,r5,16
-	lvx	v30,r5,r3
-	addi	r5,r5,16
-	lvx	v31,r5,r3
-	blr
-FUNC_END(load_vmx)
+#include "../vmx_asm.h"
 
 # Should be safe from C, only touches r4, r5 and v0,v1,v2
 FUNC_START(check_vmx)
diff --git a/tools/testing/selftests/powerpc/tm/Makefile b/tools/testing/selftests/powerpc/tm/Makefile
index d0505db..4362666 100644
--- a/tools/testing/selftests/powerpc/tm/Makefile
+++ b/tools/testing/selftests/powerpc/tm/Makefile
@@ -1,4 +1,8 @@
-TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack tm-vmxcopy tm-fork tm-tar tm-tmspr
+SIGNAL_CONTEXT_CHK_TESTS := tm-signal-context-chk-gpr tm-signal-context-chk-fpu \
+	tm-signal-context-chk-vmx tm-signal-context-chk-vsx
+
+TEST_PROGS := tm-resched-dscr tm-syscall tm-signal-msr-resv tm-signal-stack \
+	tm-vmxcopy tm-fork tm-tar tm-tmspr $(SIGNAL_CONTEXT_CHK_TESTS)
 
 all: $(TEST_PROGS)
 
@@ -8,6 +12,9 @@ tm-syscall: tm-syscall-asm.S
 tm-syscall: CFLAGS += -mhtm -I../../../../../usr/include
 tm-tmspr: CFLAGS += -pthread
 
+$(SIGNAL_CONTEXT_CHK_TESTS): tm-signal.S
+$(SIGNAL_CONTEXT_CHK_TESTS): CFLAGS += -mhtm
+
 include ../../lib.mk
 
 clean:
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
new file mode 100644
index 0000000..776457d
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-fpu.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamicially linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN          ".long 0x7C00051D ;"
+#define TSUSPEND        ".long 0x7C0005DD ;"
+#define TRESUME         ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+#define NV_FPU_REGS 18
+
+/* Be sure there are 2x as many as there are NV FPU regs (2x18) */
+static double fps[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
+					   -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18 };
+
+extern long tm_signal_self_context_load(pid_t pid, long *gps, double *fps, vector int *vms, vector int *vss);
+
+static int signaled;
+static int fail;
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+	int i;
+	ucontext_t *ucp = uc;
+	ucontext_t *tm_ucp = ucp->uc_link;
+
+	signaled = 1;
+
+	for (i = 0; i < NV_FPU_REGS && !fail; i++) {
+		fail = (ucp->uc_mcontext.fp_regs[i + 14] != fps[i]);
+		fail |= (tm_ucp->uc_mcontext.fp_regs[i + 14] != fps[i + NV_FPU_REGS]);
+	}
+	if (fail)
+		printf("Failed on %d FP %g or %g\n", i - 1, ucp->uc_mcontext.fp_regs[i + 13], tm_ucp->uc_mcontext.fp_regs[i + 13]);
+}
+
+static int tm_signal_context_chk_fpu()
+{
+	struct sigaction act;
+	int i;
+	long rc;
+	pid_t pid = getpid();
+
+	SKIP_IF(!have_htm());
+
+	act.sa_sigaction = signal_usr1;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGUSR1, &act, NULL) < 0) {
+		perror("sigaction sigusr1");
+		exit(1);
+	}
+
+	i = 0;
+	while (!signaled && i < MAX_ATTEMPT) {
+		rc = tm_signal_self_context_load(pid, NULL, fps, NULL, NULL);
+		if (!rc)
+			fprintf(stderr, "Transaction was not doomed...\n");
+		FAIL_IF(!rc);
+
+		i++;
+	}
+
+	if (i == MAX_ATTEMPT)
+		fprintf(stderr, "Tried to signal %d times and didn't work, failing!\n", MAX_ATTEMPT);
+	FAIL_IF(i == MAX_ATTEMPT);
+	return fail;
+}
+
+int main(void)
+{
+	return test_harness(tm_signal_context_chk_fpu, "tm_signal_context_chk_fpu");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
new file mode 100644
index 0000000..22abe48
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
@@ -0,0 +1,96 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamicially linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <altivec.h>
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN          ".long 0x7C00051D ;"
+#define TSUSPEND        ".long 0x7C0005DD ;"
+#define TRESUME         ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+#define NV_GPR_REGS 18
+
+extern long tm_signal_self_context_load(pid_t pid, long *gps, double *fps, vector int *vms, vector int *vss);
+
+static int signaled;
+static int fail;
+
+static long gps[] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
+					 -1,-2,-3,-4,-5,-6,-7,-8,-9,-10,-11,-12,-13,-14,-15,-16,-17,-18};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+	int i;
+	ucontext_t *ucp = uc;
+	ucontext_t *tm_ucp = ucp->uc_link;
+
+	signaled = 1;
+
+	/* Always be 64bit, don't really care about 32bit */
+	for (i = 0; i < NV_GPR_REGS && !fail; i++) {
+		fail = (ucp->uc_mcontext.gp_regs[i + 14] != gps[i]);
+		fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != gps[i + NV_GPR_REGS]);
+	}
+	if (fail)
+		printf("Failed on %d GPR %lu or %lu\n", i - 1,
+				ucp->uc_mcontext.gp_regs[i + 13], tm_ucp->uc_mcontext.gp_regs[i + 13]);
+}
+
+static int tm_signal_context_chk_gpr()
+{
+	struct sigaction act;
+	int i;
+	long rc;
+	pid_t pid = getpid();
+
+	SKIP_IF(!have_htm());
+
+	act.sa_sigaction = signal_usr1;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGUSR1, &act, NULL) < 0) {
+		perror("sigaction sigusr1");
+		exit(1);
+	}
+
+	i = 0;
+	while (!signaled && i < MAX_ATTEMPT) {
+		rc = tm_signal_self_context_load(pid, gps, NULL, NULL, NULL);
+		if (!rc)
+			fprintf(stderr, "Transaction was not doomed...\n");
+		FAIL_IF(!rc);
+		i++;
+	}
+
+	if (i == MAX_ATTEMPT)
+		fprintf(stderr, "Tried to signal %d times and didn't work, failing!\n", MAX_ATTEMPT);
+	FAIL_IF(i == MAX_ATTEMPT);
+	return fail;
+}
+
+int main(void)
+{
+	return test_harness(tm_signal_context_chk_gpr, "tm_signal_context_chk_gpr");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
new file mode 100644
index 0000000..fd58369
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vmx.c
@@ -0,0 +1,112 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamicially linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <altivec.h>
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN          ".long 0x7C00051D ;"
+#define TSUSPEND        ".long 0x7C0005DD ;"
+#define TRESUME         ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+extern long tm_signal_self_context_load(pid_t pid, long *gps, double *fps, vector int *vms, vector int *vss);
+
+static int signaled;
+static int fail;
+
+vector int vms[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48},
+	{-1, -2, -3, -4}, {-5, -6, -7, -8}, {-9, -10,-11,-12},
+	{-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24},
+	{-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36},
+	{-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48}};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+	int i;
+	ucontext_t *ucp = uc;
+	ucontext_t *tm_ucp = ucp->uc_link;
+
+	signaled = 1;
+
+	/* Always be 64bit, don't really care about 32bit */
+	for (i = 0; i < 12 && !fail; i++) {
+		fail = memcmp(ucp->uc_mcontext.v_regs->vrregs[i + 20], &vms[i], 16);
+		fail |= memcmp(tm_ucp->uc_mcontext.v_regs->vrregs[i + 20], &vms[i + 12], 16);
+	}
+	if (fail) {
+		int j;
+
+		fprintf(stderr, "Failed on %d vmx 0x", i - 1);
+		for (j = 0; j < 4; j++)
+			fprintf(stderr, "%08x", ucp->uc_mcontext.v_regs->vrregs[i + 19][j]);
+		fprintf(stderr, " vs 0x");
+		for (j = 0 ; j < 4; j++)
+			fprintf(stderr, "%08x", tm_ucp->uc_mcontext.v_regs->vrregs[i + 19][j]);
+		fprintf(stderr, "\n");
+		return;
+	}
+}
+
+static int tm_signal_context_chk()
+{
+	struct sigaction act;
+	int i;
+	long rc;
+	pid_t pid = getpid();
+
+	SKIP_IF(!have_htm());
+
+	act.sa_sigaction = signal_usr1;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGUSR1, &act, NULL) < 0) {
+		perror("sigaction sigusr1");
+		exit(1);
+	}
+
+	i = 0;
+	while (!signaled && i < MAX_ATTEMPT) {
+		rc = tm_signal_self_context_load(pid, NULL, NULL, vms, NULL);
+		if (!rc) {
+			fprintf(stderr, "Transaction was not doomed...\n");
+			FAIL_IF(!rc);
+		}
+		i++;
+	}
+
+	if (i == MAX_ATTEMPT) {
+		fprintf(stderr, "Tried to signal %d times and didn't work, failing!\n", MAX_ATTEMPT);
+		fail = 1;
+	}
+	return fail;
+}
+
+int main(void)
+{
+	return test_harness(tm_signal_context_chk, "tm_signal_context_chk_vmx");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
new file mode 100644
index 0000000..9685881
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-vsx.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamicially linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include <altivec.h>
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN          ".long 0x7C00051D ;"
+#define TSUSPEND        ".long 0x7C0005DD ;"
+#define TRESUME         ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+extern long tm_signal_self_context_load(pid_t pid, long *gps, double *fps, vector int *vms, vector int *vss);
+
+static int signaled;
+static int fail;
+
+vector int vss[] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10,11,12},
+	{13,14,15,16},{17,18,19,20},{21,22,23,24},
+	{25,26,27,28},{29,30,31,32},{33,34,35,36},
+	{37,38,39,40},{41,42,43,44},{45,46,47,48},
+	{-1, -2, -3, -4}, {-5, -6, -7, -8}, {-9, -10,-11,-12},
+	{-13,-14,-15,-16},{-17,-18,-19,-20},{-21,-22,-23,-24},
+	{-25,-26,-27,-28},{-29,-30,-31,-32},{-33,-34,-35,-36},
+	{-37,-38,-39,-40},{-41,-42,-43,-44},{-45,-46,-47,-48}};
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+	int i;
+	uint8_t vsc[16];
+	uint8_t vst[16];
+	ucontext_t *ucp = uc;
+	ucontext_t *tm_ucp = ucp->uc_link;
+
+	signaled = 1;
+
+	/*
+	 * The other half of the VSX regs will be after v_regs.
+	 *
+	 * In short, vmx_reserve array holds everything. v_regs is a 16
+	 * byte aligned pointer at the start of vmx_reserve (vmx_reserve
+	 * may or may not be 16 aligned) where the v_regs structure exists.
+	 * (half of) The VSX regsters are directly after v_regs so the
+	 * easiest way to find them below.
+	 */
+	long *vsx_ptr = (long *)(ucp->uc_mcontext.v_regs + 1);
+	long *tm_vsx_ptr = (long *)(tm_ucp->uc_mcontext.v_regs + 1);
+	/* Always be 64bit, don't really care about 32bit */
+	for (i = 0; i < 12 && !fail; i++) {
+		memcpy(vsc, &ucp->uc_mcontext.fp_regs[i + 20], 8);
+		memcpy(vsc + 8, &vsx_ptr[20 + i], 8);
+		fail = memcmp(vsc, &vss[i], 16);
+		memcpy(vst, &tm_ucp->uc_mcontext.fp_regs[i + 20], 8);
+		memcpy(vst + 8, &tm_vsx_ptr[20 + i], 8);
+		fail |= memcmp(vst, &vss[i + 12], 16);
+	}
+	if (fail) {
+		fprintf(stderr, "Failed on %d vsx 0x", i - 1);
+		for (i = 0; i < 16; i++)
+			fprintf(stderr, "%02x", vsc[i]);
+		fprintf(stderr, " vs 0x");
+		for (i = 0; i < 16; i++)
+			fprintf(stderr, "%02x", vst[i]);
+		fprintf(stderr, "\n");
+		return;
+	}
+}
+
+static int tm_signal_context_chk()
+{
+	struct sigaction act;
+	int i;
+	long rc;
+	pid_t pid = getpid();
+
+	SKIP_IF(!have_htm());
+
+	act.sa_sigaction = signal_usr1;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGUSR1, &act, NULL) < 0) {
+		perror("sigaction sigusr1");
+		exit(1);
+	}
+
+	i = 0;
+	while (!signaled && i < MAX_ATTEMPT) {
+		rc = tm_signal_self_context_load(pid, NULL, NULL, NULL, vss);
+		if (!rc) {
+			fprintf(stderr, "Transaction was not doomed...\n");
+			FAIL_IF(!rc);
+		}
+		i++;
+	}
+
+	if (i == MAX_ATTEMPT) {
+		fprintf(stderr, "Tried to signal %d times and didn't work, failing!\n", MAX_ATTEMPT);
+		fail = 1;
+	}
+	return fail;
+}
+
+int main(void)
+{
+	return test_harness(tm_signal_context_chk, "tm_signal_context_chk_vsx");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
new file mode 100644
index 0000000..4c906cf
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk.c
@@ -0,0 +1,102 @@
+/*
+ * Copyright 2016, Cyril Bur, IBM Corp.
+ * Licensed under GPLv2.
+ *
+ * Test the kernel's signal frame code.
+ *
+ * The kernel sets up two sets of ucontexts if the signal was to be delivered
+ * while the thread was in a transaction. Expected behaviour is that the
+ * currently executing code is in the first and the checkpointed state (the
+ * state that will be rolled back to) is in the uc_link ucontext.
+ *
+ * The reason for this is that code which is not TM aware and installs a signal
+ * handler will expect to see/modify its currently running state in the uc,
+ * this code may have dynamicially linked against code which is TM aware and is
+ * doing HTM under the hood.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <signal.h>
+#include <unistd.h>
+
+#include "utils.h"
+#include "tm.h"
+
+#define TBEGIN          ".long 0x7C00051D ;"
+#define TSUSPEND        ".long 0x7C0005DD ;"
+#define TRESUME         ".long 0x7C2005DD ;"
+#define MAX_ATTEMPT 100
+
+static double fps[] = { 1, 2, 3, 4, 5, 6, 7, 8,
+						-1, -2, -3, -4, -5, -6, -7, -8 };
+
+extern long tm_signal_self(pid_t pid, double *fps);
+
+static int signaled;
+static int fail;
+
+static void signal_usr1(int signum, siginfo_t *info, void *uc)
+{
+	int i;
+	ucontext_t *ucp = uc;
+	ucontext_t *tm_ucp = ucp->uc_link;
+
+	signaled = 1;
+
+	/* Always be 64bit, don't really care about 32bit */
+	for (i = 0; i < 8 && !fail; i++) {
+		fail = (ucp->uc_mcontext.gp_regs[i + 14] != i);
+		fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != 0xFF - i);
+	}
+	if (fail) {
+		printf("Failed on %d gpr %lu or %lu\n", i - 1, ucp->uc_mcontext.gp_regs[i + 13], tm_ucp->uc_mcontext.gp_regs[i + 13]);
+		return;
+	}
+	for (i = 0; i < 8 && !fail; i++) {
+		fail = (ucp->uc_mcontext.fp_regs[i + 14] != fps[i]);
+		fail |= (tm_ucp->uc_mcontext.fp_regs[i + 14] != fps[i + 8]);
+	}
+	if (fail) {
+		printf("Failed on %d FP %g or %g\n", i - 1, ucp->uc_mcontext.fp_regs[i + 13], tm_ucp->uc_mcontext.fp_regs[i + 13]);
+	}
+}
+
+static int tm_signal_context_chk()
+{
+	struct sigaction act;
+	int i;
+	long rc;
+	pid_t pid = getpid();
+
+	SKIP_IF(!have_htm());
+
+	act.sa_sigaction = signal_usr1;
+	sigemptyset(&act.sa_mask);
+	act.sa_flags = SA_SIGINFO;
+	if (sigaction(SIGUSR1, &act, NULL) < 0) {
+		perror("sigaction sigusr1");
+		exit(1);
+	}
+
+	i = 0;
+	while (!signaled && i < MAX_ATTEMPT) {
+		rc = tm_signal_self(pid, fps);
+		if (!rc) {
+			fprintf(stderr, "Transaction was not doomed...\n");
+			FAIL_IF(!rc);
+		}
+		i++;
+	}
+
+	if (i == MAX_ATTEMPT) {
+		fprintf(stderr, "Tried to signal %d times and didn't work, failing!\n", MAX_ATTEMPT);
+		fail = 1;
+	}
+	return fail;
+}
+
+int main(void)
+{
+	return test_harness(tm_signal_context_chk, "tm_signal_context_chk");
+}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal.S b/tools/testing/selftests/powerpc/tm/tm-signal.S
new file mode 100644
index 0000000..c9e7d1e
--- /dev/null
+++ b/tools/testing/selftests/powerpc/tm/tm-signal.S
@@ -0,0 +1,105 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "../basic_asm.h"
+#include "../gpr_asm.h"
+#include "../fpu_asm.h"
+#include "../vmx_asm.h"
+#include "../vsx_asm.h"
+
+/* long tm_signal_self(pid_t pid, long *gprs, double *fps, vector *vms, vector *vss); */
+FUNC_START(tm_signal_self_context_load)
+	PUSH_BASIC_STACK(512)
+	/*
+	 * Don't strictly need to save and restore as it depends on if
+	 * we're going to use them, however this reduces messy logic
+	 */
+	PUSH_VMX(STACK_FRAME_LOCAL(5,0),r8)
+	PUSH_FPU(512)
+	PUSH_NVREGS_BELOW_FPU(512)
+	std r3, STACK_FRAME_PARAM(0)(sp) /* pid */
+	std r4, STACK_FRAME_PARAM(1)(sp) /* gps */
+	std r5, STACK_FRAME_PARAM(2)(sp) /* fps */
+	std r6, STACK_FRAME_PARAM(3)(sp) /* vms */
+	std r7, STACK_FRAME_PARAM(4)(sp) /* vss */
+
+	ld r3, STACK_FRAME_PARAM(1)(sp)
+	cmpdi r3, 0
+	beq skip_gpr_lc
+	bl load_gpr
+skip_gpr_lc:
+	ld r3, STACK_FRAME_PARAM(2)(sp)
+	cmpdi	r3, 0
+	beq	skip_fpu_lc
+	bl load_fpu
+skip_fpu_lc:
+	ld r3, STACK_FRAME_PARAM(3)(sp)
+	cmpdi r3, 0
+	beq	skip_vmx_lc
+	bl load_vmx
+skip_vmx_lc:
+	ld r3, STACK_FRAME_PARAM(4)(sp)
+	cmpdi	r3, 0
+	beq	skip_vsx_lc
+	bl load_vsx
+skip_vsx_lc:
+	/* Set r3 (return value) before TBEGIN. Use the pid as a known
+	 * 'all good' return value, zero is used to indicate a non-doomed
+	 * transaction.
+	 */
+	ld	r3, STACK_FRAME_PARAM(0)(sp)
+	TBEGIN
+	beq	1f
+	ld	r3, STACK_FRAME_PARAM(1)(sp)
+	cmpdi	r3, 0
+	/* Get the second half of the array */
+	addi	r3, r3, 8 * 18
+	beq skip_gpr_lt
+	bl load_gpr
+skip_gpr_lt:
+	ld r3, STACK_FRAME_PARAM(2)(sp)
+	cmpdi	r3, 0
+	beq	skip_fpu_lt
+	/* Get the second half of the array */
+	addi	r3, r3, 8 * 18
+	bl load_fpu
+skip_fpu_lt:
+	ld r3, STACK_FRAME_PARAM(3)(sp)
+	cmpdi r3, 0
+	beq	skip_vmx_lt
+	/* Get the second half of the array */
+	addi	r3, r3, 16 * 12
+	bl load_vmx
+skip_vmx_lt:
+	ld r3, STACK_FRAME_PARAM(4)(sp)
+	cmpdi	r3, 0
+	beq	skip_vsx_lt
+	/* Get the second half of the array */
+	addi	r3, r3, 16 * 12
+	bl load_vsx
+skip_vsx_lt:
+	TSUSPEND /* Can't enter a syscall transactionally, hardware won't let us */
+	li	r0, 37 /* sys_kill */
+	ld r3, STACK_FRAME_PARAM(0)(sp)
+	li r4, 10 /* SIGUSR1 */
+	sc /* Taking the signal will doom the transaction */
+	TRESUME
+	/*
+	 * This will cause us to resume doomed transaction and cause
+	 * hardware to cleanup, we'll end up at 1: anything between
+	 * TRESUME and 1: shouldn't ever run.
+	 */
+	li r3, 0
+	1:
+	POP_VMX(STACK_FRAME_LOCAL(5,0),r4)
+	POP_FPU(512)
+	POP_NVREGS_BELOW_FPU(512)
+	POP_BASIC_STACK(512)
+	blr
+FUNC_END(tm_signal_self_context_load)
diff --git a/tools/testing/selftests/powerpc/vmx_asm.h b/tools/testing/selftests/powerpc/vmx_asm.h
new file mode 100644
index 0000000..461845dd
--- /dev/null
+++ b/tools/testing/selftests/powerpc/vmx_asm.h
@@ -0,0 +1,98 @@
+/*
+ * Copyright 2015, Cyril Bur, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "basic_asm.h"
+
+/* POS MUST BE 16 ALIGNED! */
+#define PUSH_VMX(pos,reg) \
+	li	reg,pos; \
+	stvx	v20,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v21,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v22,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v23,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v24,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v25,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v26,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v27,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v28,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v29,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v30,reg,%r1; \
+	addi	reg,reg,16; \
+	stvx	v31,reg,%r1;
+
+/* POS MUST BE 16 ALIGNED! */
+#define POP_VMX(pos,reg) \
+	li	reg,pos; \
+	lvx	v20,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v21,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v22,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v23,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v24,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v25,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v26,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v27,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v28,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v29,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v30,reg,%r1; \
+	addi	reg,reg,16; \
+	lvx	v31,reg,%r1;
+
+/*
+ * Careful this will 'clobber' vmx (by design)
+ * Don't call this from C
+ */
+FUNC_START(load_vmx)
+	li	r5,0
+	lvx	v20,r5,r3
+	addi	r5,r5,16
+	lvx	v21,r5,r3
+	addi	r5,r5,16
+	lvx	v22,r5,r3
+	addi	r5,r5,16
+	lvx	v23,r5,r3
+	addi	r5,r5,16
+	lvx	v24,r5,r3
+	addi	r5,r5,16
+	lvx	v25,r5,r3
+	addi	r5,r5,16
+	lvx	v26,r5,r3
+	addi	r5,r5,16
+	lvx	v27,r5,r3
+	addi	r5,r5,16
+	lvx	v28,r5,r3
+	addi	r5,r5,16
+	lvx	v29,r5,r3
+	addi	r5,r5,16
+	lvx	v30,r5,r3
+	addi	r5,r5,16
+	lvx	v31,r5,r3
+	blr
+FUNC_END(load_vmx)
+
+
-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
  2016-06-08  4:00 ` [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption Cyril Bur
  2016-06-08  4:00 ` [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation Cyril Bur
@ 2016-06-08  4:00 ` Cyril Bur
  2016-06-28  3:53   ` Simon Guo
  2016-07-17  3:25   ` Simon Guo
  2016-06-08  4:00 ` [PATCH 4/5] powerpc: tm: Rename transct_(*) to ck(\1)_state Cyril Bur
  2016-06-08  4:00 ` [PATCH 5/5] powerpc: Remove do_load_up_transact_{fpu,altivec} Cyril Bur
  4 siblings, 2 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

There is currently an inconsistency as to how the entire CPU register
state is saved and restored when a thread uses transactional memory
(TM).

Using transactional memory results in the CPU having duplicated
(almost all) of its register state. This duplication results in a set
of registers which can be considered 'live', those being currently
modified by the instructions being executed and another set that is
frozen at a point in time.

On context switch, both sets of state have to be saved and (later)
restored. These two states are often called a variety of different
things. Common terms for the state which only exists after has entered
a transaction (performed a TBEGIN instruction) in hardware is the
'transactional' or 'speculative'.

Between a TBEGIN and a TEND or TABORT (or an event that causes the
hardware to abort), regardless of the use of TSUSPEND the
transactional state can be referred to as the live state.

The second state is often to referred to as the 'checkpointed' state
and is a duplication of the live state when the TBEGIN instruction is
executed. This state is kept in the hardware and will be rolled back
to on transaction failure.

Currently all the registers stored in pt_regs are ALWAYS the live
registers, that is, when a thread has transactional registers their
values are stored in pt_regs and the checkpointed state is in
ckpt_regs. A strange opposite is true for fp_state. When a thread is
non transactional fp_state holds the live registers. When a thread has
initiated a transaction fp_state holds the checkpointed state and
transact_fp becomes the structure which holds the live state (at this
point it is a transactional state). The same is true for vr_state

This method creates confusion as to where the live state is, in some
circumstances it requires extra work to determine where to put the
live state and prevents the use of common functions designed (probably
before TM) to save the live state.

With this patch pt_regs, fp_state and vr_state all represent the same
thing and the other structures [pending rename] are for checkpointed
state.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/kernel/process.c   | 44 +++++--------------
 arch/powerpc/kernel/signal_32.c | 50 ++++++++++------------
 arch/powerpc/kernel/signal_64.c | 53 +++++++++++------------
 arch/powerpc/kernel/tm.S        | 95 ++++++++++++++++++++++-------------------
 arch/powerpc/kernel/traps.c     | 12 ++++--
 5 files changed, 116 insertions(+), 138 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index ea8a28f..696e0236 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -763,24 +763,12 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 {
 	unsigned long msr_diff = 0;
 
-	/*
-	 * If FP/VSX registers have been already saved to the
-	 * thread_struct, move them to the transact_fp array.
-	 * We clear the TIF_RESTORE_TM bit since after the reclaim
-	 * the thread will no longer be transactional.
-	 */
 	if (test_ti_thread_flag(ti, TIF_RESTORE_TM)) {
-		msr_diff = thr->ckpt_regs.msr & ~thr->regs->msr;
-		if (msr_diff & MSR_FP)
-			memcpy(&thr->transact_fp, &thr->fp_state,
-			       sizeof(struct thread_fp_state));
-		if (msr_diff & MSR_VEC)
-			memcpy(&thr->transact_vr, &thr->vr_state,
-			       sizeof(struct thread_vr_state));
+		msr_diff = (thr->ckpt_regs.msr & ~thr->regs->msr)
+			& (MSR_FP | MSR_VEC | MSR_VSX | MSR_FE0 | MSR_FE1);
+
 		clear_ti_thread_flag(ti, TIF_RESTORE_TM);
-		msr_diff &= MSR_FP | MSR_VEC | MSR_VSX | MSR_FE0 | MSR_FE1;
 	}
-
 	/*
 	 * Use the current MSR TM suspended bit to track if we have
 	 * checkpointed state outstanding.
@@ -799,6 +787,8 @@ static void tm_reclaim_thread(struct thread_struct *thr,
 	if (!MSR_TM_SUSPENDED(mfmsr()))
 		return;
 
+	save_all(container_of(thr, struct task_struct, thread));
+
 	tm_reclaim(thr, thr->regs->msr, cause);
 
 	/* Having done the reclaim, we now have the checkpointed
@@ -901,7 +891,7 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
 	 * If the task was using FP, we non-lazily reload both the original and
 	 * the speculative FP register states.  This is because the kernel
 	 * doesn't see if/when a TM rollback occurs, so if we take an FP
-	 * unavoidable later, we are unable to determine which set of FP regs
+	 * unavailable later, we are unable to determine which set of FP regs
 	 * need to be restored.
 	 */
 	if (!new->thread.regs)
@@ -917,24 +907,10 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
 		 "(new->msr 0x%lx, new->origmsr 0x%lx)\n",
 		 new->pid, new->thread.regs->msr, msr);
 
-	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&new->thread, msr);
 
-	/* This loads the speculative FP/VEC state, if used */
-	if (msr & MSR_FP) {
-		do_load_up_transact_fpu(&new->thread);
-		new->thread.regs->msr |=
-			(MSR_FP | new->thread.fpexc_mode);
-	}
-#ifdef CONFIG_ALTIVEC
-	if (msr & MSR_VEC) {
-		do_load_up_transact_altivec(&new->thread);
-		new->thread.regs->msr |= MSR_VEC;
-	}
-#endif
-	/* We may as well turn on VSX too since all the state is restored now */
-	if (msr & MSR_VSX)
-		new->thread.regs->msr |= MSR_VSX;
+	/* Won't restore math get called later? */
+	restore_math(new->thread.regs);
 
 	TM_DEBUG("*** tm_recheckpoint of pid %d complete "
 		 "(kernel msr 0x%lx)\n",
@@ -1108,11 +1084,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	 */
 	save_sprs(&prev->thread);
 
-	__switch_to_tm(prev);
-
 	/* Save FPU, Altivec, VSX and SPE state */
 	giveup_all(prev);
 
+	__switch_to_tm(prev);
+
 	/*
 	 * We can't take a PMU exception inside _switch() since there is a
 	 * window where the kernel stack SLB and the kernel stack are out
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index b6aa378..d106b91 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -525,9 +525,6 @@ static int save_tm_user_regs(struct pt_regs *regs,
 	 */
 	regs->msr &= ~MSR_TS_MASK;
 
-	/* Make sure floating point registers are stored in regs */
-	flush_fp_to_thread(current);
-
 	/* Save both sets of general registers */
 	if (save_general_regs(&current->thread.ckpt_regs, frame)
 	    || save_general_regs(regs, tm_frame))
@@ -545,18 +542,17 @@ static int save_tm_user_regs(struct pt_regs *regs,
 #ifdef CONFIG_ALTIVEC
 	/* save altivec registers */
 	if (current->thread.used_vr) {
-		flush_altivec_to_thread(current);
-		if (__copy_to_user(&frame->mc_vregs, &current->thread.vr_state,
+		if (__copy_to_user(&frame->mc_vregs, &current->thread.transact_vr,
 				   ELF_NVRREG * sizeof(vector128)))
 			return 1;
 		if (msr & MSR_VEC) {
 			if (__copy_to_user(&tm_frame->mc_vregs,
-					   &current->thread.transact_vr,
+					   &current->thread.vr_state,
 					   ELF_NVRREG * sizeof(vector128)))
 				return 1;
 		} else {
 			if (__copy_to_user(&tm_frame->mc_vregs,
-					   &current->thread.vr_state,
+					   &current->thread.transact_vr,
 					   ELF_NVRREG * sizeof(vector128)))
 				return 1;
 		}
@@ -573,28 +569,28 @@ static int save_tm_user_regs(struct pt_regs *regs,
 	 * most significant bits of that same vector. --BenH
 	 */
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		current->thread.vrsave = mfspr(SPRN_VRSAVE);
-	if (__put_user(current->thread.vrsave,
+		current->thread.transact_vrsave = mfspr(SPRN_VRSAVE);
+	if (__put_user(current->thread.transact_vrsave,
 		       (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 	if (msr & MSR_VEC) {
-		if (__put_user(current->thread.transact_vrsave,
+		if (__put_user(current->thread.vrsave,
 			       (u32 __user *)&tm_frame->mc_vregs[32]))
 			return 1;
 	} else {
-		if (__put_user(current->thread.vrsave,
+		if (__put_user(current->thread.transact_vrsave,
 			       (u32 __user *)&tm_frame->mc_vregs[32]))
 			return 1;
 	}
 #endif /* CONFIG_ALTIVEC */
 
-	if (copy_fpr_to_user(&frame->mc_fregs, current))
+	if (copy_transact_fpr_to_user(&frame->mc_fregs, current))
 		return 1;
 	if (msr & MSR_FP) {
-		if (copy_transact_fpr_to_user(&tm_frame->mc_fregs, current))
+		if (copy_fpr_to_user(&tm_frame->mc_fregs, current))
 			return 1;
 	} else {
-		if (copy_fpr_to_user(&tm_frame->mc_fregs, current))
+		if (copy_transact_fpr_to_user(&tm_frame->mc_fregs, current))
 			return 1;
 	}
 
@@ -606,15 +602,14 @@ static int save_tm_user_regs(struct pt_regs *regs,
 	 * contains valid data
 	 */
 	if (current->thread.used_vsr) {
-		flush_vsx_to_thread(current);
-		if (copy_vsx_to_user(&frame->mc_vsregs, current))
+		if (copy_transact_vsx_to_user(&frame->mc_vsregs, current))
 			return 1;
 		if (msr & MSR_VSX) {
-			if (copy_transact_vsx_to_user(&tm_frame->mc_vsregs,
+			if (copy_vsx_to_user(&tm_frame->mc_vsregs,
 						      current))
 				return 1;
 		} else {
-			if (copy_vsx_to_user(&tm_frame->mc_vsregs, current))
+			if (copy_transact_vsx_to_user(&tm_frame->mc_vsregs, current))
 				return 1;
 		}
 
@@ -793,9 +788,9 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
-		if (__copy_from_user(&current->thread.vr_state, &sr->mc_vregs,
+		if (__copy_from_user(&current->thread.transact_vr, &sr->mc_vregs,
 				     sizeof(sr->mc_vregs)) ||
-		    __copy_from_user(&current->thread.transact_vr,
+		    __copy_from_user(&current->thread.vr_state,
 				     &tm_sr->mc_vregs,
 				     sizeof(sr->mc_vregs)))
 			return 1;
@@ -807,13 +802,13 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	}
 
 	/* Always get VRSAVE back */
-	if (__get_user(current->thread.vrsave,
+	if (__get_user(current->thread.transact_vrsave,
 		       (u32 __user *)&sr->mc_vregs[32]) ||
-	    __get_user(current->thread.transact_vrsave,
+	    __get_user(current->thread.vrsave,
 		       (u32 __user *)&tm_sr->mc_vregs[32]))
 		return 1;
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		mtspr(SPRN_VRSAVE, current->thread.vrsave);
+		mtspr(SPRN_VRSAVE, current->thread.transact_vrsave);
 #endif /* CONFIG_ALTIVEC */
 
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
@@ -829,8 +824,8 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 		 * Restore altivec registers from the stack to a local
 		 * buffer, then write this out to the thread_struct
 		 */
-		if (copy_vsx_from_user(current, &sr->mc_vsregs) ||
-		    copy_transact_vsx_from_user(current, &tm_sr->mc_vsregs))
+		if (copy_vsx_from_user(current, &tm_sr->mc_vsregs) ||
+		    copy_transact_vsx_from_user(current, &sr->mc_vsregs))
 			return 1;
 	} else if (current->thread.used_vsr)
 		for (i = 0; i < 32 ; i++) {
@@ -877,13 +872,14 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	tm_recheckpoint(&current->thread, msr);
 
 	/* This loads the speculative FP/VEC state, if used */
+	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
 	if (msr & MSR_FP) {
-		do_load_up_transact_fpu(&current->thread);
+		load_fp_state(&current->thread.fp_state);
 		regs->msr |= (MSR_FP | current->thread.fpexc_mode);
 	}
 #ifdef CONFIG_ALTIVEC
 	if (msr & MSR_VEC) {
-		do_load_up_transact_altivec(&current->thread);
+		load_vr_state(&current->thread.vr_state);
 		regs->msr |= MSR_VEC;
 	}
 #endif
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 2552079..9f9fdb5 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -209,7 +209,6 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 	 */
 	regs->msr &= ~MSR_TS_MASK;
 
-	flush_fp_to_thread(current);
 
 #ifdef CONFIG_ALTIVEC
 	err |= __put_user(v_regs, &sc->v_regs);
@@ -217,20 +216,19 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 
 	/* save altivec registers */
 	if (current->thread.used_vr) {
-		flush_altivec_to_thread(current);
 		/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
-		err |= __copy_to_user(v_regs, &current->thread.vr_state,
+		err |= __copy_to_user(v_regs, &current->thread.transact_vr,
 				      33 * sizeof(vector128));
 		/* If VEC was enabled there are transactional VRs valid too,
 		 * else they're a copy of the checkpointed VRs.
 		 */
 		if (msr & MSR_VEC)
 			err |= __copy_to_user(tm_v_regs,
-					      &current->thread.transact_vr,
+					      &current->thread.vr_state,
 					      33 * sizeof(vector128));
 		else
 			err |= __copy_to_user(tm_v_regs,
-					      &current->thread.vr_state,
+					      &current->thread.transact_vr,
 					      33 * sizeof(vector128));
 
 		/* set MSR_VEC in the MSR value in the frame to indicate
@@ -242,13 +240,13 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 	 * use altivec.
 	 */
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		current->thread.vrsave = mfspr(SPRN_VRSAVE);
-	err |= __put_user(current->thread.vrsave, (u32 __user *)&v_regs[33]);
+		current->thread.transact_vrsave = mfspr(SPRN_VRSAVE);
+	err |= __put_user(current->thread.transact_vrsave, (u32 __user *)&v_regs[33]);
 	if (msr & MSR_VEC)
-		err |= __put_user(current->thread.transact_vrsave,
+		err |= __put_user(current->thread.vrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 	else
-		err |= __put_user(current->thread.vrsave,
+		err |= __put_user(current->thread.transact_vrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 
 #else /* CONFIG_ALTIVEC */
@@ -257,11 +255,11 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 #endif /* CONFIG_ALTIVEC */
 
 	/* copy fpr regs and fpscr */
-	err |= copy_fpr_to_user(&sc->fp_regs, current);
+	err |= copy_transact_fpr_to_user(&sc->fp_regs, current);
 	if (msr & MSR_FP)
-		err |= copy_transact_fpr_to_user(&tm_sc->fp_regs, current);
-	else
 		err |= copy_fpr_to_user(&tm_sc->fp_regs, current);
+	else
+		err |= copy_transact_fpr_to_user(&tm_sc->fp_regs, current);
 
 #ifdef CONFIG_VSX
 	/*
@@ -270,16 +268,15 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 	 * VMX data.
 	 */
 	if (current->thread.used_vsr) {
-		flush_vsx_to_thread(current);
 		v_regs += ELF_NVRREG;
 		tm_v_regs += ELF_NVRREG;
 
-		err |= copy_vsx_to_user(v_regs, current);
+		err |= copy_transact_vsx_to_user(v_regs, current);
 
 		if (msr & MSR_VSX)
-			err |= copy_transact_vsx_to_user(tm_v_regs, current);
-		else
 			err |= copy_vsx_to_user(tm_v_regs, current);
+		else
+			err |= copy_transact_vsx_to_user(tm_v_regs, current);
 
 		/* set MSR_VSX in the MSR value in the frame to
 		 * indicate that sc->vs_reg) contains valid data.
@@ -478,9 +475,9 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 		return -EFAULT;
 	/* Copy 33 vec registers (vr0..31 and vscr) from the stack */
 	if (v_regs != NULL && tm_v_regs != NULL && (msr & MSR_VEC) != 0) {
-		err |= __copy_from_user(&current->thread.vr_state, v_regs,
+		err |= __copy_from_user(&current->thread.transact_vr, v_regs,
 					33 * sizeof(vector128));
-		err |= __copy_from_user(&current->thread.transact_vr, tm_v_regs,
+		err |= __copy_from_user(&current->thread.vr_state, tm_v_regs,
 					33 * sizeof(vector128));
 	}
 	else if (current->thread.used_vr) {
@@ -489,9 +486,9 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 	}
 	/* Always get VRSAVE back */
 	if (v_regs != NULL && tm_v_regs != NULL) {
-		err |= __get_user(current->thread.vrsave,
-				  (u32 __user *)&v_regs[33]);
 		err |= __get_user(current->thread.transact_vrsave,
+				  (u32 __user *)&v_regs[33]);
+		err |= __get_user(current->thread.vrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 	}
 	else {
@@ -502,8 +499,8 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 		mtspr(SPRN_VRSAVE, current->thread.vrsave);
 #endif /* CONFIG_ALTIVEC */
 	/* restore floating point */
-	err |= copy_fpr_from_user(current, &sc->fp_regs);
-	err |= copy_transact_fpr_from_user(current, &tm_sc->fp_regs);
+	err |= copy_fpr_from_user(current, &tm_sc->fp_regs);
+	err |= copy_transact_fpr_from_user(current, &sc->fp_regs);
 #ifdef CONFIG_VSX
 	/*
 	 * Get additional VSX data. Update v_regs to point after the
@@ -513,8 +510,8 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 	if (v_regs && ((msr & MSR_VSX) != 0)) {
 		v_regs += ELF_NVRREG;
 		tm_v_regs += ELF_NVRREG;
-		err |= copy_vsx_from_user(current, v_regs);
-		err |= copy_transact_vsx_from_user(current, tm_v_regs);
+		err |= copy_vsx_from_user(current, tm_v_regs);
+		err |= copy_transact_vsx_from_user(current, v_regs);
 	} else {
 		for (i = 0; i < 32 ; i++) {
 			current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
@@ -528,17 +525,15 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 	/* This loads the checkpointed FP/VEC state, if used */
 	tm_recheckpoint(&current->thread, msr);
 
-	/* This loads the speculative FP/VEC state, if used */
+	msr_check_and_set(msr & (MSR_FP | MSR_VEC));
 	if (msr & MSR_FP) {
-		do_load_up_transact_fpu(&current->thread);
+		load_fp_state(&current->thread.fp_state);
 		regs->msr |= (MSR_FP | current->thread.fpexc_mode);
 	}
-#ifdef CONFIG_ALTIVEC
 	if (msr & MSR_VEC) {
-		do_load_up_transact_altivec(&current->thread);
+		load_vr_state(&current->thread.vr_state);
 		regs->msr |= MSR_VEC;
 	}
-#endif
 
 	return err;
 }
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index bf8f34a..3741b47 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -108,6 +108,7 @@ _GLOBAL(tm_reclaim)
 	/* We've a struct pt_regs at [r1+STACK_FRAME_OVERHEAD]. */
 
 	std	r3, STK_PARAM(R3)(r1)
+	std	r4, STK_PARAM(R4)(r1)
 	SAVE_NVGPRS(r1)
 
 	/* We need to setup MSR for VSX register save instructions.  Here we
@@ -132,43 +133,6 @@ _GLOBAL(tm_reclaim)
 	mtmsrd	r15
 	std	r14, TM_FRAME_L0(r1)
 
-	/* Stash the stack pointer away for use after reclaim */
-	std	r1, PACAR1(r13)
-
-	/* ******************** FPR/VR/VSRs ************
-	 * Before reclaiming, capture the current/transactional FPR/VR
-	* versions /if used/.
-	 *
-	 * (If VSX used, FP and VMX are implied.  Or, we don't need to look
-	 * at MSR.VSX as copying FP regs if .FP, vector regs if .VMX covers it.)
-	 *
-	 * We're passed the thread's MSR as parameter 2.
-	 *
-	 * We enabled VEC/FP/VSX in the msr above, so we can execute these
-	 * instructions!
-	 */
-	andis.		r0, r4, MSR_VEC@h
-	beq	dont_backup_vec
-
-	addi	r7, r3, THREAD_TRANSACT_VRSTATE
-	SAVE_32VRS(0, r6, r7)	/* r6 scratch, r7 transact vr state */
-	mfvscr	v0
-	li	r6, VRSTATE_VSCR
-	stvx	v0, r7, r6
-dont_backup_vec:
-	mfspr	r0, SPRN_VRSAVE
-	std	r0, THREAD_TRANSACT_VRSAVE(r3)
-
-	andi.	r0, r4, MSR_FP
-	beq	dont_backup_fp
-
-	addi	r7, r3, THREAD_TRANSACT_FPSTATE
-	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */
-
-	mffs    fr0
-	stfd    fr0,FPSTATE_FPSCR(r7)
-
-dont_backup_fp:
 	/* Do sanity check on MSR to make sure we are suspended */
 	li	r7, (MSR_TS_S)@higher
 	srdi	r6, r14, 32
@@ -176,6 +140,10 @@ dont_backup_fp:
 1:	tdeqi   r6, 0
 	EMIT_BUG_ENTRY 1b,__FILE__,__LINE__,0
 
+
+	/* Stash the stack pointer away for use after reclaim */
+	std	r1, PACAR1(r13)
+
 	/* The moment we treclaim, ALL of our GPRs will switch
 	 * to user register state.  (FPRs, CCR etc. also!)
 	 * Use an sprg and a tm_scratch in the PACA to shuffle.
@@ -264,6 +232,43 @@ dont_backup_fp:
 	 * MSR.
 	 */
 
+
+	/* ******************** FPR/VR/VSRs ************
+	 * After reclaiming, capture the checkpointed FPRs/VRs /if used/.
+	 *
+	 * (If VSX used, FP and VMX are implied.  Or, we don't need to look
+	 * at MSR.VSX as copying FP regs if .FP, vector regs if .VMX covers it.)
+	 *
+	 * We're passed the thread's MSR as the second parameter
+	 *
+	 * We enabled VEC/FP/VSX in the msr above, so we can execute these
+	 * instructions!
+	 */
+	ld	r4, STK_PARAM(R4)(r1)		/* Second parameter, MSR * */
+	mr	r3, r12
+	andis.		r0, r4, MSR_VEC@h
+	beq	dont_backup_vec
+
+	addi	r7, r3, THREAD_TRANSACT_VRSTATE
+	SAVE_32VRS(0, r6, r7)	/* r6 scratch, r7 transact vr state */
+	mfvscr	v0
+	li	r6, VRSTATE_VSCR
+	stvx	v0, r7, r6
+dont_backup_vec:
+	mfspr	r0, SPRN_VRSAVE
+	std	r0, THREAD_TRANSACT_VRSAVE(r3)
+
+	andi.	r0, r4, MSR_FP
+	beq	dont_backup_fp
+
+	addi	r7, r3, THREAD_TRANSACT_FPSTATE
+	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */
+
+	mffs    fr0
+	stfd    fr0,FPSTATE_FPSCR(r7)
+
+dont_backup_fp:
+
 	/* TM regs, incl TEXASR -- these live in thread_struct.  Note they've
 	 * been updated by the treclaim, to explain to userland the failure
 	 * cause (aborted).
@@ -279,6 +284,7 @@ dont_backup_fp:
 
 	/* Restore original MSR/IRQ state & clear TM mode */
 	ld	r14, TM_FRAME_L0(r1)		/* Orig MSR */
+
 	li	r15, 0
 	rldimi  r14, r15, MSR_TS_LG, (63-MSR_TS_LG)-1
 	mtmsrd  r14
@@ -349,28 +355,29 @@ _GLOBAL(__tm_recheckpoint)
 	mtmsr	r5
 
 #ifdef CONFIG_ALTIVEC
-	/* FP and VEC registers:  These are recheckpointed from thread.fpr[]
-	 * and thread.vr[] respectively.  The thread.transact_fpr[] version
-	 * is more modern, and will be loaded subsequently by any FPUnavailable
-	 * trap.
+	/*
+	 * FP and VEC registers: These are recheckpointed from
+	 * thread.ckfp_state and thread.ckvr_state respectively. The
+	 * thread.fp_state[] version holds the 'live' (transactional)
+	 * and will be loaded subsequently by any FPUnavailable trap.
 	 */
 	andis.	r0, r4, MSR_VEC@h
 	beq	dont_restore_vec
 
-	addi	r8, r3, THREAD_VRSTATE
+	addi	r8, r3, THREAD_TRANSACT_VRSTATE
 	li	r5, VRSTATE_VSCR
 	lvx	v0, r8, r5
 	mtvscr	v0
 	REST_32VRS(0, r5, r8)			/* r5 scratch, r8 ptr */
 dont_restore_vec:
-	ld	r5, THREAD_VRSAVE(r3)
+	ld	r5, THREAD_TRANSACT_VRSAVE(r3)
 	mtspr	SPRN_VRSAVE, r5
 #endif
 
 	andi.	r0, r4, MSR_FP
 	beq	dont_restore_fp
 
-	addi	r8, r3, THREAD_FPSTATE
+	addi	r8, r3, THREAD_TRANSACT_FPSTATE
 	lfd	fr0, FPSTATE_FPSCR(r8)
 	MTFSF_L(fr0)
 	REST_32FPRS_VSRS(0, R4, R8)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 9229ba6..3e4c84d 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1487,7 +1487,8 @@ void fp_unavailable_tm(struct pt_regs *regs)
 
 	/* If VMX is in use, get the transactional values back */
 	if (regs->msr & MSR_VEC) {
-		do_load_up_transact_altivec(&current->thread);
+		msr_check_and_set(MSR_VEC);
+		load_vr_state(&current->thread.vr_state);
 		/* At this point all the VSX state is loaded, so enable it */
 		regs->msr |= MSR_VSX;
 	}
@@ -1508,7 +1509,8 @@ void altivec_unavailable_tm(struct pt_regs *regs)
 	current->thread.used_vr = 1;
 
 	if (regs->msr & MSR_FP) {
-		do_load_up_transact_fpu(&current->thread);
+		msr_check_and_set(MSR_FP);
+		load_fp_state(&current->thread.fp_state);
 		regs->msr |= MSR_VSX;
 	}
 }
@@ -1547,10 +1549,12 @@ void vsx_unavailable_tm(struct pt_regs *regs)
 	 */
 	tm_recheckpoint(&current->thread, regs->msr & ~orig_msr);
 
+	msr_check_and_set(orig_msr & (MSR_FP | MSR_VEC));
+
 	if (orig_msr & MSR_FP)
-		do_load_up_transact_fpu(&current->thread);
+		load_fp_state(&current->thread.fp_state);
 	if (orig_msr & MSR_VEC)
-		do_load_up_transact_altivec(&current->thread);
+		load_vr_state(&current->thread.vr_state);
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 
-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/5] powerpc: tm: Rename transct_(*) to ck(\1)_state
  2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
                   ` (2 preceding siblings ...)
  2016-06-08  4:00 ` [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers Cyril Bur
@ 2016-06-08  4:00 ` Cyril Bur
  2016-06-08  4:00 ` [PATCH 5/5] powerpc: Remove do_load_up_transact_{fpu,altivec} Cyril Bur
  4 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

Make the structures being used for checkpointed state named
consistently with the pt_regs/ckpt_regs.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/processor.h | 20 +++---------
 arch/powerpc/kernel/asm-offsets.c    | 12 ++++----
 arch/powerpc/kernel/fpu.S            |  2 +-
 arch/powerpc/kernel/process.c        |  4 +--
 arch/powerpc/kernel/signal.h         |  8 ++---
 arch/powerpc/kernel/signal_32.c      | 60 ++++++++++++++++++------------------
 arch/powerpc/kernel/signal_64.c      | 32 +++++++++----------
 arch/powerpc/kernel/tm.S             | 12 ++++----
 arch/powerpc/kernel/vector.S         |  4 +--
 9 files changed, 71 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 009fab1..6fd0f00 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -147,7 +147,7 @@ typedef struct {
 } mm_segment_t;
 
 #define TS_FPR(i) fp_state.fpr[i][TS_FPROFFSET]
-#define TS_TRANS_FPR(i) transact_fp.fpr[i][TS_FPROFFSET]
+#define TS_CKFPR(i) ckfp_state.fpr[i][TS_FPROFFSET]
 
 /* FP and VSX 0-31 register set */
 struct thread_fp_state {
@@ -266,21 +266,9 @@ struct thread_struct {
 	unsigned long	tm_ppr;
 	unsigned long	tm_dscr;
 
-	/*
-	 * Transactional FP and VSX 0-31 register set.
-	 * NOTE: the sense of these is the opposite of the integer ckpt_regs!
-	 *
-	 * When a transaction is active/signalled/scheduled etc., *regs is the
-	 * most recent set of/speculated GPRs with ckpt_regs being the older
-	 * checkpointed regs to which we roll back if transaction aborts.
-	 *
-	 * However, fpr[] is the checkpointed 'base state' of FP regs, and
-	 * transact_fpr[] is the new set of transactional values.
-	 * VRs work the same way.
-	 */
-	struct thread_fp_state transact_fp;
-	struct thread_vr_state transact_vr;
-	unsigned long	transact_vrsave;
+	struct thread_fp_state ckfp_state; /* Checkpointed FP state */
+	struct thread_vr_state ckvr_state; /* Checkpointed VR state */
+	unsigned long	ckvrsave; /* Checkpointed VRSAVE */
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
 #ifdef CONFIG_KVM_BOOK3S_32_HANDLER
 	void*		kvm_shadow_vcpu; /* KVM internal data */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9ea0955..e67741f 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -152,12 +152,12 @@ int main(void)
 	DEFINE(THREAD_TM_PPR, offsetof(struct thread_struct, tm_ppr));
 	DEFINE(THREAD_TM_DSCR, offsetof(struct thread_struct, tm_dscr));
 	DEFINE(PT_CKPT_REGS, offsetof(struct thread_struct, ckpt_regs));
-	DEFINE(THREAD_TRANSACT_VRSTATE, offsetof(struct thread_struct,
-						 transact_vr));
-	DEFINE(THREAD_TRANSACT_VRSAVE, offsetof(struct thread_struct,
-					    transact_vrsave));
-	DEFINE(THREAD_TRANSACT_FPSTATE, offsetof(struct thread_struct,
-						 transact_fp));
+	DEFINE(THREAD_CKVRSTATE, offsetof(struct thread_struct,
+						 ckvr_state));
+	DEFINE(THREAD_CKVRSAVE, offsetof(struct thread_struct,
+					    ckvrsave));
+	DEFINE(THREAD_CKFPSTATE, offsetof(struct thread_struct,
+						 ckfp_state));
 	/* Local pt_regs on stack for Transactional Memory funcs. */
 	DEFINE(TM_FRAME_SIZE, STACK_FRAME_OVERHEAD +
 	       sizeof(struct pt_regs) + 16);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 15da2b5..181c187 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -68,7 +68,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	SYNC
 	MTMSRD(r5)
 
-	addi	r7,r3,THREAD_TRANSACT_FPSTATE
+	addi	r7,r3,THREAD_CKFPSTATE
 	lfd	fr0,FPSTATE_FPSCR(r7)
 	MTFSF_L(fr0)
 	REST_32FPVSRS(0, R4, R7)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 696e0236..15462c9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -813,8 +813,8 @@ static inline void tm_reclaim_task(struct task_struct *tsk)
 	 *
 	 * In switching we need to maintain a 2nd register state as
 	 * oldtask->thread.ckpt_regs.  We tm_reclaim(oldproc); this saves the
-	 * checkpointed (tbegin) state in ckpt_regs and saves the transactional
-	 * (current) FPRs into oldtask->thread.transact_fpr[].
+	 * checkpointed (tbegin) state in ckpt_regs, ckfp_state and
+	 * ckvr_state
 	 *
 	 * We also context switch (save) TFHAR/TEXASR/TFIAR in here.
 	 */
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index be305c8..b1cebf66 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -23,20 +23,20 @@ extern int handle_rt_signal32(struct ksignal *ksig, sigset_t *oldset,
 
 extern unsigned long copy_fpr_to_user(void __user *to,
 				      struct task_struct *task);
-extern unsigned long copy_transact_fpr_to_user(void __user *to,
+extern unsigned long copy_ckfpr_to_user(void __user *to,
 					       struct task_struct *task);
 extern unsigned long copy_fpr_from_user(struct task_struct *task,
 					void __user *from);
-extern unsigned long copy_transact_fpr_from_user(struct task_struct *task,
+extern unsigned long copy_ckfpr_from_user(struct task_struct *task,
 						 void __user *from);
 #ifdef CONFIG_VSX
 extern unsigned long copy_vsx_to_user(void __user *to,
 				      struct task_struct *task);
-extern unsigned long copy_transact_vsx_to_user(void __user *to,
+extern unsigned long copy_ckvsx_to_user(void __user *to,
 					       struct task_struct *task);
 extern unsigned long copy_vsx_from_user(struct task_struct *task,
 					void __user *from);
-extern unsigned long copy_transact_vsx_from_user(struct task_struct *task,
+extern unsigned long copy_ckvsx_from_user(struct task_struct *task,
 						 void __user *from);
 #endif
 
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index d106b91..c7d905b 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -315,7 +315,7 @@ unsigned long copy_vsx_from_user(struct task_struct *task,
 }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-unsigned long copy_transact_fpr_to_user(void __user *to,
+unsigned long copy_ckfpr_to_user(void __user *to,
 				  struct task_struct *task)
 {
 	u64 buf[ELF_NFPREG];
@@ -323,12 +323,12 @@ unsigned long copy_transact_fpr_to_user(void __user *to,
 
 	/* save FPR copy to local buffer then write to the thread_struct */
 	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		buf[i] = task->thread.TS_TRANS_FPR(i);
-	buf[i] = task->thread.transact_fp.fpscr;
+		buf[i] = task->thread.TS_CKFPR(i);
+	buf[i] = task->thread.ckfp_state.fpscr;
 	return __copy_to_user(to, buf, ELF_NFPREG * sizeof(double));
 }
 
-unsigned long copy_transact_fpr_from_user(struct task_struct *task,
+unsigned long copy_ckfpr_from_user(struct task_struct *task,
 					  void __user *from)
 {
 	u64 buf[ELF_NFPREG];
@@ -337,13 +337,13 @@ unsigned long copy_transact_fpr_from_user(struct task_struct *task,
 	if (__copy_from_user(buf, from, ELF_NFPREG * sizeof(double)))
 		return 1;
 	for (i = 0; i < (ELF_NFPREG - 1) ; i++)
-		task->thread.TS_TRANS_FPR(i) = buf[i];
-	task->thread.transact_fp.fpscr = buf[i];
+		task->thread.TS_CKFPR(i) = buf[i];
+	task->thread.ckfp_state.fpscr = buf[i];
 
 	return 0;
 }
 
-unsigned long copy_transact_vsx_to_user(void __user *to,
+unsigned long copy_ckvsx_to_user(void __user *to,
 				  struct task_struct *task)
 {
 	u64 buf[ELF_NVSRHALFREG];
@@ -351,11 +351,11 @@ unsigned long copy_transact_vsx_to_user(void __user *to,
 
 	/* save FPR copy to local buffer then write to the thread_struct */
 	for (i = 0; i < ELF_NVSRHALFREG; i++)
-		buf[i] = task->thread.transact_fp.fpr[i][TS_VSRLOWOFFSET];
+		buf[i] = task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET];
 	return __copy_to_user(to, buf, ELF_NVSRHALFREG * sizeof(double));
 }
 
-unsigned long copy_transact_vsx_from_user(struct task_struct *task,
+unsigned long copy_ckvsx_from_user(struct task_struct *task,
 					  void __user *from)
 {
 	u64 buf[ELF_NVSRHALFREG];
@@ -364,7 +364,7 @@ unsigned long copy_transact_vsx_from_user(struct task_struct *task,
 	if (__copy_from_user(buf, from, ELF_NVSRHALFREG * sizeof(double)))
 		return 1;
 	for (i = 0; i < ELF_NVSRHALFREG ; i++)
-		task->thread.transact_fp.fpr[i][TS_VSRLOWOFFSET] = buf[i];
+		task->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i];
 	return 0;
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
@@ -384,17 +384,17 @@ inline unsigned long copy_fpr_from_user(struct task_struct *task,
 }
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-inline unsigned long copy_transact_fpr_to_user(void __user *to,
+inline unsigned long copy_ckfpr_to_user(void __user *to,
 					 struct task_struct *task)
 {
-	return __copy_to_user(to, task->thread.transact_fp.fpr,
+	return __copy_to_user(to, task->thread.ckfp_state.fpr,
 			      ELF_NFPREG * sizeof(double));
 }
 
-inline unsigned long copy_transact_fpr_from_user(struct task_struct *task,
+inline unsigned long copy_ckfpr_from_user(struct task_struct *task,
 						 void __user *from)
 {
-	return __copy_from_user(task->thread.transact_fp.fpr, from,
+	return __copy_from_user(task->thread.ckfp_state.fpr, from,
 				ELF_NFPREG * sizeof(double));
 }
 #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
@@ -542,7 +542,7 @@ static int save_tm_user_regs(struct pt_regs *regs,
 #ifdef CONFIG_ALTIVEC
 	/* save altivec registers */
 	if (current->thread.used_vr) {
-		if (__copy_to_user(&frame->mc_vregs, &current->thread.transact_vr,
+		if (__copy_to_user(&frame->mc_vregs, &current->thread.ckvr_state,
 				   ELF_NVRREG * sizeof(vector128)))
 			return 1;
 		if (msr & MSR_VEC) {
@@ -552,7 +552,7 @@ static int save_tm_user_regs(struct pt_regs *regs,
 				return 1;
 		} else {
 			if (__copy_to_user(&tm_frame->mc_vregs,
-					   &current->thread.transact_vr,
+					   &current->thread.ckvr_state,
 					   ELF_NVRREG * sizeof(vector128)))
 				return 1;
 		}
@@ -569,8 +569,8 @@ static int save_tm_user_regs(struct pt_regs *regs,
 	 * most significant bits of that same vector. --BenH
 	 */
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		current->thread.transact_vrsave = mfspr(SPRN_VRSAVE);
-	if (__put_user(current->thread.transact_vrsave,
+		current->thread.ckvrsave = mfspr(SPRN_VRSAVE);
+	if (__put_user(current->thread.ckvrsave,
 		       (u32 __user *)&frame->mc_vregs[32]))
 		return 1;
 	if (msr & MSR_VEC) {
@@ -578,19 +578,19 @@ static int save_tm_user_regs(struct pt_regs *regs,
 			       (u32 __user *)&tm_frame->mc_vregs[32]))
 			return 1;
 	} else {
-		if (__put_user(current->thread.transact_vrsave,
+		if (__put_user(current->thread.ckvrsave,
 			       (u32 __user *)&tm_frame->mc_vregs[32]))
 			return 1;
 	}
 #endif /* CONFIG_ALTIVEC */
 
-	if (copy_transact_fpr_to_user(&frame->mc_fregs, current))
+	if (copy_ckfpr_to_user(&frame->mc_fregs, current))
 		return 1;
 	if (msr & MSR_FP) {
 		if (copy_fpr_to_user(&tm_frame->mc_fregs, current))
 			return 1;
 	} else {
-		if (copy_transact_fpr_to_user(&tm_frame->mc_fregs, current))
+		if (copy_ckfpr_to_user(&tm_frame->mc_fregs, current))
 			return 1;
 	}
 
@@ -602,14 +602,14 @@ static int save_tm_user_regs(struct pt_regs *regs,
 	 * contains valid data
 	 */
 	if (current->thread.used_vsr) {
-		if (copy_transact_vsx_to_user(&frame->mc_vsregs, current))
+		if (copy_ckvsx_to_user(&frame->mc_vsregs, current))
 			return 1;
 		if (msr & MSR_VSX) {
 			if (copy_vsx_to_user(&tm_frame->mc_vsregs,
 						      current))
 				return 1;
 		} else {
-			if (copy_transact_vsx_to_user(&tm_frame->mc_vsregs, current))
+			if (copy_ckvsx_to_user(&tm_frame->mc_vsregs, current))
 				return 1;
 		}
 
@@ -788,7 +788,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	regs->msr &= ~MSR_VEC;
 	if (msr & MSR_VEC) {
 		/* restore altivec registers from the stack */
-		if (__copy_from_user(&current->thread.transact_vr, &sr->mc_vregs,
+		if (__copy_from_user(&current->thread.ckvr_state, &sr->mc_vregs,
 				     sizeof(sr->mc_vregs)) ||
 		    __copy_from_user(&current->thread.vr_state,
 				     &tm_sr->mc_vregs,
@@ -797,24 +797,24 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 	} else if (current->thread.used_vr) {
 		memset(&current->thread.vr_state, 0,
 		       ELF_NVRREG * sizeof(vector128));
-		memset(&current->thread.transact_vr, 0,
+		memset(&current->thread.ckvr_state, 0,
 		       ELF_NVRREG * sizeof(vector128));
 	}
 
 	/* Always get VRSAVE back */
-	if (__get_user(current->thread.transact_vrsave,
+	if (__get_user(current->thread.ckvrsave,
 		       (u32 __user *)&sr->mc_vregs[32]) ||
 	    __get_user(current->thread.vrsave,
 		       (u32 __user *)&tm_sr->mc_vregs[32]))
 		return 1;
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		mtspr(SPRN_VRSAVE, current->thread.transact_vrsave);
+		mtspr(SPRN_VRSAVE, current->thread.ckvrsave);
 #endif /* CONFIG_ALTIVEC */
 
 	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
 
 	if (copy_fpr_from_user(current, &sr->mc_fregs) ||
-	    copy_transact_fpr_from_user(current, &tm_sr->mc_fregs))
+	    copy_ckfpr_from_user(current, &tm_sr->mc_fregs))
 		return 1;
 
 #ifdef CONFIG_VSX
@@ -825,12 +825,12 @@ static long restore_tm_user_regs(struct pt_regs *regs,
 		 * buffer, then write this out to the thread_struct
 		 */
 		if (copy_vsx_from_user(current, &tm_sr->mc_vsregs) ||
-		    copy_transact_vsx_from_user(current, &sr->mc_vsregs))
+		    copy_ckvsx_from_user(current, &sr->mc_vsregs))
 			return 1;
 	} else if (current->thread.used_vsr)
 		for (i = 0; i < 32 ; i++) {
 			current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
-			current->thread.transact_fp.fpr[i][TS_VSRLOWOFFSET] = 0;
+			current->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
 		}
 #endif /* CONFIG_VSX */
 
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 9f9fdb5..bb26aeb 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -217,7 +217,7 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 	/* save altivec registers */
 	if (current->thread.used_vr) {
 		/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
-		err |= __copy_to_user(v_regs, &current->thread.transact_vr,
+		err |= __copy_to_user(v_regs, &current->thread.ckvr_state,
 				      33 * sizeof(vector128));
 		/* If VEC was enabled there are transactional VRs valid too,
 		 * else they're a copy of the checkpointed VRs.
@@ -228,7 +228,7 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 					      33 * sizeof(vector128));
 		else
 			err |= __copy_to_user(tm_v_regs,
-					      &current->thread.transact_vr,
+					      &current->thread.ckvr_state,
 					      33 * sizeof(vector128));
 
 		/* set MSR_VEC in the MSR value in the frame to indicate
@@ -240,13 +240,13 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 	 * use altivec.
 	 */
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
-		current->thread.transact_vrsave = mfspr(SPRN_VRSAVE);
-	err |= __put_user(current->thread.transact_vrsave, (u32 __user *)&v_regs[33]);
+		current->thread.ckvrsave = mfspr(SPRN_VRSAVE);
+	err |= __put_user(current->thread.ckvrsave, (u32 __user *)&v_regs[33]);
 	if (msr & MSR_VEC)
 		err |= __put_user(current->thread.vrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 	else
-		err |= __put_user(current->thread.transact_vrsave,
+		err |= __put_user(current->thread.ckvrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 
 #else /* CONFIG_ALTIVEC */
@@ -255,11 +255,11 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 #endif /* CONFIG_ALTIVEC */
 
 	/* copy fpr regs and fpscr */
-	err |= copy_transact_fpr_to_user(&sc->fp_regs, current);
+	err |= copy_ckfpr_to_user(&sc->fp_regs, current);
 	if (msr & MSR_FP)
 		err |= copy_fpr_to_user(&tm_sc->fp_regs, current);
 	else
-		err |= copy_transact_fpr_to_user(&tm_sc->fp_regs, current);
+		err |= copy_ckfpr_to_user(&tm_sc->fp_regs, current);
 
 #ifdef CONFIG_VSX
 	/*
@@ -271,12 +271,12 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
 		v_regs += ELF_NVRREG;
 		tm_v_regs += ELF_NVRREG;
 
-		err |= copy_transact_vsx_to_user(v_regs, current);
+		err |= copy_ckvsx_to_user(v_regs, current);
 
 		if (msr & MSR_VSX)
 			err |= copy_vsx_to_user(tm_v_regs, current);
 		else
-			err |= copy_transact_vsx_to_user(tm_v_regs, current);
+			err |= copy_ckvsx_to_user(tm_v_regs, current);
 
 		/* set MSR_VSX in the MSR value in the frame to
 		 * indicate that sc->vs_reg) contains valid data.
@@ -475,32 +475,32 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 		return -EFAULT;
 	/* Copy 33 vec registers (vr0..31 and vscr) from the stack */
 	if (v_regs != NULL && tm_v_regs != NULL && (msr & MSR_VEC) != 0) {
-		err |= __copy_from_user(&current->thread.transact_vr, v_regs,
+		err |= __copy_from_user(&current->thread.ckvr_state, v_regs,
 					33 * sizeof(vector128));
 		err |= __copy_from_user(&current->thread.vr_state, tm_v_regs,
 					33 * sizeof(vector128));
 	}
 	else if (current->thread.used_vr) {
 		memset(&current->thread.vr_state, 0, 33 * sizeof(vector128));
-		memset(&current->thread.transact_vr, 0, 33 * sizeof(vector128));
+		memset(&current->thread.ckvr_state, 0, 33 * sizeof(vector128));
 	}
 	/* Always get VRSAVE back */
 	if (v_regs != NULL && tm_v_regs != NULL) {
-		err |= __get_user(current->thread.transact_vrsave,
+		err |= __get_user(current->thread.ckvrsave,
 				  (u32 __user *)&v_regs[33]);
 		err |= __get_user(current->thread.vrsave,
 				  (u32 __user *)&tm_v_regs[33]);
 	}
 	else {
 		current->thread.vrsave = 0;
-		current->thread.transact_vrsave = 0;
+		current->thread.ckvrsave = 0;
 	}
 	if (cpu_has_feature(CPU_FTR_ALTIVEC))
 		mtspr(SPRN_VRSAVE, current->thread.vrsave);
 #endif /* CONFIG_ALTIVEC */
 	/* restore floating point */
 	err |= copy_fpr_from_user(current, &tm_sc->fp_regs);
-	err |= copy_transact_fpr_from_user(current, &sc->fp_regs);
+	err |= copy_ckfpr_from_user(current, &sc->fp_regs);
 #ifdef CONFIG_VSX
 	/*
 	 * Get additional VSX data. Update v_regs to point after the
@@ -511,11 +511,11 @@ static long restore_tm_sigcontexts(struct pt_regs *regs,
 		v_regs += ELF_NVRREG;
 		tm_v_regs += ELF_NVRREG;
 		err |= copy_vsx_from_user(current, tm_v_regs);
-		err |= copy_transact_vsx_from_user(current, v_regs);
+		err |= copy_ckvsx_from_user(current, v_regs);
 	} else {
 		for (i = 0; i < 32 ; i++) {
 			current->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
-			current->thread.transact_fp.fpr[i][TS_VSRLOWOFFSET] = 0;
+			current->thread.ckfp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
 		}
 	}
 #endif
diff --git a/arch/powerpc/kernel/tm.S b/arch/powerpc/kernel/tm.S
index 3741b47..409b2c8 100644
--- a/arch/powerpc/kernel/tm.S
+++ b/arch/powerpc/kernel/tm.S
@@ -249,19 +249,19 @@ _GLOBAL(tm_reclaim)
 	andis.		r0, r4, MSR_VEC@h
 	beq	dont_backup_vec
 
-	addi	r7, r3, THREAD_TRANSACT_VRSTATE
+	addi	r7, r3, THREAD_CKVRSTATE
 	SAVE_32VRS(0, r6, r7)	/* r6 scratch, r7 transact vr state */
 	mfvscr	v0
 	li	r6, VRSTATE_VSCR
 	stvx	v0, r7, r6
 dont_backup_vec:
 	mfspr	r0, SPRN_VRSAVE
-	std	r0, THREAD_TRANSACT_VRSAVE(r3)
+	std	r0, THREAD_CKVRSAVE(r3)
 
 	andi.	r0, r4, MSR_FP
 	beq	dont_backup_fp
 
-	addi	r7, r3, THREAD_TRANSACT_FPSTATE
+	addi	r7, r3, THREAD_CKFPSTATE
 	SAVE_32FPRS_VSRS(0, R6, R7)	/* r6 scratch, r7 transact fp state */
 
 	mffs    fr0
@@ -364,20 +364,20 @@ _GLOBAL(__tm_recheckpoint)
 	andis.	r0, r4, MSR_VEC@h
 	beq	dont_restore_vec
 
-	addi	r8, r3, THREAD_TRANSACT_VRSTATE
+	addi	r8, r3, THREAD_CKVRSTATE
 	li	r5, VRSTATE_VSCR
 	lvx	v0, r8, r5
 	mtvscr	v0
 	REST_32VRS(0, r5, r8)			/* r5 scratch, r8 ptr */
 dont_restore_vec:
-	ld	r5, THREAD_TRANSACT_VRSAVE(r3)
+	ld	r5, THREAD_CKVRSAVE(r3)
 	mtspr	SPRN_VRSAVE, r5
 #endif
 
 	andi.	r0, r4, MSR_FP
 	beq	dont_restore_fp
 
-	addi	r8, r3, THREAD_TRANSACT_FPSTATE
+	addi	r8, r3, THREAD_CKFPSTATE
 	lfd	fr0, FPSTATE_FPSCR(r8)
 	MTFSF_L(fr0)
 	REST_32FPRS_VSRS(0, R4, R8)
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index 1c2e7a3..b5d5025 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -23,10 +23,10 @@ _GLOBAL(do_load_up_transact_altivec)
 	li	r4,1
 	stw	r4,THREAD_USED_VR(r3)
 
-	li	r10,THREAD_TRANSACT_VRSTATE+VRSTATE_VSCR
+	li	r10,THREAD_CKVRSTATE+VRSTATE_VSCR
 	lvx	v0,r10,r3
 	mtvscr	v0
-	addi	r10,r3,THREAD_TRANSACT_VRSTATE
+	addi	r10,r3,THREAD_CKVRSTATE
 	REST_32VRS(0,r4,r10)
 
 	blr
-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/5] powerpc: Remove do_load_up_transact_{fpu,altivec}
  2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
                   ` (3 preceding siblings ...)
  2016-06-08  4:00 ` [PATCH 4/5] powerpc: tm: Rename transct_(*) to ck(\1)_state Cyril Bur
@ 2016-06-08  4:00 ` Cyril Bur
  4 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-08  4:00 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

Previous rework of TM code leaves these functions unused

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
---
 arch/powerpc/include/asm/tm.h |  5 -----
 arch/powerpc/kernel/fpu.S     | 26 --------------------------
 arch/powerpc/kernel/vector.S  | 25 -------------------------
 3 files changed, 56 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index c22d704..82e06ca 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -9,11 +9,6 @@
 
 #ifndef __ASSEMBLY__
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-extern void do_load_up_transact_fpu(struct thread_struct *thread);
-extern void do_load_up_transact_altivec(struct thread_struct *thread);
-#endif
-
 extern void tm_enable(void);
 extern void tm_reclaim(struct thread_struct *thread,
 		       unsigned long orig_msr, uint8_t cause);
diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index 181c187..08d14b0 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -50,32 +50,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX);					\
 #define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
 #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-/* void do_load_up_transact_fpu(struct thread_struct *thread)
- *
- * This is similar to load_up_fpu but for the transactional version of the FP
- * register set.  It doesn't mess with the task MSR or valid flags.
- * Furthermore, we don't do lazy FP with TM currently.
- */
-_GLOBAL(do_load_up_transact_fpu)
-	mfmsr	r6
-	ori	r5,r6,MSR_FP
-#ifdef CONFIG_VSX
-BEGIN_FTR_SECTION
-	oris	r5,r5,MSR_VSX@h
-END_FTR_SECTION_IFSET(CPU_FTR_VSX)
-#endif
-	SYNC
-	MTMSRD(r5)
-
-	addi	r7,r3,THREAD_CKFPSTATE
-	lfd	fr0,FPSTATE_FPSCR(r7)
-	MTFSF_L(fr0)
-	REST_32FPVSRS(0, R4, R7)
-
-	blr
-#endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
-
 /*
  * Load state from memory into FP registers including FPSCR.
  * Assumes the caller has enabled FP in the MSR.
diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
index b5d5025..84b19ab 100644
--- a/arch/powerpc/kernel/vector.S
+++ b/arch/powerpc/kernel/vector.S
@@ -7,31 +7,6 @@
 #include <asm/page.h>
 #include <asm/ptrace.h>
 
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
-/* void do_load_up_transact_altivec(struct thread_struct *thread)
- *
- * This is similar to load_up_altivec but for the transactional version of the
- * vector regs.  It doesn't mess with the task MSR or valid flags.
- * Furthermore, VEC laziness is not supported with TM currently.
- */
-_GLOBAL(do_load_up_transact_altivec)
-	mfmsr	r6
-	oris	r5,r6,MSR_VEC@h
-	MTMSRD(r5)
-	isync
-
-	li	r4,1
-	stw	r4,THREAD_USED_VR(r3)
-
-	li	r10,THREAD_CKVRSTATE+VRSTATE_VSCR
-	lvx	v0,r10,r3
-	mtvscr	v0
-	addi	r10,r3,THREAD_CKVRSTATE
-	REST_32VRS(0,r4,r10)
-
-	blr
-#endif
-
 /*
  * Load state from memory into VMX registers including VSCR.
  * Assumes the caller has enabled VMX in the MSR.
-- 
2.8.3

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption
  2016-06-08  4:00 ` [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption Cyril Bur
@ 2016-06-09  1:35   ` Daniel Axtens
  2016-06-09  3:52     ` Michael Ellerman
  2016-06-10  6:10     ` Cyril Bur
  0 siblings, 2 replies; 16+ messages in thread
From: Daniel Axtens @ 2016-06-09  1:35 UTC (permalink / raw)
  To: Cyril Bur, mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 4140 bytes --]

Yay for tests!

I have a few minor nits, and one more major one (rc == 2 below).

> +/*
> + * Copyright 2015, Cyril Bur, IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
I realise this is well past a lost cause by now, but isn't the idea to
be version 2, not version 2 or later?

> +
> +#include "../basic_asm.h"
> +#include "../vsx_asm.h"
> +

Some of your other functions start with a comment. That would be super
helpful here - I'm still not super comfortable I understand the calling
convention. 
> +FUNC_START(check_vsx)
> +	PUSH_BASIC_STACK(32)
> +	std	r3,STACK_FRAME_PARAM(0)(sp)
> +	addi r3, r3, 16 * 12 #Second half of array
> +	bl store_vsx
> +	ld r3,STACK_FRAME_PARAM(0)(sp)
> +	bl vsx_memcmp
> +	POP_BASIC_STACK(32)
> +	blr
> +FUNC_END(check_vsx)
> +



> +long vsx_memcmp(vector int *a) {
> +	vector int zero = {0,0,0,0};
> +	int i;
> +
> +	FAIL_IF(a != varray);
> +
> +	for(i = 0; i < 12; i++) {
> +		if (memcmp(&a[i + 12], &zero, 16) == 0) {
> +			fprintf(stderr, "Detected zero from the VSX reg %d\n", i + 12);
> +			return 1;
> +		}
> +	}
> +
> +	if (memcmp(a, &a[12], 12 * 16)) {
I'm somewhat confused as to how this comparison works. You're comparing
the new saved ones to the old saved ones, yes?
> +		long *p = (long *)a;
> +		fprintf(stderr, "VSX mismatch\n");
> +		for (i = 0; i < 24; i=i+2)
> +			fprintf(stderr, "%d: 0x%08lx%08lx | 0x%08lx%08lx\n",
> +					i/2 + i%2 + 20, p[i], p[i + 1], p[i + 24], p[i + 25]);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
> +void *preempt_vsx_c(void *p)
> +{
> +	int i, j;
> +	long rc;
> +	srand(pthread_self());
> +	for (i = 0; i < 12; i++)
> +		for (j = 0; j < 4; j++) {
> +			varray[i][j] = rand();
> +			/* Don't want zero because it hides kernel problems */
> +			if (varray[i][j] == 0)
> +				j--;
> +		}
> +	rc = preempt_vsx(varray, &threads_starting, &running);
> +	if (rc == 2)
How would rc == 2? AIUI, preempt_vsx returns the value of check_vsx,
which in turn returns the value of vsx_memcmp, which returns 1 or 0.

> +		fprintf(stderr, "Caught zeros in VSX compares\n");
Isn't it zeros or a mismatched value?
> +	return (void *)rc;
> +}
> +
> +int test_preempt_vsx(void)
> +{
> +	int i, rc, threads;
> +	pthread_t *tids;
> +
> +	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> +	tids = malloc(threads * sizeof(pthread_t));
> +	FAIL_IF(!tids);
> +
> +	running = true;
> +	threads_starting = threads;
> +	for (i = 0; i < threads; i++) {
> +		rc = pthread_create(&tids[i], NULL, preempt_vsx_c, NULL);
> +		FAIL_IF(rc);
> +	}
> +
> +	setbuf(stdout, NULL);
> +	/* Not really nessesary but nice to wait for every thread to start */
> +	printf("\tWaiting for %d workers to start...", threads_starting);
> +	while(threads_starting)
> +		asm volatile("": : :"memory");
I think __sync_synchronise() might be ... more idiomatic or something?
Not super fussy.

> +	printf("done\n");
> +
> +	printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
> +	sleep(PREEMPT_TIME);
> +	printf("done\n");
> +
> +	printf("\tStopping workers...");
> +	/*
> +	 * Working are checking this value every loop. In preempt_vsx 'cmpwi r5,0; bne 2b'.
> +	 * r5 will have loaded the value of running.
> +	 */
> +	running = 0;
Do you need some sort of synchronisation here? You're assuming it
eventually gets to the threads, which is of course true, but maybe it
would be a good idea to synchronise it more explicitly? Again, not super
fussy.
> +	for (i = 0; i < threads; i++) {
> +		void *rc_p;
> +		pthread_join(tids[i], &rc_p);
> +
> +		/*
> +		 * Harness will say the fail was here, look at why preempt_vsx
> +		 * returned
> +		 */
> +		if ((long) rc_p)
> +			printf("oops\n");
> +		FAIL_IF((long) rc_p);
> +	}
> +	printf("done\n");
> +
> +	return 0;
> +}
> +
Regards,
Daniel

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption
  2016-06-09  1:35   ` Daniel Axtens
@ 2016-06-09  3:52     ` Michael Ellerman
  2016-06-10  6:10     ` Cyril Bur
  1 sibling, 0 replies; 16+ messages in thread
From: Michael Ellerman @ 2016-06-09  3:52 UTC (permalink / raw)
  To: Daniel Axtens, Cyril Bur; +Cc: linuxppc-dev, mikey, Anshuman Khandual

On Thu, 2016-06-09 at 11:35 +1000, Daniel Axtens wrote:
> > +/*
> > + * Copyright 2015, Cyril Bur, IBM Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */
> I realise this is well past a lost cause by now, but isn't the idea to
> be version 2, not version 2 or later?

No.

I asked the powers that be and apparently for new code we're supposed to use v2
or later.

cheers

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation
  2016-06-08  4:00 ` [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation Cyril Bur
@ 2016-06-09  5:12   ` Daniel Axtens
  2016-06-10  5:55     ` Cyril Bur
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Axtens @ 2016-06-09  5:12 UTC (permalink / raw)
  To: Cyril Bur, mpe; +Cc: linuxppc-dev, mikey, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

I'm trying not to be too nit picky or difficult on tests, so here's a
pre-written commit message for you:

"The kernel sets up two sets of ucontexts if the signal was to be
delivered while the thread was in a transaction. Expected behaviour is
that the currently executing code is in the first and the checkpointed
state (the state that will be rolled back to) is in the uc_link
ucontext.

The reason for this is that:

 - code which is not TM aware and installs a signal handler will expect
   to see/modify its currently running state in the uc.

 - but, that TM-unaware code may have dynamicially linked against code
   which is TM aware and is doing HTM under the hood, so the
   checkpointed state needs to be made available somewhere

Test if the live and checkpointed state is made stored correctly.
Test:
 - GPRs
 - FP registers
 - VMX
 - VSX
"

> +#define TBEGIN .long 0x7C00051D
> +#define TSUSPEND .long 0x7C0005DD
> +#define TRESUME .long 0x7C2005DD
You define these 3 opcodes in a number of files. I assume you're going
to consolidate them in v2?

> + * The kernel sets up two sets of ucontexts if the signal was to be delivered
> + * while the thread was in a transaction. Expected behaviour is that the
> + * currently executing code is in the first and the checkpointed state (the
> + * state that will be rolled back to) is in the uc_link ucontext.
> + *
> + * The reason for this is that code which is not TM aware and installs a signal
> + * handler will expect to see/modify its currently running state in the uc,
> + * this code may have dynamicially linked against code which is TM aware and is
> + * doing HTM under the hood.

I had real trouble parsing this sentence the first few times. I think
it's missing a while:

The reason for this is that _while_ code which is not TM aware...

(Although it would be better in several sentences :P)

> +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
> @@ -0,0 +1,96 @@
> +/*
> + * Copyright 2016, Cyril Bur, IBM Corp.
> + * Licensed under GPLv2.
Ironically, it seems this now needs to be GPLv2+, probably with the
regular license grant paragraph.

> +	/* Always be 64bit, don't really care about 32bit */
Forgive my ignorance of the test suite: are we guaranteed this by the
build system, or should we add a SKIP_IF() for it?
> +	for (i = 0; i < NV_GPR_REGS && !fail; i++) {
> +		fail = (ucp->uc_mcontext.gp_regs[i + 14] != gps[i]);
> +		fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != gps[i + NV_GPR_REGS]);
> +	}
> +	if (fail)
> +		printf("Failed on %d GPR %lu or %lu\n", i - 1,
> +				ucp->uc_mcontext.gp_regs[i + 13], tm_ucp->uc_mcontext.gp_regs[i + 13]);
> +}
> +

Looking good otherwise!

Regards,
Daniel

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 859 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation
  2016-06-09  5:12   ` Daniel Axtens
@ 2016-06-10  5:55     ` Cyril Bur
  0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-10  5:55 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev

On Thu, 09 Jun 2016 15:12:51 +1000
Daniel Axtens <dja@axtens.net> wrote:

As stated in the cover-letter, this patch needs the most work and quite a lot
too.

Turns out the comment you took is wrong (the code actually does the opposite),
once again, just sent the series to get eyes on the actual patch.

> I'm trying not to be too nit picky or difficult on tests, so here's a
> pre-written commit message for you:
> 
> "The kernel sets up two sets of ucontexts if the signal was to be
> delivered while the thread was in a transaction. Expected behaviour is
> that the currently executing code is in the first and the checkpointed
> state (the state that will be rolled back to) is in the uc_link
> ucontext.
> 
> The reason for this is that:
> 
>  - code which is not TM aware and installs a signal handler will expect
>    to see/modify its currently running state in the uc.
> 
>  - but, that TM-unaware code may have dynamicially linked against code
>    which is TM aware and is doing HTM under the hood, so the
>    checkpointed state needs to be made available somewhere
> 
> Test if the live and checkpointed state is made stored correctly.
> Test:
>  - GPRs
>  - FP registers
>  - VMX
>  - VSX
> "
> 
> > +#define TBEGIN .long 0x7C00051D
> > +#define TSUSPEND .long 0x7C0005DD
> > +#define TRESUME .long 0x7C2005DD  
> You define these 3 opcodes in a number of files. I assume you're going
> to consolidate them in v2?
> 
> > + * The kernel sets up two sets of ucontexts if the signal was to be delivered
> > + * while the thread was in a transaction. Expected behaviour is that the
> > + * currently executing code is in the first and the checkpointed state (the
> > + * state that will be rolled back to) is in the uc_link ucontext.
> > + *
> > + * The reason for this is that code which is not TM aware and installs a signal
> > + * handler will expect to see/modify its currently running state in the uc,
> > + * this code may have dynamicially linked against code which is TM aware and is
> > + * doing HTM under the hood.  
> 
> I had real trouble parsing this sentence the first few times. I think
> it's missing a while:
> 
> The reason for this is that _while_ code which is not TM aware...
> 
> (Although it would be better in several sentences :P)
> 
> > +++ b/tools/testing/selftests/powerpc/tm/tm-signal-context-chk-gpr.c
> > @@ -0,0 +1,96 @@
> > +/*
> > + * Copyright 2016, Cyril Bur, IBM Corp.
> > + * Licensed under GPLv2.  
> Ironically, it seems this now needs to be GPLv2+, probably with the
> regular license grant paragraph.
> 
> > +	/* Always be 64bit, don't really care about 32bit */  
> Forgive my ignorance of the test suite: are we guaranteed this by the
> build system, or should we add a SKIP_IF() for it?
> > +	for (i = 0; i < NV_GPR_REGS && !fail; i++) {
> > +		fail = (ucp->uc_mcontext.gp_regs[i + 14] != gps[i]);
> > +		fail |= (tm_ucp->uc_mcontext.gp_regs[i + 14] != gps[i + NV_GPR_REGS]);
> > +	}
> > +	if (fail)
> > +		printf("Failed on %d GPR %lu or %lu\n", i - 1,
> > +				ucp->uc_mcontext.gp_regs[i + 13], tm_ucp->uc_mcontext.gp_regs[i + 13]);
> > +}
> > +  
> 
> Looking good otherwise!
> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption
  2016-06-09  1:35   ` Daniel Axtens
  2016-06-09  3:52     ` Michael Ellerman
@ 2016-06-10  6:10     ` Cyril Bur
  1 sibling, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-10  6:10 UTC (permalink / raw)
  To: Daniel Axtens; +Cc: linuxppc-dev

On Thu, 09 Jun 2016 11:35:55 +1000
Daniel Axtens <dja@axtens.net> wrote:

> Yay for tests!
> 
> I have a few minor nits, and one more major one (rc == 2 below).
> 
> > +/*
> > + * Copyright 2015, Cyril Bur, IBM Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version
> > + * 2 of the License, or (at your option) any later version.
> > + */  
> I realise this is well past a lost cause by now, but isn't the idea to
> be version 2, not version 2 or later?
> 
> > +
> > +#include "../basic_asm.h"
> > +#include "../vsx_asm.h"
> > +  
> 
> Some of your other functions start with a comment. That would be super
> helpful here - I'm still not super comfortable I understand the calling
> convention. 
> > +FUNC_START(check_vsx)
> > +	PUSH_BASIC_STACK(32)
> > +	std	r3,STACK_FRAME_PARAM(0)(sp)
> > +	addi r3, r3, 16 * 12 #Second half of array
> > +	bl store_vsx
> > +	ld r3,STACK_FRAME_PARAM(0)(sp)
> > +	bl vsx_memcmp
> > +	POP_BASIC_STACK(32)
> > +	blr
> > +FUNC_END(check_vsx)
> > +  
> 
> 
> 
> > +long vsx_memcmp(vector int *a) {
> > +	vector int zero = {0,0,0,0};
> > +	int i;
> > +
> > +	FAIL_IF(a != varray);
> > +
> > +	for(i = 0; i < 12; i++) {
> > +		if (memcmp(&a[i + 12], &zero, 16) == 0) {
> > +			fprintf(stderr, "Detected zero from the VSX reg %d\n", i + 12);
> > +			return 1;
> > +		}
> > +	}
> > +
> > +	if (memcmp(a, &a[12], 12 * 16)) {  
> I'm somewhat confused as to how this comparison works. You're comparing
> the new saved ones to the old saved ones, yes?

check_vmx() has put the live registers on the end of the array... so the first
12 in 'a' are the known values and the next 12 are the live values... they
should match.

> > +		long *p = (long *)a;
> > +		fprintf(stderr, "VSX mismatch\n");
> > +		for (i = 0; i < 24; i=i+2)
> > +			fprintf(stderr, "%d: 0x%08lx%08lx | 0x%08lx%08lx\n",
> > +					i/2 + i%2 + 20, p[i], p[i + 1], p[i + 24], p[i + 25]);
> > +		return 1;
> > +	}
> > +	return 0;
> > +}
> > +
> > +void *preempt_vsx_c(void *p)
> > +{
> > +	int i, j;
> > +	long rc;
> > +	srand(pthread_self());
> > +	for (i = 0; i < 12; i++)
> > +		for (j = 0; j < 4; j++) {
> > +			varray[i][j] = rand();
> > +			/* Don't want zero because it hides kernel problems */
> > +			if (varray[i][j] == 0)
> > +				j--;
> > +		}
> > +	rc = preempt_vsx(varray, &threads_starting, &running);
> > +	if (rc == 2)  
> How would rc == 2? AIUI, preempt_vsx returns the value of check_vsx,
> which in turn returns the value of vsx_memcmp, which returns 1 or 0.
> 
> > +		fprintf(stderr, "Caught zeros in VSX compares\n");  
> Isn't it zeros or a mismatched value?

I think that patch went through too many iterations and no enough cleanups.
Fixed

> > +	return (void *)rc;
> > +}
> > +
> > +int test_preempt_vsx(void)
> > +{
> > +	int i, rc, threads;
> > +	pthread_t *tids;
> > +
> > +	threads = sysconf(_SC_NPROCESSORS_ONLN) * THREAD_FACTOR;
> > +	tids = malloc(threads * sizeof(pthread_t));
> > +	FAIL_IF(!tids);
> > +
> > +	running = true;
> > +	threads_starting = threads;
> > +	for (i = 0; i < threads; i++) {
> > +		rc = pthread_create(&tids[i], NULL, preempt_vsx_c, NULL);
> > +		FAIL_IF(rc);
> > +	}
> > +
> > +	setbuf(stdout, NULL);
> > +	/* Not really nessesary but nice to wait for every thread to start */
> > +	printf("\tWaiting for %d workers to start...", threads_starting);
> > +	while(threads_starting)
> > +		asm volatile("": : :"memory");  
> I think __sync_synchronise() might be ... more idiomatic or something?
> Not super fussy.
> 

Best to be consistent with how all the other powerpc/math tests do it, which
was initially an MPE recommendation.

> > +	printf("done\n");
> > +
> > +	printf("\tWaiting for %d seconds to let some workers get preempted...", PREEMPT_TIME);
> > +	sleep(PREEMPT_TIME);
> > +	printf("done\n");
> > +
> > +	printf("\tStopping workers...");
> > +	/*
> > +	 * Working are checking this value every loop. In preempt_vsx 'cmpwi r5,0; bne 2b'.
> > +	 * r5 will have loaded the value of running.
> > +	 */
> > +	running = 0;  
> Do you need some sort of synchronisation here? You're assuming it
> eventually gets to the threads, which is of course true, but maybe it
> would be a good idea to synchronise it more explicitly? Again, not super
> fussy.

Why bother? A barrier might 'nice' it really buys us nothing. Also that's how
all the other powerpc/math tests currently kill workers.

> > +	for (i = 0; i < threads; i++) {
> > +		void *rc_p;
> > +		pthread_join(tids[i], &rc_p);
> > +
> > +		/*
> > +		 * Harness will say the fail was here, look at why preempt_vsx
> > +		 * returned
> > +		 */
> > +		if ((long) rc_p)
> > +			printf("oops\n");
> > +		FAIL_IF((long) rc_p);
> > +	}
> > +	printf("done\n");
> > +
> > +	return 0;
> > +}
> > +  
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-06-08  4:00 ` [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers Cyril Bur
@ 2016-06-28  3:53   ` Simon Guo
  2016-06-30  1:31     ` Cyril Bur
  2016-07-17  3:25   ` Simon Guo
  1 sibling, 1 reply; 16+ messages in thread
From: Simon Guo @ 2016-06-28  3:53 UTC (permalink / raw)
  To: Cyril Bur; +Cc: mpe, linuxppc-dev, mikey, Anshuman Khandual

hi Cyril,

On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> @@ -1108,11 +1084,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
>  	 */
>  	save_sprs(&prev->thread);
>  
> -	__switch_to_tm(prev);
> -
>  	/* Save FPU, Altivec, VSX and SPE state */
>  	giveup_all(prev);
>  
> +	__switch_to_tm(prev);
> +

There should be a bug.
giveup_all() will clear MSR[FP] bit. 
__switch_to_tm() reads that bit to decide whether the FP 
register needs to be flushed to thread_struct.
=== tm_reclaim() (invoked by __switch_to_tm)========================
        andi.   r0, r4, MSR_FP
        beq     dont_backup_fp

        addi    r7, r3, THREAD_CKFPSTATE
        SAVE_32FPRS_VSRS(0, R6, R7)     /* r6 scratch, r7 transact fp
state */

        mffs    fr0
        stfd    fr0,FPSTATE_FPSCR(r7)

dont_backup_fp:
=============================

But now the __switch_to_tm() is moved behind giveup_all().
So __switch_to_tm() loses MSR[FP] and cannot decide whether saving ckpt FPU or not.

The same applies to VMX/VSX.

Thanks,
- Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-06-28  3:53   ` Simon Guo
@ 2016-06-30  1:31     ` Cyril Bur
  0 siblings, 0 replies; 16+ messages in thread
From: Cyril Bur @ 2016-06-30  1:31 UTC (permalink / raw)
  To: Simon Guo; +Cc: mpe, linuxppc-dev, mikey, Anshuman Khandual

On Tue, 28 Jun 2016 11:53:13 +0800
Simon Guo <simonguo@linux.vnet.ibm.com> wrote:

> hi Cyril,
> 
> On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> > @@ -1108,11 +1084,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
> >  	 */
> >  	save_sprs(&prev->thread);
> >  
> > -	__switch_to_tm(prev);
> > -
> >  	/* Save FPU, Altivec, VSX and SPE state */
> >  	giveup_all(prev);
> >  
> > +	__switch_to_tm(prev);
> > +  
> 

Hi Simon,

> There should be a bug.
> giveup_all() will clear MSR[FP] bit. 
> __switch_to_tm() reads that bit to decide whether the FP 
> register needs to be flushed to thread_struct.
> === tm_reclaim() (invoked by __switch_to_tm)========================
>         andi.   r0, r4, MSR_FP
>         beq     dont_backup_fp
> 
>         addi    r7, r3, THREAD_CKFPSTATE
>         SAVE_32FPRS_VSRS(0, R6, R7)     /* r6 scratch, r7 transact fp
> state */
> 
>         mffs    fr0
>         stfd    fr0,FPSTATE_FPSCR(r7)
> 
> dont_backup_fp:
> =============================
> 
> But now the __switch_to_tm() is moved behind giveup_all().
> So __switch_to_tm() loses MSR[FP] and cannot decide whether saving ckpt FPU or not.
> 

Good catch! Yes it looks that way indeed. I thought I had a test to catch this
because this is the big risk here but upon reflection it looks like I don't
(mostly because it seems a condition to catch that is hard to craft).

I'll add a test and a fix.

Thanks.

Cyril

> The same applies to VMX/VSX.

Yeah.

> 
> Thanks,
> - Simon
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-06-08  4:00 ` [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers Cyril Bur
  2016-06-28  3:53   ` Simon Guo
@ 2016-07-17  3:25   ` Simon Guo
  2016-07-18  1:28     ` Cyril Bur
  1 sibling, 1 reply; 16+ messages in thread
From: Simon Guo @ 2016-07-17  3:25 UTC (permalink / raw)
  To: Cyril Bur; +Cc: mpe, linuxppc-dev, mikey, Anshuman Khandual, Simon Guo

Hi Cyril,
On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> @@ -917,24 +907,10 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
>  		 "(new->msr 0x%lx, new->origmsr 0x%lx)\n",
>  		 new->pid, new->thread.regs->msr, msr);
>  
> -	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&new->thread, msr);
>  
> -	/* This loads the speculative FP/VEC state, if used */
> -	if (msr & MSR_FP) {
> -		do_load_up_transact_fpu(&new->thread);
> -		new->thread.regs->msr |=
> -			(MSR_FP | new->thread.fpexc_mode);
> -	}
> -#ifdef CONFIG_ALTIVEC
> -	if (msr & MSR_VEC) {
> -		do_load_up_transact_altivec(&new->thread);
> -		new->thread.regs->msr |= MSR_VEC;
> -	}
> -#endif
> -	/* We may as well turn on VSX too since all the state is restored now */
> -	if (msr & MSR_VSX)
> -		new->thread.regs->msr |= MSR_VSX;
> +	/* Won't restore math get called later? */
> +	restore_math(new->thread.regs);

I have some question for the "restore_math" in tm_recheckpoint_new_task().

Per my understanding, now after tm_recheckpoint, the fp_state content
is obsolete.  However restore_math() will try to restore FP/Vec/VSX state 
from fp_state, (orginally it is right, since fp_state was the valid
checkpointed state and consistent with FP register ). Should we remove
the restore_math() here?

And, should the thread's MSR now set FP bit in tm_recheckpoint(), to 
indicate that FP register content is "fresh" in contrast to thread.fp_state?

Please correct me if I am wrong.

Thanks,
- Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-07-17  3:25   ` Simon Guo
@ 2016-07-18  1:28     ` Cyril Bur
  2016-07-20  9:36       ` Simon Guo
  0 siblings, 1 reply; 16+ messages in thread
From: Cyril Bur @ 2016-07-18  1:28 UTC (permalink / raw)
  To: Simon Guo; +Cc: mpe, linuxppc-dev, mikey, Anshuman Khandual, Simon Guo

On Sun, 17 Jul 2016 11:25:43 +0800
Simon Guo <simonguo@linux.vnet.ibm.com> wrote:

> Hi Cyril,
> On Wed, Jun 08, 2016 at 02:00:34PM +1000, Cyril Bur wrote:
> > @@ -917,24 +907,10 @@ static inline void tm_recheckpoint_new_task(struct task_struct *new)
> >  		 "(new->msr 0x%lx, new->origmsr 0x%lx)\n",
> >  		 new->pid, new->thread.regs->msr, msr);
> >  
> > -	/* This loads the checkpointed FP/VEC state, if used */
> >  	tm_recheckpoint(&new->thread, msr);
> >  
> > -	/* This loads the speculative FP/VEC state, if used */
> > -	if (msr & MSR_FP) {
> > -		do_load_up_transact_fpu(&new->thread);
> > -		new->thread.regs->msr |=
> > -			(MSR_FP | new->thread.fpexc_mode);
> > -	}
> > -#ifdef CONFIG_ALTIVEC
> > -	if (msr & MSR_VEC) {
> > -		do_load_up_transact_altivec(&new->thread);
> > -		new->thread.regs->msr |= MSR_VEC;
> > -	}
> > -#endif
> > -	/* We may as well turn on VSX too since all the state is restored now */
> > -	if (msr & MSR_VSX)
> > -		new->thread.regs->msr |= MSR_VSX;
> > +	/* Won't restore math get called later? */
> > +	restore_math(new->thread.regs);  
> 
> I have some question for the "restore_math" in tm_recheckpoint_new_task().
> 

Please ask!

> Per my understanding, now after tm_recheckpoint, the fp_state content
> is obsolete.  However restore_math() will try to restore FP/Vec/VSX state 
> from fp_state, (orginally it is right, since fp_state was the valid
> checkpointed state and consistent with FP register ). Should we remove
> the restore_math() here?
> 

The aim of this patch is to ensure that pt_regs, fp_state and vr_state always
hold a threads 'live' registers. So, after a recheckpoint fp_state is where the
the state should be. tm_reclaim_thread() does a save_all() before doing the
reclaim.

This means that the call to restore_math() is a replacement for all deleted
lines above it.

I added it here because I'd prefer to be safe but I left that comment in
because I suspect restore_math() will be called later and we can get away with
not calling it here.

> And, should the thread's MSR now set FP bit in tm_recheckpoint(), to 
> indicate that FP register content is "fresh" in contrast to thread.fp_state?
> 

I'm not sure what you mean by 'fresh'. You do highlight that we'll have to be
sure that the MSR bits are off (so that restore_math() doesn't assume the
registers are already loaded) which makes me think that tm_reclaim_thread()
should be doing a giveup_all(), I'll fix that.

I hope that helps,

Cyril

> Please correct me if I am wrong.
> 
> Thanks,
> - Simon
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers
  2016-07-18  1:28     ` Cyril Bur
@ 2016-07-20  9:36       ` Simon Guo
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Guo @ 2016-07-20  9:36 UTC (permalink / raw)
  To: Cyril Bur; +Cc: Simon Guo, mpe, linuxppc-dev, mikey, Anshuman Khandual

On Mon, Jul 18, 2016 at 11:28:30AM +1000, Cyril Bur wrote:
> On Sun, 17 Jul 2016 11:25:43 +0800
> 
> The aim of this patch is to ensure that pt_regs, fp_state and vr_state always
> hold a threads 'live' registers. So, after a recheckpoint fp_state is where the
> the state should be. tm_reclaim_thread() does a save_all() before doing the
> reclaim.
> 
> This means that the call to restore_math() is a replacement for all deleted
> lines above it.
> 
> I added it here because I'd prefer to be safe but I left that comment in
> because I suspect restore_math() will be called later and we can get away with
> not calling it here.
> 
> > And, should the thread's MSR now set FP bit in tm_recheckpoint(), to 
> > indicate that FP register content is "fresh" in contrast to thread.fp_state?
> > 
> 
> I'm not sure what you mean by 'fresh'. You do highlight that we'll have to be
> sure that the MSR bits are off (so that restore_math() doesn't assume the
> registers are already loaded) which makes me think that tm_reclaim_thread()
> should be doing a giveup_all(), I'll fix that.
> 
> I hope that helps,
> 

Thanks Cyril. The explanation is detail and helpful.

- Simon

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-07-20  9:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-08  4:00 [PATCH 0/5] Consistent TM structures Cyril Bur
2016-06-08  4:00 ` [PATCH 1/5] selftests/powerpc: Check for VSX preservation across userspace preemption Cyril Bur
2016-06-09  1:35   ` Daniel Axtens
2016-06-09  3:52     ` Michael Ellerman
2016-06-10  6:10     ` Cyril Bur
2016-06-08  4:00 ` [PATCH 2/5] selftests/powerpc: Add test to check TM ucontext creation Cyril Bur
2016-06-09  5:12   ` Daniel Axtens
2016-06-10  5:55     ` Cyril Bur
2016-06-08  4:00 ` [PATCH 3/5] powerpc: tm: Always use fp_state and vr_state to store live registers Cyril Bur
2016-06-28  3:53   ` Simon Guo
2016-06-30  1:31     ` Cyril Bur
2016-07-17  3:25   ` Simon Guo
2016-07-18  1:28     ` Cyril Bur
2016-07-20  9:36       ` Simon Guo
2016-06-08  4:00 ` [PATCH 4/5] powerpc: tm: Rename transct_(*) to ck(\1)_state Cyril Bur
2016-06-08  4:00 ` [PATCH 5/5] powerpc: Remove do_load_up_transact_{fpu,altivec} Cyril Bur

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.