* [PATCH v2 1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
2022-03-30 14:07 [PATCH v2 0/3] powerpc: Remove system call emulation Naveen N. Rao
@ 2022-03-30 14:07 ` Naveen N. Rao
2022-03-30 14:07 ` [PATCH v2 2/3] powerpc: Reject probes on instructions that can't be single stepped Naveen N. Rao
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Naveen N. Rao @ 2022-03-30 14:07 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, mopsfelder
Cc: linuxppc-dev
Some of the primary opcodes are duplicated. Remove those, and sort the
rest of the primary opcodes to make it easy to read.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/ppc-opcode.h | 69 ++++++++++++---------------
1 file changed, 31 insertions(+), 38 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 82f1f0041c6f79..a5d89cd3e8d12d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -127,8 +127,37 @@
/* opcode and xopcode for instructions */
-#define OP_TRAP 3
-#define OP_TRAP_64 2
+#define OP_PREFIX 1
+#define OP_TRAP_64 2
+#define OP_TRAP 3
+#define OP_31 31
+#define OP_LWZ 32
+#define OP_LWZU 33
+#define OP_LBZ 34
+#define OP_LBZU 35
+#define OP_STW 36
+#define OP_STWU 37
+#define OP_STB 38
+#define OP_STBU 39
+#define OP_LHZ 40
+#define OP_LHZU 41
+#define OP_LHA 42
+#define OP_LHAU 43
+#define OP_STH 44
+#define OP_STHU 45
+#define OP_LMW 46
+#define OP_STMW 47
+#define OP_LFS 48
+#define OP_LFSU 49
+#define OP_LFD 50
+#define OP_LFDU 51
+#define OP_STFS 52
+#define OP_STFSU 53
+#define OP_STFD 54
+#define OP_STFDU 55
+#define OP_LQ 56
+#define OP_LD 58
+#define OP_STD 62
#define OP_31_XOP_TRAP 4
#define OP_31_XOP_LDX 21
@@ -208,42 +237,6 @@
/* VMX Vector Store Instructions */
#define OP_31_XOP_STVX 231
-/* Prefixed Instructions */
-#define OP_PREFIX 1
-
-#define OP_31 31
-#define OP_LWZ 32
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD 54
-#define OP_STFDU 55
-#define OP_LD 58
-#define OP_LWZU 33
-#define OP_LBZ 34
-#define OP_LBZU 35
-#define OP_STW 36
-#define OP_STWU 37
-#define OP_STD 62
-#define OP_STB 38
-#define OP_STBU 39
-#define OP_LHZ 40
-#define OP_LHZU 41
-#define OP_LHA 42
-#define OP_LHAU 43
-#define OP_STH 44
-#define OP_STHU 45
-#define OP_LMW 46
-#define OP_STMW 47
-#define OP_LFS 48
-#define OP_LFSU 49
-#define OP_LFD 50
-#define OP_LFDU 51
-#define OP_STFS 52
-#define OP_STFSU 53
-#define OP_STFD 54
-#define OP_STFDU 55
-#define OP_LQ 56
-
/* sorted alphabetically */
#define PPC_INST_BCCTR_FLUSH 0x4c400420
#define PPC_INST_COPY 0x7c20060c
--
2.35.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 2/3] powerpc: Reject probes on instructions that can't be single stepped
2022-03-30 14:07 [PATCH v2 0/3] powerpc: Remove system call emulation Naveen N. Rao
2022-03-30 14:07 ` [PATCH v2 1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h Naveen N. Rao
@ 2022-03-30 14:07 ` Naveen N. Rao
2022-03-30 14:07 ` [PATCH v2 3/3] powerpc/64: remove system call instruction emulation Naveen N. Rao
2022-05-15 10:28 ` [PATCH v2 0/3] powerpc: Remove system call emulation Michael Ellerman
3 siblings, 0 replies; 5+ messages in thread
From: Naveen N. Rao @ 2022-03-30 14:07 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, mopsfelder
Cc: linuxppc-dev
Per the ISA, a Trace interrupt is not generated for:
- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing;
CIABR matches may still occur)
- instructions that are emulated by software
Add a helper to check for instructions belonging to the first four
categories above and to reject kprobes, uprobes and xmon breakpoints on
such instructions. We reject probing on instructions belonging to these
categories across all ISA versions and across both BookS and BookE.
For trap instructions, we can't know in advance if they can cause a
trap, and there is no good reason to allow probing on those. Also,
uprobes already refuses to probe trap instructions and kprobes does not
allow probes on trap instructions used for kernel warnings and bugs. As
such, stop allowing any type of probes/breakpoints on trap instruction
across uprobes, kprobes and xmon.
For some of the fp/altivec instructions that can generate an interrupt
and which we emulate in the kernel (altivec assist, for example), we
check and turn off single stepping in emulate_single_step().
Instructions generating a DSI are restarted and single stepping normally
completes once the instruction is completed.
In uprobes, if a single stepped instruction results in a non-fatal
signal to be delivered to the task, such signals are "delayed" until
after the instruction completes. For fatal signals, single stepping is
cancelled and the instruction restarted in-place so that core dump
captures proper addresses.
In kprobes, we do not allow probes on instructions having an extable
entry and we also do not allow probing interrupt vectors.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/ppc-opcode.h | 18 ++++++++++++++
arch/powerpc/include/asm/probes.h | 36 +++++++++++++++++++++++++++
arch/powerpc/kernel/kprobes.c | 4 +--
arch/powerpc/kernel/uprobes.c | 5 ++++
arch/powerpc/xmon/xmon.c | 11 ++++----
5 files changed, 66 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a5d89cd3e8d12d..683e9bc618a74d 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -130,6 +130,8 @@
#define OP_PREFIX 1
#define OP_TRAP_64 2
#define OP_TRAP 3
+#define OP_SC 17
+#define OP_19 19
#define OP_31 31
#define OP_LWZ 32
#define OP_LWZU 33
@@ -159,6 +161,20 @@
#define OP_LD 58
#define OP_STD 62
+#define OP_19_XOP_RFID 18
+#define OP_19_XOP_RFMCI 38
+#define OP_19_XOP_RFDI 39
+#define OP_19_XOP_RFI 50
+#define OP_19_XOP_RFCI 51
+#define OP_19_XOP_RFSCV 82
+#define OP_19_XOP_HRFID 274
+#define OP_19_XOP_URFID 306
+#define OP_19_XOP_STOP 370
+#define OP_19_XOP_DOZE 402
+#define OP_19_XOP_NAP 434
+#define OP_19_XOP_SLEEP 466
+#define OP_19_XOP_RVWINKLE 498
+
#define OP_31_XOP_TRAP 4
#define OP_31_XOP_LDX 21
#define OP_31_XOP_LWZX 23
@@ -179,6 +195,8 @@
#define OP_31_XOP_LHZUX 311
#define OP_31_XOP_MSGSNDP 142
#define OP_31_XOP_MSGCLRP 174
+#define OP_31_XOP_MTMSR 146
+#define OP_31_XOP_MTMSRD 178
#define OP_31_XOP_TLBIE 306
#define OP_31_XOP_MFSPR 339
#define OP_31_XOP_LWAX 341
diff --git a/arch/powerpc/include/asm/probes.h b/arch/powerpc/include/asm/probes.h
index c5d984700d241a..6f66e358aa3780 100644
--- a/arch/powerpc/include/asm/probes.h
+++ b/arch/powerpc/include/asm/probes.h
@@ -8,6 +8,7 @@
* Copyright IBM Corporation, 2012
*/
#include <linux/types.h>
+#include <asm/disassemble.h>
typedef u32 ppc_opcode_t;
#define BREAKPOINT_INSTRUCTION 0x7fe00008 /* trap */
@@ -31,6 +32,41 @@ typedef u32 ppc_opcode_t;
#define MSR_SINGLESTEP (MSR_SE)
#endif
+static inline bool can_single_step(u32 inst)
+{
+ switch (get_op(inst)) {
+ case OP_TRAP_64: return false;
+ case OP_TRAP: return false;
+ case OP_SC: return false;
+ case OP_19:
+ switch (get_xop(inst)) {
+ case OP_19_XOP_RFID: return false;
+ case OP_19_XOP_RFMCI: return false;
+ case OP_19_XOP_RFDI: return false;
+ case OP_19_XOP_RFI: return false;
+ case OP_19_XOP_RFCI: return false;
+ case OP_19_XOP_RFSCV: return false;
+ case OP_19_XOP_HRFID: return false;
+ case OP_19_XOP_URFID: return false;
+ case OP_19_XOP_STOP: return false;
+ case OP_19_XOP_DOZE: return false;
+ case OP_19_XOP_NAP: return false;
+ case OP_19_XOP_SLEEP: return false;
+ case OP_19_XOP_RVWINKLE: return false;
+ }
+ break;
+ case OP_31:
+ switch (get_xop(inst)) {
+ case OP_31_XOP_TRAP: return false;
+ case OP_31_XOP_TRAP_64: return false;
+ case OP_31_XOP_MTMSR: return false;
+ case OP_31_XOP_MTMSRD: return false;
+ }
+ break;
+ }
+ return true;
+}
+
/* Enable single stepping for the current task */
static inline void enable_single_step(struct pt_regs *regs)
{
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 9a492fdec1dfbe..0936a6c8c256b9 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -129,8 +129,8 @@ int arch_prepare_kprobe(struct kprobe *p)
if ((unsigned long)p->addr & 0x03) {
printk("Attempt to register kprobe at an unaligned address\n");
ret = -EINVAL;
- } else if (IS_MTMSRD(insn) || IS_RFID(insn)) {
- printk("Cannot register a kprobe on mtmsr[d]/rfi[d]\n");
+ } else if (!can_single_step(ppc_inst_val(insn))) {
+ printk("Cannot register a kprobe on instructions that can't be single stepped\n");
ret = -EINVAL;
} else if ((unsigned long)p->addr & ~PAGE_MASK &&
ppc_inst_prefixed(ppc_inst_read(p->addr - 1))) {
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ffdc..95a41ae9dfa755 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -48,6 +48,11 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
return -EINVAL;
}
+ if (!can_single_step(ppc_inst_val(ppc_inst_read(auprobe->insn)))) {
+ pr_info_ratelimited("Cannot register a uprobe on instructions that can't be single stepped\n");
+ return -ENOTSUPP;
+ }
+
return 0;
}
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index fd72753e8ad502..a92c5739d954e2 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -921,9 +921,9 @@ static void insert_bpts(void)
bp->enabled = 0;
continue;
}
- if (IS_MTMSRD(instr) || IS_RFID(instr)) {
- printf("Breakpoint at %lx is on an mtmsrd or rfid "
- "instruction, disabling it\n", bp->address);
+ if (!can_single_step(ppc_inst_val(instr))) {
+ printf("Breakpoint at %lx is on an instruction that can't be single stepped, disabling it\n",
+ bp->address);
bp->enabled = 0;
continue;
}
@@ -1470,9 +1470,8 @@ static long check_bp_loc(unsigned long addr)
printf("Can't read instruction at address %lx\n", addr);
return 0;
}
- if (IS_MTMSRD(instr) || IS_RFID(instr)) {
- printf("Breakpoints may not be placed on mtmsrd or rfid "
- "instructions\n");
+ if (!can_single_step(ppc_inst_val(instr))) {
+ printf("Breakpoints may not be placed on instructions that can't be single stepped\n");
return 0;
}
return 1;
--
2.35.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 3/3] powerpc/64: remove system call instruction emulation
2022-03-30 14:07 [PATCH v2 0/3] powerpc: Remove system call emulation Naveen N. Rao
2022-03-30 14:07 ` [PATCH v2 1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h Naveen N. Rao
2022-03-30 14:07 ` [PATCH v2 2/3] powerpc: Reject probes on instructions that can't be single stepped Naveen N. Rao
@ 2022-03-30 14:07 ` Naveen N. Rao
2022-05-15 10:28 ` [PATCH v2 0/3] powerpc: Remove system call emulation Michael Ellerman
3 siblings, 0 replies; 5+ messages in thread
From: Naveen N. Rao @ 2022-03-30 14:07 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, mopsfelder
Cc: linuxppc-dev
From: Nicholas Piggin <npiggin@gmail.com>
emulate_step() instruction emulation including sc instruction emulation
initially appeared in xmon. It was then moved into sstep.c where kprobes
could use it too, and later hw_breakpoint and uprobes started to use it.
Until uprobes, the only instruction emulation users were for kernel
mode instructions.
- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.
At one point, there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there were any in-tree users. So system call emulation is not required
by the above users.
uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. Userspace system call emulation is broken and
it's not clear it ever worked well.
The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the interrupt
return code takes the kernel->kernel restore path, which does not
restore everything as the system call interrupt handler would expect
coming from userspace. regs->iamr appears to get lost for example,
because the kernel->kernel return does not restore the user iamr.
Accounting such as irqflags tracing and CPU accounting does not get
flipped back to user mode as the system call handler expects, so those
appear to enter the kernel twice without returning to userspace.
These things may be individually fixable with various complication, but
it is a big complexity for unclear real benefit.
Furthermore, it is not possible to single step a system call instruction
since it causes an interrupt. As such, a separate patch disables probing
on system call instructions.
This patch removes system call emulation and disables stepping system
calls.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
arch/powerpc/kernel/interrupt_64.S | 10 -------
arch/powerpc/lib/sstep.c | 46 +++++++-----------------------
2 files changed, 10 insertions(+), 46 deletions(-)
diff --git a/arch/powerpc/kernel/interrupt_64.S b/arch/powerpc/kernel/interrupt_64.S
index 7bab2d7de372e0..6471034c790973 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -219,16 +219,6 @@ system_call_vectored common 0x3000
*/
system_call_vectored sigill 0x7ff0
-
-/*
- * Entered via kernel return set up by kernel/sstep.c, must match entry regs
- */
- .globl system_call_vectored_emulate
-system_call_vectored_emulate:
-_ASM_NOKPROBE_SYMBOL(system_call_vectored_emulate)
- li r10,IRQS_ALL_DISABLED
- stb r10,PACAIRQSOFTMASK(r13)
- b system_call_vectored_common
#endif /* CONFIG_PPC_BOOK3S */
.balign IFETCH_ALIGN_BYTES
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 3fda8d0a05b43f..01c8fd39f34981 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
#include <asm/cputable.h>
#include <asm/disassemble.h>
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
#ifdef CONFIG_PPC64
/* Bits in SRR1 that are copied from MSR */
#define MSR_MASK 0xffffffff87c0ffffUL
@@ -1376,7 +1373,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
if (branch_taken(word, regs, op))
op->type |= BRTAKEN;
return 1;
-#ifdef CONFIG_PPC64
case 17: /* sc */
if ((word & 0xfe2) == 2)
op->type = SYSCALL;
@@ -1388,7 +1384,6 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
} else
op->type = UNKNOWN;
return 0;
-#endif
case 18: /* b */
op->type = BRANCH | BRTAKEN;
imm = word & 0x03fffffc;
@@ -3643,43 +3638,22 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
regs_set_return_msr(regs, (regs->msr & ~op.val) | (val & op.val));
goto instr_done;
-#ifdef CONFIG_PPC64
case SYSCALL: /* sc */
/*
- * N.B. this uses knowledge about how the syscall
- * entry code works. If that is changed, this will
- * need to be changed also.
+ * Per ISA v3.1, section 7.5.15 'Trace Interrupt', we can't
+ * single step a system call instruction:
+ *
+ * Successful completion for an instruction means that the
+ * instruction caused no other interrupt. Thus a Trace
+ * interrupt never occurs for a System Call or System Call
+ * Vectored instruction, or for a Trap instruction that
+ * traps.
*/
- if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
- cpu_has_feature(CPU_FTR_REAL_LE) &&
- regs->gpr[0] == 0x1ebe) {
- regs_set_return_msr(regs, regs->msr ^ MSR_LE);
- goto instr_done;
- }
- regs->gpr[9] = regs->gpr[13];
- regs->gpr[10] = MSR_KERNEL;
- regs->gpr[11] = regs->nip + 4;
- regs->gpr[12] = regs->msr & MSR_MASK;
- regs->gpr[13] = (unsigned long) get_paca();
- regs_set_return_ip(regs, (unsigned long) &system_call_common);
- regs_set_return_msr(regs, MSR_KERNEL);
- return 1;
-
-#ifdef CONFIG_PPC_BOOK3S_64
+ return -1;
case SYSCALL_VECTORED_0: /* scv 0 */
- regs->gpr[9] = regs->gpr[13];
- regs->gpr[10] = MSR_KERNEL;
- regs->gpr[11] = regs->nip + 4;
- regs->gpr[12] = regs->msr & MSR_MASK;
- regs->gpr[13] = (unsigned long) get_paca();
- regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
- regs_set_return_msr(regs, MSR_KERNEL);
- return 1;
-#endif
-
+ return -1;
case RFI:
return -1;
-#endif
}
return 0;
--
2.35.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 0/3] powerpc: Remove system call emulation
2022-03-30 14:07 [PATCH v2 0/3] powerpc: Remove system call emulation Naveen N. Rao
` (2 preceding siblings ...)
2022-03-30 14:07 ` [PATCH v2 3/3] powerpc/64: remove system call instruction emulation Naveen N. Rao
@ 2022-05-15 10:28 ` Michael Ellerman
3 siblings, 0 replies; 5+ messages in thread
From: Michael Ellerman @ 2022-05-15 10:28 UTC (permalink / raw)
To: mopsfelder, Naveen N. Rao, Christophe Leroy, Michael Ellerman,
Nicholas Piggin
Cc: linuxppc-dev
On Wed, 30 Mar 2022 19:37:16 +0530, Naveen N. Rao wrote:
> Since v1, the main change is to use helpers to decode primary/extended
> opcode and the addition of macros for some of the used opcodes.
>
> - Naveen
>
>
>
> [...]
Applied to powerpc/next.
[1/3] powerpc: Sort and de-dup primary opcodes in ppc-opcode.h
https://git.kernel.org/powerpc/c/f31c618373f2051a32e30002d8eacad7bbbd3885
[2/3] powerpc: Reject probes on instructions that can't be single stepped
https://git.kernel.org/powerpc/c/54cdacd7d3b3c1a8dc10965f56c8b5eb8eda1a33
[3/3] powerpc/64: remove system call instruction emulation
https://git.kernel.org/powerpc/c/a553476c44fb6bd3dc3a7e5efef8f130f0f34850
cheers
^ permalink raw reply [flat|nested] 5+ messages in thread