linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] 2nd attempt at PR KVM + SCV and syscall
@ 2022-01-18  5:58 Nicholas Piggin
  2022-01-18  5:58 ` [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation Nicholas Piggin
  2022-01-18  5:58 ` [RFC PATCH 2/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled Nicholas Piggin
  0 siblings, 2 replies; 4+ messages in thread
From: Nicholas Piggin @ 2022-01-18  5:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Naveen N . Rao, Cédric Le Goater, Nicholas Piggin

The more I looked into system call emulation, the harder it seems to
get.

Second proposal is remove it entirely because it's already broken, and
just boot-time restrict SCV support if we run with PR possible, in hash
mode, on pseries. Unfortunately that catches Power9 PowerVM, but at
least OpenPOWER and Power10 by default is okay.

We probably have to do this as a minimal backport at first even if we
did later decide we need to fix things in a better way because it looks
like it would take a lot of work.

Any comments? mpe helped with looking at the tracing logic and history
of the code (thanks) but any mistakes are mine, Naveen are we on the
right track here?

Thanks,
Nick

Nicholas Piggin (2):
  powerpc/64: remove system call instruction emulation
  KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled

 arch/powerpc/kernel/exceptions-64s.S |  4 ++++
 arch/powerpc/kernel/setup_64.c       | 15 ++++++++++++
 arch/powerpc/kvm/book3s_pr.c         | 20 +++++++++++-----
 arch/powerpc/lib/sstep.c             | 36 ----------------------------
 4 files changed, 33 insertions(+), 42 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation
  2022-01-18  5:58 [RFC PATCH 0/2] 2nd attempt at PR KVM + SCV and syscall Nicholas Piggin
@ 2022-01-18  5:58 ` Nicholas Piggin
  2022-01-19 17:46   ` Naveen N. Rao
  2022-01-18  5:58 ` [RFC PATCH 2/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled Nicholas Piggin
  1 sibling, 1 reply; 4+ messages in thread
From: Nicholas Piggin @ 2022-01-18  5:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Naveen N . Rao, Cédric Le Goater, Nicholas Piggin

emulate_step instruction emulation including sc instruction emulation
initially appeared in xmon. It then emulation code was then moved into
sstep.c where kprobes could use it too, and later hw_breakpoint and
uprobes started to use it.

Until uprobes, the only instruction emulation users were for kernel
mode instructions.

- xmon only steps / breaks on kernel addresses.
- kprobes is kernel only.
- hw_breakpoint only emulates kernel instructions, single steps user.

At one point there was support for the kernel to execute sc
instructions, although that is long removed and it's not clear whether
there was any upstream instructions or this was used by out of tree
modules. So system call emulation is not required by the above users.

uprobes uses emulate_step and it appears possible to emulate sc
instruction in userspace. This type of system call emulation is broken
and it's not clear it ever worked well.

The big complication is that userspace takes an interrupt to the kernel
to emulate the instruction. The user->kernel interrupt sets up registers
and interrupt stack frame expecting to return to userspace, then system
call instruction emulation re-directs that stack frame to the kernel,
early in the system call interrupt handler. This means the the interrupt
return code takes the kernel->kernel restore path, which does not restore
everything as the system call interrupt handler would expect coming from
userspace. regs->iamr appears to get lost for example, because the
kernel->kernel return does not restore the user iamr. Accounting such as
irqflags tracing and CPU accounting does not get flipped back to user
mode as the system call handler expects, so those appear to enter the
kernel twice without returning to userspace.

These things may be individually fixable with various complication, but
it is a big complexity to support this just for uprobes.

uprobes emulates the instruction rather than stepping for performance
reasons. System calls instructions should not be a significant source
of uprobes and already expensive, so skipping emulation should not be
a significant problem.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/lib/sstep.c | 36 ------------------------------------
 1 file changed, 36 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index a94b0cd0bdc5..ee3bc45fb23b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -15,9 +15,6 @@
 #include <asm/cputable.h>
 #include <asm/disassemble.h>
 
-extern char system_call_common[];
-extern char system_call_vectored_emulate[];
-
 #ifdef CONFIG_PPC64
 /* Bits in SRR1 that are copied from MSR */
 #define MSR_MASK	0xffffffff87c0ffffUL
@@ -3650,39 +3647,6 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
 		goto instr_done;
 
 #ifdef CONFIG_PPC64
-	case SYSCALL:	/* sc */
-		/*
-		 * N.B. this uses knowledge about how the syscall
-		 * entry code works.  If that is changed, this will
-		 * need to be changed also.
-		 */
-		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
-				cpu_has_feature(CPU_FTR_REAL_LE) &&
-				regs->gpr[0] == 0x1ebe) {
-			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
-			goto instr_done;
-		}
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_common);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-
-#ifdef CONFIG_PPC_BOOK3S_64
-	case SYSCALL_VECTORED_0:	/* scv 0 */
-		regs->gpr[9] = regs->gpr[13];
-		regs->gpr[10] = MSR_KERNEL;
-		regs->gpr[11] = regs->nip + 4;
-		regs->gpr[12] = regs->msr & MSR_MASK;
-		regs->gpr[13] = (unsigned long) get_paca();
-		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
-		regs_set_return_msr(regs, MSR_KERNEL);
-		return 1;
-#endif
-
 	case RFI:
 		return -1;
 #endif
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [RFC PATCH 2/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled
  2022-01-18  5:58 [RFC PATCH 0/2] 2nd attempt at PR KVM + SCV and syscall Nicholas Piggin
  2022-01-18  5:58 ` [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation Nicholas Piggin
@ 2022-01-18  5:58 ` Nicholas Piggin
  1 sibling, 0 replies; 4+ messages in thread
From: Nicholas Piggin @ 2022-01-18  5:58 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Naveen N . Rao, Cédric Le Goater, Nicholas Piggin

PR KVM does not support running with AIL enabled, and SCV does is not
supported with AIL disabled.

Fix this by ensuring the SCV facility is disabled with FSCR while a
CPU can be running with AIL=0. PowerNV host supports disabling AIL on a
per-CPU basis, so SCV just needs to be disabled when a vCPU is run.

The pSeries machine can only switch AIL on a system-wide basis, so it
must disable SCV support at boot if the configuration can potentially
run a PR KVM guest.

Alternatives considered and rejected:
- SCV support can not be disabled by PR KVM after boot, because it is
  advertised to userspace with HWCAP.
- AIL can not be disabled on a per-CPU basis. At least when running on
  pseries it is a per-LPAR setting.
- Support for real-mode SCV vectors will not be added because they are
  at 0x17000 so making such a large fixed head space causes immediate
  value limits to be exceeded, requiring a lot rework and more code.
- Disabling SCV for any PR KVM possible kernel will cause a slowdown
  when not using PR KVM.
- A boot time option to disable SCV to use PR KVM is user-hostile.
- System call instruction emulation for SCV facility unavailable
  instructions is too complex and old emulation code was subtly broken
  and removed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/exceptions-64s.S |  4 ++++
 arch/powerpc/kernel/setup_64.c       | 15 +++++++++++++++
 arch/powerpc/kvm/book3s_pr.c         | 20 ++++++++++++++------
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 55caeee37c08..b66dd6f775a4 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -809,6 +809,10 @@ __start_interrupts:
  * - MSR_EE|MSR_RI is clear (no reentrant exceptions)
  * - Standard kernel environment is set up (stack, paca, etc)
  *
+ * KVM:
+ * These interrupts do not elevate HV 0->1, so HV is not involved. PR KVM
+ * ensures that FSCR[SCV] is disabled whenever it has to force AIL off.
+ *
  * Call convention:
  *
  * syscall register convention is in Documentation/powerpc/syscall64-abi.rst
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d87f7c1103ce..5b8740ece1b2 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -197,6 +197,21 @@ static void __init configure_exceptions(void)
 
 	/* Under a PAPR hypervisor, we need hypercalls */
 	if (firmware_has_feature(FW_FEATURE_SET_MODE)) {
+		/*
+		 * PR KVM does not support AIL mode interrupts in the host, and
+		 * SCV system call interrupt vectors are only implemented for
+		 * AIL mode. Under pseries, AIL mode can only be enabled and
+		 * disabled system-wide so when PR KVM is loaded, all CPUs in
+		 * the host are set to AIL=0 mode. SCV can not be disabled
+		 * dynamically because the feature is advertised to host
+		 * userspace, so SCV support must not be enabled if PR KVM can
+		 * possibly be run.
+		 */
+		if (IS_ENABLED(CONFIG_KVM_BOOK3S_PR_POSSIBLE) && !radix_enabled()) {
+			init_task.thread.fscr &= ~FSCR_SCV;
+			cur_cpu_spec->cpu_user_features2 &= ~PPC_FEATURE2_SCV;
+		}
+
 		/* Enable AIL if possible */
 		if (!pseries_enable_reloc_on_exc()) {
 			init_task.thread.fscr &= ~FSCR_SCV;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6bc9425acb32..9845bd509185 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -140,9 +140,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu)
 #endif
 
 	/* Disable AIL if supported */
-	if (cpu_has_feature(CPU_FTR_HVMODE) &&
-	    cpu_has_feature(CPU_FTR_ARCH_207S))
-		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+	if (cpu_has_feature(CPU_FTR_HVMODE)) {
+		if (cpu_has_feature(CPU_FTR_ARCH_207S))
+			mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+		if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr & FSCR_SCV))
+			mtspr(SPRN_FSCR, mfspr(SPRN_FSCR) & ~FSCR_SCV);
+	}
 
 	vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -175,9 +178,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 	kvmppc_save_tm_pr(vcpu);
 
 	/* Enable AIL if supported */
-	if (cpu_has_feature(CPU_FTR_HVMODE) &&
-	    cpu_has_feature(CPU_FTR_ARCH_207S))
-		mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+	if (cpu_has_feature(CPU_FTR_HVMODE)) {
+		if (cpu_has_feature(CPU_FTR_ARCH_207S))
+			mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+		if (cpu_has_feature(CPU_FTR_ARCH_300) && (current->thread.fscr & FSCR_SCV))
+			mtspr(SPRN_FSCR, mfspr(SPRN_FSCR) | FSCR_SCV);
+	}
 
 	vcpu->cpu = -1;
 }
@@ -1037,6 +1043,8 @@ static int kvmppc_handle_fac(struct kvm_vcpu *vcpu, ulong fac)
 
 void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr)
 {
+	if (fscr & FSCR_SCV)
+		fscr &= ~FSCR_SCV; /* SCV must not be enabled */
 	if ((vcpu->arch.fscr & FSCR_TAR) && !(fscr & FSCR_TAR)) {
 		/* TAR got dropped, drop it in shadow too */
 		kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation
  2022-01-18  5:58 ` [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation Nicholas Piggin
@ 2022-01-19 17:46   ` Naveen N. Rao
  0 siblings, 0 replies; 4+ messages in thread
From: Naveen N. Rao @ 2022-01-19 17:46 UTC (permalink / raw)
  To: linuxppc-dev, Nicholas Piggin; +Cc: Cédric Le Goater

Nicholas Piggin wrote:
> emulate_step instruction emulation including sc instruction emulation
> initially appeared in xmon. It then emulation code was then moved into
> sstep.c where kprobes could use it too, and later hw_breakpoint and
> uprobes started to use it.
> 
> Until uprobes, the only instruction emulation users were for kernel
> mode instructions.
> 
> - xmon only steps / breaks on kernel addresses.
> - kprobes is kernel only.
> - hw_breakpoint only emulates kernel instructions, single steps user.
> 
> At one point there was support for the kernel to execute sc
> instructions, although that is long removed and it's not clear whether
> there was any upstream instructions or this was used by out of tree
> modules. So system call emulation is not required by the above users.
> 
> uprobes uses emulate_step and it appears possible to emulate sc
> instruction in userspace. This type of system call emulation is broken
> and it's not clear it ever worked well.
> 
> The big complication is that userspace takes an interrupt to the kernel
> to emulate the instruction. The user->kernel interrupt sets up registers
> and interrupt stack frame expecting to return to userspace, then system
> call instruction emulation re-directs that stack frame to the kernel,
> early in the system call interrupt handler. This means the the interrupt
> return code takes the kernel->kernel restore path, which does not restore
> everything as the system call interrupt handler would expect coming from
> userspace. regs->iamr appears to get lost for example, because the
> kernel->kernel return does not restore the user iamr. Accounting such as
> irqflags tracing and CPU accounting does not get flipped back to user
> mode as the system call handler expects, so those appear to enter the
> kernel twice without returning to userspace.
> 
> These things may be individually fixable with various complication, but
> it is a big complexity to support this just for uprobes.
> 
> uprobes emulates the instruction rather than stepping for performance
> reasons. System calls instructions should not be a significant source
> of uprobes and already expensive, so skipping emulation should not be
> a significant problem.

I agree with that assessment, though I think we will need to disable 
probing on 'sc'/'scv' instructions. Per the ISA, section 6.5.15 on 
'Trace Interrupt', we can't single step a system call instruction:
    "Successful completion for an instruction means that the
    instruction caused no other interrupt and, if the thread
    is in Transactional state, did not cause the transaction
    to fail in such a way that the instruction did not com-
    plete (see Section 5.3.1 of Book II). Thus a Trace inter-
    rupt never occurs for a System Call or System Call
    Vectored instruction, or for a Trap instruction that traps,
    or for a dcbf that is executed in Transactional state.
    The instruction that causes a Trace interrupt is called
    the “traced instruction”."

I am not aware of any use case requiring probes on a system call 
instruction, so I think we can disable probing on system call 
instructions (patch further below).

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/lib/sstep.c | 36 ------------------------------------
>  1 file changed, 36 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index a94b0cd0bdc5..ee3bc45fb23b 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -15,9 +15,6 @@
>  #include <asm/cputable.h>
>  #include <asm/disassemble.h>
> 
> -extern char system_call_common[];
> -extern char system_call_vectored_emulate[];
> -
>  #ifdef CONFIG_PPC64
>  /* Bits in SRR1 that are copied from MSR */
>  #define MSR_MASK	0xffffffff87c0ffffUL
> @@ -3650,39 +3647,6 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
>  		goto instr_done;
> 
>  #ifdef CONFIG_PPC64
> -	case SYSCALL:	/* sc */
> -		/*
> -		 * N.B. this uses knowledge about how the syscall
> -		 * entry code works.  If that is changed, this will
> -		 * need to be changed also.
> -		 */
> -		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
> -				cpu_has_feature(CPU_FTR_REAL_LE) &&
> -				regs->gpr[0] == 0x1ebe) {
> -			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
> -			goto instr_done;
> -		}
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_common);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	case SYSCALL_VECTORED_0:	/* scv 0 */
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -#endif
> -

Given that we should not be probing syscall instructions, I think it is 
better to return -1 for these two, similar to the RFI below. With that 
change, for this patch:
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

>  	case RFI:
>  		return -1;
>  #endif


Thanks,
Naveen

--
[PATCH] powerpc/uprobes: Reject uprobe on a system call instruction

Per the ISA, a Trace interrupt is not generated for a system call
[vectored] instruction. Reject uprobes on such instructions as we are
not emulating a system call [vectored] instruction anymore.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/kernel/uprobes.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index efad07081cc0e5..fedf843bcdddeb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -411,6 +411,7 @@
 #define PPC_RAW_DCBFPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
 #define PPC_RAW_DCBSTPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
 #define PPC_RAW_SC()			(0x44000002)
+#define PPC_RAW_SCV()			(0x44000001)
 #define PPC_RAW_SYNC()			(0x7c0004ac)
 #define PPC_RAW_ISYNC()			(0x4c00012c)
 
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ffdc..bedca31391d043 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
 	if (addr & 0x03)
 		return -EINVAL;
 
+	if (ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SC() ||
+	    ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SCV()) {
+		pr_info("Rejecting uprobe on system call instruction\n");
+		return -EINVAL;
+	}
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31) &&
 	    ppc_inst_prefixed(ppc_inst_read(auprobe->insn)) &&
 	    (addr & 0x3f) == 60) {

base-commit: 863a7c25c334ed368b4508fccae92d6bb61e71a4
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-01-19 17:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-18  5:58 [RFC PATCH 0/2] 2nd attempt at PR KVM + SCV and syscall Nicholas Piggin
2022-01-18  5:58 ` [RFC PATCH 1/2] powerpc/64: remove system call instruction emulation Nicholas Piggin
2022-01-19 17:46   ` Naveen N. Rao
2022-01-18  5:58 ` [RFC PATCH 2/2] KVM: PPC: Book3S PR: Disable SCV when running AIL is disabled Nicholas Piggin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).