linuxppc-dev.lists.ozlabs.org archive mirror
* [RFC PATCH 0/6] VMAP_STACK support for book3s64
@ 2022-11-04 17:27 Andrew Donnellan
  2022-11-04 17:27 ` [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE Andrew Donnellan
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC
  To: linuxppc-dev; +Cc: linux-hardening, cmr

This series begins implementing VMAP_STACK support for book3s64 platforms,
building on the existing 32-bit work that Christophe Leroy has done.

Right now, it doesn't boot on my POWER9 machine - I'm sending this as is because
I'm about to go on holidays for a couple of weeks, and I'll pick it up once I'm
back.

The primary issue is the amount of arch code that has to run in real mode for
some reason or another - this includes OPAL, the cpu idle driver, KVM, and a
few other bits and pieces.

Right now, VMAP_STACK is only enabled if KVM_BOOK3S_64_HV=n - I'm working on
patches for KVM support but they're not quite ready yet.

If anyone has better suggestions for the extremely ugly approach to fixing
OPAL calls, suggest away!

Andrew Donnellan (6):
  powerpc/64s: Fix assembly to support larger values of THREAD_SIZE
  powerpc/64s: Helpers to switch between linear and vmapped stack
    pointers
  powerpc/powernv: Keep MSR in register across OPAL entry/return path
  powerpc/powernv: Convert pointers to physical addresses in OPAL call
    args
  powerpc/powernv/idle: Convert stack pointer to physical address
  powerpc/64s: Enable CONFIG_VMAP_STACK

 arch/powerpc/include/asm/asm-compat.h         |  2 +
 arch/powerpc/include/asm/book3s/64/stack.h    | 71 +++++++++++++++++++
 arch/powerpc/include/asm/opal.h               |  1 +
 arch/powerpc/include/asm/paca.h               |  4 ++
 arch/powerpc/include/asm/processor.h          |  6 ++
 arch/powerpc/kernel/asm-offsets.c             |  8 +++
 arch/powerpc/kernel/entry_64.S                | 11 ++-
 arch/powerpc/kernel/irq.c                     |  8 ++-
 arch/powerpc/kernel/misc_64.S                 |  4 +-
 arch/powerpc/kernel/process.c                 |  8 +++
 arch/powerpc/kernel/smp.c                     |  7 ++
 arch/powerpc/kvm/book3s_hv_builtin.c          |  2 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S       |  3 +-
 arch/powerpc/mm/book3s64/slb.c                | 11 ++-
 arch/powerpc/platforms/Kconfig.cputype        |  1 +
 arch/powerpc/platforms/powernv/eeh-powernv.c  | 20 +++---
 arch/powerpc/platforms/powernv/idle.c         | 47 +++++++++++-
 arch/powerpc/platforms/powernv/ocxl.c         |  3 +-
 arch/powerpc/platforms/powernv/opal-core.c    |  4 +-
 arch/powerpc/platforms/powernv/opal-dump.c    |  6 +-
 arch/powerpc/platforms/powernv/opal-elog.c    | 10 +--
 arch/powerpc/platforms/powernv/opal-fadump.c  | 12 ++--
 arch/powerpc/platforms/powernv/opal-flash.c   |  5 +-
 arch/powerpc/platforms/powernv/opal-hmi.c     |  3 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c |  4 +-
 arch/powerpc/platforms/powernv/opal-lpc.c     |  8 +--
 arch/powerpc/platforms/powernv/opal-nvram.c   |  4 +-
 arch/powerpc/platforms/powernv/opal-power.c   |  4 +-
 .../powerpc/platforms/powernv/opal-powercap.c |  2 +-
 arch/powerpc/platforms/powernv/opal-prd.c     |  6 +-
 arch/powerpc/platforms/powernv/opal-psr.c     |  2 +-
 arch/powerpc/platforms/powernv/opal-rtc.c     |  2 +-
 arch/powerpc/platforms/powernv/opal-secvar.c  |  9 ++-
 arch/powerpc/platforms/powernv/opal-sensor.c  |  4 +-
 .../powerpc/platforms/powernv/opal-sysparam.c |  4 +-
 .../powerpc/platforms/powernv/opal-wrappers.S | 43 ++++++-----
 arch/powerpc/platforms/powernv/opal-xscom.c   |  2 +-
 arch/powerpc/platforms/powernv/opal.c         | 16 ++---
 arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++--
 arch/powerpc/platforms/powernv/pci.c          | 25 ++++---
 arch/powerpc/platforms/powernv/setup.c        |  2 +-
 arch/powerpc/platforms/powernv/smp.c          |  2 +-
 arch/powerpc/sysdev/xics/icp-opal.c           |  2 +-
 arch/powerpc/sysdev/xics/ics-opal.c           |  8 +--
 arch/powerpc/sysdev/xive/native.c             | 33 +++++----
 arch/powerpc/xmon/xmon.c                      |  4 ++
 drivers/char/ipmi/ipmi_powernv.c              |  6 +-
 drivers/char/powernv-op-panel.c               |  2 +-
 drivers/i2c/busses/i2c-opal.c                 |  2 +-
 drivers/leds/leds-powernv.c                   |  6 +-
 drivers/mtd/devices/powernv_flash.c           |  4 +-
 drivers/rtc/rtc-opal.c                        |  4 +-
 52 files changed, 347 insertions(+), 134 deletions(-)
 create mode 100644 arch/powerpc/include/asm/book3s/64/stack.h

-- 
2.38.1



* [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-04 17:51   ` Christophe Leroy
  2022-11-04 17:27 ` [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers Andrew Donnellan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC
  To: linuxppc-dev; +Cc: linux-hardening, cmr

When CONFIG_VMAP_STACK is enabled, we set THREAD_SIZE to be at least the
size of a page.

There are a few bits of assembly in the book3s64 code that use THREAD_SIZE in
immediate mode instructions. These can only take a signed 16-bit operand,
which isn't quite large enough.

Fix these spots to use a scratch register or use two immediate mode
instructions instead, so we can later enable VMAP_STACK.
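
As a rough illustration of the limit (a sketch, not part of the patch): the
64K THREAD_SIZE and the 112-byte frame size below are assumptions picked for
the example, and fits_simm16()/fits_uimm16() are hypothetical helpers. The
offset no longer fits the signed 16-bit field used by addi/stdu, but it still
fits the unsigned 16-bit immediate of ori, which is why the "li 0; ori"
sequence works here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: a 64K THREAD_SIZE and a 112-byte frame. */
#define EXAMPLE_THREAD_SIZE	(1L << 16)
#define EXAMPLE_FRAME_SIZE	112L

/* The li/ori pair only works while the offset fits in 16 unsigned bits. */
static_assert(EXAMPLE_THREAD_SIZE - EXAMPLE_FRAME_SIZE <= UINT16_MAX,
	      "offset must fit the unsigned 16-bit immediate of ori");

/* addi and stdu encode a signed 16-bit immediate/displacement. */
static bool fits_simm16(long v)
{
	return v >= INT16_MIN && v <= INT16_MAX;
}

/* ori takes an unsigned 16-bit immediate. */
static bool fits_uimm16(long v)
{
	return v >= 0 && v <= UINT16_MAX;
}
```

With a smaller stack (say 16K) the same offset would still fit the signed
field, which is presumably why the existing code worked until now.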

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 arch/powerpc/include/asm/asm-compat.h   | 2 ++
 arch/powerpc/kernel/entry_64.S          | 4 +++-
 arch/powerpc/kernel/irq.c               | 8 ++++++--
 arch/powerpc/kernel/misc_64.S           | 4 +++-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 ++-
 5 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
index 2bc53c646ccd..30dd7813bf3b 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -11,6 +11,7 @@
 #define PPC_LL		stringify_in_c(ld)
 #define PPC_STL		stringify_in_c(std)
 #define PPC_STLU	stringify_in_c(stdu)
+#define PPC_STLUX	stringify_in_c(stdux)
 #define PPC_LCMPI	stringify_in_c(cmpdi)
 #define PPC_LCMPLI	stringify_in_c(cmpldi)
 #define PPC_LCMP	stringify_in_c(cmpd)
@@ -45,6 +46,7 @@
 #define PPC_LL		stringify_in_c(lwz)
 #define PPC_STL		stringify_in_c(stw)
 #define PPC_STLU	stringify_in_c(stwu)
+#define PPC_STLUX	stringify_in_c(stwux)
 #define PPC_LCMPI	stringify_in_c(cmpwi)
 #define PPC_LCMPLI	stringify_in_c(cmplwi)
 #define PPC_LCMP	stringify_in_c(cmpw)
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 3e2e37e6ecab..af25db6e0205 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -238,7 +238,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	/* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
 	   because we don't need to leave the 288-byte ABI gap at the
 	   top of the kernel stack. */
-	addi	r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE
+	li	r9,0
+	ori	r9,r9,THREAD_SIZE-SWITCH_FRAME_SIZE
+	add	r7,r7,r9
 
 	/*
 	 * PMU interrupts in radix may come in here. They will use r1, not
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 9ede61a5a469..098cf6adceec 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -204,7 +204,9 @@ static __always_inline void call_do_softirq(const void *sp)
 {
 	/* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
 	asm volatile (
-		 PPC_STLU "	%%r1, %[offset](%[sp])	;"
+		"li		%%r0, 0			;"
+		"ori		%%r0, %%r0, %[offset]	;"
+		 PPC_STLUX "	%%r1, %[sp], %%r0	;"
 		"mr		%%r1, %[sp]		;"
 		"bl		%[callee]		;"
 		 PPC_LL "	%%r1, 0(%%r1)		;"
@@ -256,7 +258,9 @@ static __always_inline void call_do_irq(struct pt_regs *regs, void *sp)
 
 	/* Temporarily switch r1 to sp, call __do_irq() then restore r1. */
 	asm volatile (
-		 PPC_STLU "	%%r1, %[offset](%[sp])	;"
+		"li		%%r0, 0			;"
+		"ori		%%r0, %%r0, %[offset]	;"
+		 PPC_STLUX "	%%r1, %[sp], %%r0	;"
 		"mr		%%r4, %%r1		;"
 		"mr		%%r1, %[sp]		;"
 		"bl		%[callee]		;"
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 36184cada00b..ff71b98500a3 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -384,7 +384,9 @@ _GLOBAL(kexec_sequence)
 	std	r0,16(r1)
 
 	/* switch stacks to newstack -- &kexec_stack.stack */
-	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
+	li	r0,0
+	ori	r0,r0,THREAD_SIZE-STACK_FRAME_OVERHEAD
+	stdux	r1,r3,r0
 	mr	r1,r3
 
 	li	r0,0
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 37f50861dd98..d05e3d324f4d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2686,7 +2686,8 @@ kvmppc_bad_host_intr:
 	mr	r9, r1
 	std	r1, PACAR1(r13)
 	ld	r1, PACAEMERGSP(r13)
-	subi	r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE
+	subi	r1, r1, THREAD_SIZE/2
+	subi	r1, r1, INT_FRAME_SIZE
 	std	r9, 0(r1)
 	std	r0, GPR0(r1)
 	std	r9, GPR1(r1)
-- 
2.38.1



* [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
  2022-11-04 17:27 ` [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-05  8:00   ` Christophe Leroy
  2022-11-04 17:27 ` [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path Andrew Donnellan
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC
  To: linuxppc-dev; +Cc: linux-hardening, cmr

powerpc unfortunately has too many places where we run stuff in real mode.

With CONFIG_VMAP_STACK enabled, this means we need to be able to swap the
stack pointer to use the linear mapping when we enter a real mode section,
and back afterwards.

Store the top bits of the stack pointer in both the linear map and the
vmalloc space in the PACA, and add some helper macros/functions to swap
between them.
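
The swap itself is just bit arithmetic. A minimal sketch, assuming a 64K
power-of-two THREAD_SIZE and a stack that is THREAD_SIZE-aligned in both
mappings (swap_stack_base() and the addresses in the usage note are
hypothetical, not taken from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration; the real THREAD_SIZE depends on the config. */
#define EXAMPLE_THREAD_SIZE	(UINT64_C(1) << 16)

static_assert((EXAMPLE_THREAD_SIZE & (EXAMPLE_THREAD_SIZE - 1)) == 0,
	      "THREAD_SIZE must be a power of two for the mask to work");

/*
 * Keep the offset of sp within the stack and replace the upper bits with
 * the base of the stack in the other mapping.
 */
static uint64_t swap_stack_base(uint64_t sp, uint64_t new_base)
{
	return new_base | (sp & (EXAMPLE_THREAD_SIZE - 1));
}
```

For example, a vmalloc stack pointer of 0xc00800000abc3f10 with a linear base
of 0xc000000012340000 would become 0xc000000012343f10, and swapping back with
the vmalloc base restores the original value.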

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>

---

Some of the helpers that are currently unused will be used in the next
version of the series for the KVM real mode handling.
---
 arch/powerpc/include/asm/book3s/64/stack.h | 71 ++++++++++++++++++++++
 arch/powerpc/include/asm/opal.h            |  1 +
 arch/powerpc/include/asm/paca.h            |  4 ++
 arch/powerpc/include/asm/processor.h       |  6 ++
 arch/powerpc/kernel/asm-offsets.c          |  8 +++
 arch/powerpc/kernel/entry_64.S             |  7 +++
 arch/powerpc/kernel/process.c              |  4 ++
 arch/powerpc/kernel/smp.c                  |  7 +++
 arch/powerpc/xmon/xmon.c                   |  4 ++
 9 files changed, 112 insertions(+)
 create mode 100644 arch/powerpc/include/asm/book3s/64/stack.h

diff --git a/arch/powerpc/include/asm/book3s/64/stack.h b/arch/powerpc/include/asm/book3s/64/stack.h
new file mode 100644
index 000000000000..6b31adb1a026
--- /dev/null
+++ b/arch/powerpc/include/asm/book3s/64/stack.h
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+// Helpers for VMAP_STACK on book3s64
+// Copyright (C) 2022 IBM Corporation (Andrew Donnellan)
+
+#ifndef _ASM_POWERPC_BOOK3S_64_STACK_H
+#define _ASM_POWERPC_BOOK3S_64_STACK_H
+
+#include <asm/thread_info.h>
+
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+
+#ifdef __ASSEMBLY__
+// Switch the current stack pointer in r1 between a linear map address and a
+// vmalloc address. Used when we need to go in and out of real mode with
+// CONFIG_VMAP_STACK enabled.
+//
+// tmp: scratch register that can be clobbered
+
+#define SWAP_STACK_LINEAR(tmp)			\
+	ld	tmp, PACAKSTACK_LINEAR_BASE(r13);	\
+	andi.	r1, r1, THREAD_SIZE - 1;		\
+	or	r1, r1, tmp;
+#define SWAP_STACK_VMALLOC(tmp)			\
+	ld	tmp, PACAKSTACK_VMALLOC_BASE(r13);	\
+	andi.	r1, r1, THREAD_SIZE - 1;		\
+	or	r1, r1, tmp;
+
+#else // __ASSEMBLY__
+
+#include <asm/paca.h>
+#include <asm/reg.h>
+#include <linux/mm.h>
+
+#define stack_pa(ptr) (is_vmalloc_addr((ptr)) ? (void *)vmalloc_to_phys((void *)(ptr)) : (void *)(ptr))
+
+static __always_inline void swap_stack_linear(void)
+{
+	current_stack_pointer = get_paca()->kstack_linear_base |	\
+		(current_stack_pointer & (THREAD_SIZE - 1));
+}
+
+static __always_inline void swap_stack_vmalloc(void)
+{
+	current_stack_pointer = get_paca()->kstack_vmalloc_base |	\
+		(current_stack_pointer & (THREAD_SIZE - 1));
+}
+
+#endif // __ASSEMBLY__
+
+#else // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
+
+#define SWAP_STACK_LINEAR(tmp)
+#define SWAP_STACK_VMALLOC(tmp)
+
+static __always_inline void *stack_pa(void *ptr)
+{
+	return ptr;
+}
+
+static __always_inline void swap_stack_linear(void)
+{
+}
+
+static __always_inline void swap_stack_vmalloc(void)
+{
+}
+
+#endif // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
+
+#endif // _ASM_POWERPC_BOOK3S_64_STACK_H
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726125a534de..0360360ad2cf 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -13,6 +13,7 @@
 #ifndef __ASSEMBLY__
 
 #include <linux/notifier.h>
+#include <asm/book3s/64/stack.h>
 
 /* We calculate number of sg entries based on PAGE_SIZE */
 #define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry))
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 09f1790d0ae1..51d060036fa1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -163,6 +163,10 @@ struct paca_struct {
 	 */
 	struct task_struct *__current;	/* Pointer to current */
 	u64 kstack;			/* Saved Kernel stack addr */
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	u64 kstack_vmalloc_base;	/* Base address of stack in the vmalloc mapping */
+	u64 kstack_linear_base;		/* Base address of stack in the linear mapping */
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
 	u64 saved_r1;			/* r1 save for RTAS calls or PM or EE=0 */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 631802999d59..999078452aa4 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -132,6 +132,12 @@ struct debug_reg {
 
 struct thread_struct {
 	unsigned long	ksp;		/* Kernel stack pointer */
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	// Kernel stack base addresses in vmalloc and linear mappings
+	// Used for swapping to linear map in real mode code
+	unsigned long	ksp_vmalloc_base;
+	unsigned long	ksp_linear_base;
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
 
 #ifdef CONFIG_PPC64
 	unsigned long	ksp_vsid;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4ce2a4aa3985..46ace958d3ce 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -99,6 +99,10 @@ int main(void)
 #endif
 
 	OFFSET(KSP, thread_struct, ksp);
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	OFFSET(KSP_VMALLOC_BASE, thread_struct, ksp_vmalloc_base);
+	OFFSET(KSP_LINEAR_BASE, thread_struct, ksp_linear_base);
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
 	OFFSET(PT_REGS, thread_struct, regs);
 #ifdef CONFIG_BOOKE
 	OFFSET(THREAD_NORMSAVES, thread_struct, normsave[0]);
@@ -181,6 +185,10 @@ int main(void)
 	OFFSET(PACAPACAINDEX, paca_struct, paca_index);
 	OFFSET(PACAPROCSTART, paca_struct, cpu_start);
 	OFFSET(PACAKSAVE, paca_struct, kstack);
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	OFFSET(PACAKSTACK_VMALLOC_BASE, paca_struct, kstack_vmalloc_base);
+	OFFSET(PACAKSTACK_LINEAR_BASE, paca_struct, kstack_linear_base);
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
 	OFFSET(PACACURRENT, paca_struct, __current);
 	DEFINE(PACA_THREAD_INFO, offsetof(struct paca_struct, __current) +
 				 offsetof(struct task_struct, thread_info));
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index af25db6e0205..cd9e56b25934 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -253,6 +253,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mr	r1,r8		/* start using new stack pointer */
 	std	r7,PACAKSAVE(r13)
 
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	ld	r8,KSP_LINEAR_BASE(r4)
+	std	r8,PACAKSTACK_LINEAR_BASE(r13)
+	ld	r8,KSP_VMALLOC_BASE(r4)
+	std	r8,PACAKSTACK_VMALLOC_BASE(r13)
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
+
 	ld	r6,_CCR(r1)
 	mtcrf	0xFF,r6
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 67da147fe34d..07917726c629 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1782,6 +1782,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	kregs = (struct pt_regs *) sp;
 	sp -= STACK_FRAME_OVERHEAD;
 	p->thread.ksp = sp;
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	p->thread.ksp_vmalloc_base = sp & ~(THREAD_SIZE - 1);
+	p->thread.ksp_linear_base = (u64)__va(vmalloc_to_pfn((void *)sp) << PAGE_SHIFT);
+#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 	for (i = 0; i < nr_wp_slots(); i++)
 		p->thread.ptrace_bps[i] = NULL;
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0da6e59161cd..466ccab5adb8 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -60,6 +60,7 @@
 #include <asm/ftrace.h>
 #include <asm/kup.h>
 #include <asm/fadump.h>
+#include <asm/book3s/64/stack.h>
 
 #ifdef DEBUG
 #include <asm/udbg.h>
@@ -1250,6 +1251,12 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 	paca_ptrs[cpu]->__current = idle;
 	paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
 				 THREAD_SIZE - STACK_FRAME_OVERHEAD;
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	paca_ptrs[cpu]->kstack_linear_base = is_vmalloc_addr((void *)paca_ptrs[cpu]->kstack) ?
+		vmalloc_to_phys((void *)(paca_ptrs[cpu]->kstack)) :
+		paca_ptrs[cpu]->kstack;
+	paca_ptrs[cpu]->kstack_vmalloc_base = paca_ptrs[cpu]->kstack & ~(THREAD_SIZE - 1);
+#endif // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
 #endif
 	task_thread_info(idle)->cpu = cpu;
 	secondary_current = current_set[cpu] = idle;
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index f51c882bf902..236287c4a231 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2697,6 +2697,10 @@ static void dump_one_paca(int cpu)
 	DUMP(p, __current, "%-*px");
 	DUMP(p, kstack, "%#-*llx");
 	printf(" %-*s = 0x%016llx\n", 25, "kstack_base", p->kstack & ~(THREAD_SIZE - 1));
+#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
+	DUMP(p, kstack_linear_base, "%#-*llx");
+	DUMP(p, kstack_vmalloc_base, "%#-*llx");
+#endif
 #ifdef CONFIG_STACKPROTECTOR
 	DUMP(p, canary, "%#-*lx");
 #endif
-- 
2.38.1



* [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
  2022-11-04 17:27 ` [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE Andrew Donnellan
  2022-11-04 17:27 ` [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-04 18:00   ` Christophe Leroy
  2022-11-04 17:27 ` [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args Andrew Donnellan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC
  To: linuxppc-dev; +Cc: linux-hardening, cmr

When we enter and return from an OPAL call, there are three pieces of state
we have to save and restore: the stack pointer, the PACA pointer, and the
MSR. However, there are only two registers that OPAL is guaranteed to
preserve for us (r1 for the stack pointer and r13 for the PACA), so the MSR
gets saved on the stack.

This becomes problematic when we enable VMAP_STACK, as we need to re-enable
translation in order to access the virtually mapped stack... and to
re-enable translation, we need to restore the MSR.

Keep the MSR in r13, and instead store the PACA pointer on the stack - we
can restore the MSR first, then restore the PACA into r13.
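
For reference, the MSR value handed to OPAL on entry is the saved MSR with
translation and little-endian mode cleared. A small C model of the andc in
the entry path (opal_entry_msr() is a hypothetical name; the bit positions
are the ones I believe asm/reg.h uses):

```c
#include <assert.h>
#include <stdint.h>

/* MSR bits, matching the *_LG positions in arch/powerpc/include/asm/reg.h. */
#define MSR_LE	(UINT64_C(1) << 0)	/* little-endian mode */
#define MSR_DR	(UINT64_C(1) << 4)	/* data relocation (translation) */
#define MSR_IR	(UINT64_C(1) << 5)	/* instruction relocation */

static_assert((MSR_IR | MSR_DR | MSR_LE) == 0x31,
	      "mask cleared on OPAL entry");

/* Model of "andc r11,r13,r11": keep every bit except IR, DR and LE. */
static uint64_t opal_entry_msr(uint64_t saved_msr)
{
	return saved_msr & ~(MSR_IR | MSR_DR | MSR_LE);
}
```

The saved value itself stays untouched in r13, so it can be written back
with mtmsrd (or via HSRR1 and hrfid on little-endian) before any stack
access happens.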

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 .../powerpc/platforms/powernv/opal-wrappers.S | 43 +++++++++++--------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 0ed95f753416..d692869ee0ce 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -23,40 +23,49 @@
 _GLOBAL_TOC(__opal_call)
 	mflr	r0
 	std	r0,PPC_LR_STKOFF(r1)
-	ld	r12,STK_PARAM(R12)(r1)
-	li	r0,MSR_IR|MSR_DR|MSR_LE
-	andc	r12,r12,r0
 	LOAD_REG_ADDR(r11, opal_return)
 	mtlr	r11
 	LOAD_REG_ADDR(r11, opal)
 	ld	r2,0(r11)
 	ld	r11,8(r11)
 	mtspr	SPRN_HSRR0,r11
-	mtspr	SPRN_HSRR1,r12
+
 	/* set token to r0 */
 	ld	r0,STK_PARAM(R11)(r1)
+
+	/*
+	 * We need to keep the MSR value in a register that is preserved by
+	 * OPAL, so that we don't need to access the stack before we restore
+	 * the MSR, as the stack may be vmalloced and thus require MMU.
+	 *
+	 * Move the PACA from R13 into the stack red zone, and put MSR in R13.
+	 */
+	std	r13,-8(r1)
+	ld	r13,STK_PARAM(R12)(r1)
+
+	/* Switch off MMU, LE */
+	li	r11,MSR_IR|MSR_DR|MSR_LE
+	andc	r11,r13,r11
+
+	mtspr	SPRN_HSRR1,r11
 	hrfid
 opal_return:
 	/*
 	 * Restore MSR on OPAL return. The MSR is set to big-endian.
 	 */
 #ifdef __BIG_ENDIAN__
-	ld	r11,STK_PARAM(R12)(r1)
-	mtmsrd	r11
+	mtmsrd	r13
 #else
 	/* Endian can only be switched with rfi, must byte reverse MSR load */
-	.short 0x4039	 /* li r10,STK_PARAM(R12)		*/
-	.byte (STK_PARAM(R12) >> 8) & 0xff
-	.byte STK_PARAM(R12) & 0xff
-
-	.long 0x280c6a7d /* ldbrx r11,r10,r1			*/
-	.long 0x05009f42 /* bcl 20,31,$+4			*/
-	.long 0xa602487d /* mflr r10				*/
-	.long 0x14004a39 /* addi r10,r10,20			*/
-	.long 0xa64b5a7d /* mthsrr0 r10				*/
-	.long 0xa64b7b7d /* mthsrr1 r11				*/
-	.long 0x2402004c /* hrfid				*/
+	.long 0x05009f42 /* bcl 20,31,$+4   (LR <- next insn addr)	*/
+	.long 0xa602487d /* mflr r10					*/
+	.long 0x14004a39 /* addi r10,r10,20 (r10 <- addr after #endif)	*/
+	.long 0xa64b5a7d /* mthsrr0 r10	    (new NIP)			*/
+	.long 0xa64bbb7d /* mthsrr1 r13	    (new MSR)			*/
+	.long 0x2402004c /* hrfid					*/
 #endif
+	/* Restore PACA */
+	ld	r13,-8(r1)
 	LOAD_PACA_TOC()
 	ld	r0,PPC_LR_STKOFF(r1)
 	mtlr	r0
-- 
2.38.1



* [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
                   ` (2 preceding siblings ...)
  2022-11-04 17:27 ` [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-07  0:00   ` Russell Currey
  2022-11-08 16:21   ` Christophe Leroy
  2022-11-04 17:27 ` [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address Andrew Donnellan
  2022-11-04 17:27 ` [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK Andrew Donnellan
  5 siblings, 2 replies; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC
  To: linuxppc-dev; +Cc: linux-hardening, cmr

A number of OPAL calls take addresses as arguments (e.g. buffers with
strings to print). These addresses need to be physical addresses, as
OPAL runs in real mode.

Since the hardware ignores the top two bits of the address in real mode,
passing addresses in the kernel's linear map works fine even if we don't
wrap them in __pa().

With VMAP_STACK, however, we're going to have to use vmalloc_to_phys() to
convert addresses from the stack into an address that OPAL can use.

Introduce a new macro, stack_pa(), that uses vmalloc_to_phys() for
addresses in the vmalloc area, and passes linear map addresses through
unchanged. Add it to all the existing callsites where we pass pointers to
OPAL.
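
To make the difference concrete, here is a small model of why linear map
addresses happen to work in real mode while vmalloc addresses don't. The
constants and helper names are illustrative assumptions (the actual layout
depends on the MMU mode and config), not code from this series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only. */
#define EXAMPLE_PAGE_OFFSET	UINT64_C(0xc000000000000000) /* linear map */
#define EXAMPLE_VMALLOC_START	UINT64_C(0xc008000000000000) /* assumed */

static_assert(EXAMPLE_VMALLOC_START > EXAMPLE_PAGE_OFFSET,
	      "vmalloc area sits above the start of the linear map");

/* Model of real mode ignoring the top two bits of the address. */
static uint64_t real_mode_addr(uint64_t ea)
{
	return ea & ~(UINT64_C(3) << 62);
}

/* Physical address of a linear map address in this model. */
static uint64_t linear_pa(uint64_t va)
{
	return va - EXAMPLE_PAGE_OFFSET;
}

/* Simplified stand-in for is_vmalloc_addr(). */
static bool example_is_vmalloc(uint64_t va)
{
	return va >= EXAMPLE_VMALLOC_START;
}
```

For a linear map address, dropping the top two bits gives the same result
as linear_pa(). For a vmalloc address it leaves 0x0008... in the upper bits,
which is not a usable physical address, hence the page table lookup.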

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_builtin.c          |  2 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c  | 20 ++++++-----
 arch/powerpc/platforms/powernv/ocxl.c         |  3 +-
 arch/powerpc/platforms/powernv/opal-core.c    |  4 +--
 arch/powerpc/platforms/powernv/opal-dump.c    |  6 ++--
 arch/powerpc/platforms/powernv/opal-elog.c    | 10 +++---
 arch/powerpc/platforms/powernv/opal-fadump.c  | 12 +++----
 arch/powerpc/platforms/powernv/opal-flash.c   |  5 +--
 arch/powerpc/platforms/powernv/opal-hmi.c     |  3 +-
 arch/powerpc/platforms/powernv/opal-irqchip.c |  4 +--
 arch/powerpc/platforms/powernv/opal-lpc.c     |  8 ++---
 arch/powerpc/platforms/powernv/opal-nvram.c   |  4 +--
 arch/powerpc/platforms/powernv/opal-power.c   |  4 +--
 .../powerpc/platforms/powernv/opal-powercap.c |  2 +-
 arch/powerpc/platforms/powernv/opal-prd.c     |  6 ++--
 arch/powerpc/platforms/powernv/opal-psr.c     |  2 +-
 arch/powerpc/platforms/powernv/opal-rtc.c     |  2 +-
 arch/powerpc/platforms/powernv/opal-secvar.c  |  9 +++--
 arch/powerpc/platforms/powernv/opal-sensor.c  |  4 +--
 .../powerpc/platforms/powernv/opal-sysparam.c |  4 +--
 arch/powerpc/platforms/powernv/opal-xscom.c   |  2 +-
 arch/powerpc/platforms/powernv/opal.c         | 16 ++++-----
 arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++++----
 arch/powerpc/platforms/powernv/pci.c          | 25 +++++++-------
 arch/powerpc/platforms/powernv/setup.c        |  2 +-
 arch/powerpc/platforms/powernv/smp.c          |  2 +-
 arch/powerpc/sysdev/xics/icp-opal.c           |  2 +-
 arch/powerpc/sysdev/xics/ics-opal.c           |  8 ++---
 arch/powerpc/sysdev/xive/native.c             | 33 ++++++++++++-------
 drivers/char/ipmi/ipmi_powernv.c              |  6 ++--
 drivers/char/powernv-op-panel.c               |  2 +-
 drivers/i2c/busses/i2c-opal.c                 |  2 +-
 drivers/leds/leds-powernv.c                   |  6 ++--
 drivers/mtd/devices/powernv_flash.c           |  4 +--
 drivers/rtc/rtc-opal.c                        |  4 +--
 35 files changed, 135 insertions(+), 107 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index da85f046377a..dba041d659d2 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -414,7 +414,7 @@ static long kvmppc_read_one_intr(bool *again)
 	xics_phys = local_paca->kvm_hstate.xics_phys;
 	rc = 0;
 	if (!xics_phys)
-		rc = opal_int_get_xirr(&xirr, false);
+		rc = opal_int_get_xirr(stack_pa(&xirr), false);
 	else
 		xirr = __raw_rm_readl(xics_phys + XICS_XIRR);
 	if (rc < 0)
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index a83cb679dd59..f069aa28f969 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -517,7 +517,7 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe *pe)
 	struct pnv_phb *phb = pe->phb->private_data;
 	s64 rc;
 
-	rc = opal_pci_get_phb_diag_data2(phb->opal_id, pe->data,
+	rc = opal_pci_get_phb_diag_data2(phb->opal_id, stack_pa(pe->data),
 					 phb->diag_data_size);
 	if (rc != OPAL_SUCCESS)
 		pr_warn("%s: Failure %lld getting PHB#%x diag-data\n",
@@ -534,8 +534,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe *pe)
 
 	rc = opal_pci_eeh_freeze_status(phb->opal_id,
 					pe->addr,
-					&fstate,
-					&pcierr,
+					stack_pa(&fstate),
+					stack_pa(&pcierr),
 					NULL);
 	if (rc != OPAL_SUCCESS) {
 		pr_warn("%s: Failure %lld getting PHB#%x state\n",
@@ -594,8 +594,8 @@ static int pnv_eeh_get_pe_state(struct eeh_pe *pe)
 	} else {
 		rc = opal_pci_eeh_freeze_status(phb->opal_id,
 						pe->addr,
-						&fstate,
-						&pcierr,
+						stack_pa(&fstate),
+						stack_pa(&pcierr),
 						NULL);
 		if (rc != OPAL_SUCCESS) {
 			pr_warn("%s: Failure %lld getting PHB#%x-PE%x state\n",
@@ -1287,7 +1287,8 @@ static void pnv_eeh_get_and_dump_hub_diag(struct pci_controller *hose)
 		(struct OpalIoP7IOCErrorData*)phb->diag_data;
 	long rc;
 
-	rc = opal_pci_get_hub_diag_data(phb->hub_id, data, sizeof(*data));
+	rc = opal_pci_get_hub_diag_data(phb->hub_id, stack_pa(data),
+					sizeof(*data));
 	if (rc != OPAL_SUCCESS) {
 		pr_warn("%s: Failed to get HUB#%llx diag-data (%ld)\n",
 			__func__, phb->hub_id, rc);
@@ -1432,7 +1433,9 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 			continue;
 
 		rc = opal_pci_next_error(phb->opal_id,
-					 &frozen_pe_no, &err_type, &severity);
+					 stack_pa(&frozen_pe_no),
+					 stack_pa(&err_type),
+					 stack_pa(&severity));
 		if (rc != OPAL_SUCCESS) {
 			pr_devel("%s: Invalid return value on "
 				 "PHB#%x (0x%lx) from opal_pci_next_error",
@@ -1511,7 +1514,8 @@ static int pnv_eeh_next_error(struct eeh_pe **pe)
 
 				/* Dump PHB diag-data */
 				rc = opal_pci_get_phb_diag_data2(phb->opal_id,
-					phb->diag_data, phb->diag_data_size);
+								 stack_pa(phb->diag_data),
+								 phb->diag_data_size);
 				if (rc == OPAL_SUCCESS)
 					pnv_pci_dump_phb_diag_data(hose,
 							phb->diag_data);
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 629067781cec..33d5b85df078 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -450,7 +450,8 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
 		return -ENOMEM;
 
 	bdfn = (dev->bus->number << 8) | dev->devfn;
-	rc = opal_npu_spa_setup(phb->opal_id, bdfn, virt_to_phys(spa_mem),
+	rc = opal_npu_spa_setup(phb->opal_id, bdfn,
+				(uint64_t)stack_pa(spa_mem),
 				PE_mask);
 	if (rc) {
 		dev_err(&dev->dev, "Can't setup Shared Process Area: %d\n", rc);
diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index bb7657115f1d..6a4a1fd9ec33 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -475,7 +475,7 @@ static void __init opalcore_config_init(void)
 	}
 
 	/* Get OPAL metadata */
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_OPAL, &addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_OPAL, stack_pa(&addr));
 	if ((ret != OPAL_SUCCESS) || !addr) {
 		pr_err("Failed to get OPAL metadata (%d)\n", ret);
 		goto error_out;
@@ -486,7 +486,7 @@ static void __init opalcore_config_init(void)
 	opalc_metadata = __va(addr);
 
 	/* Get OPAL CPU metadata */
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, stack_pa(&addr));
 	if ((ret != OPAL_SUCCESS) || !addr) {
 		pr_err("Failed to get OPAL CPU metadata (%d)\n", ret);
 		goto error_out;
diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
index 16c5860f1372..9d48257988bc 100644
--- a/arch/powerpc/platforms/powernv/opal-dump.c
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -223,9 +223,9 @@ static int64_t dump_read_info(uint32_t *dump_id, uint32_t *dump_size, uint32_t *
 
 	type = cpu_to_be32(0xffffffff);
 
-	rc = opal_dump_info2(&id, &size, &type);
+	rc = opal_dump_info2(stack_pa(&id), stack_pa(&size), stack_pa(&type));
 	if (rc == OPAL_PARAMETER)
-		rc = opal_dump_info(&id, &size);
+		rc = opal_dump_info(stack_pa(&id), stack_pa(&size));
 
 	if (rc) {
 		pr_warn("%s: Failed to get dump info (%d)\n",
@@ -262,7 +262,7 @@ static int64_t dump_read_data(struct dump_obj *dump)
 	}
 
 	/* First entry address */
-	addr = __pa(list);
+	addr = (uint64_t)stack_pa(list);
 
 	/* Fetch data */
 	rc = OPAL_BUSY_EVENT;
diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
index 554fdd7f88b8..8750d7729e7c 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -169,7 +169,7 @@ static ssize_t raw_attr_read(struct file *filep, struct kobject *kobj,
 		if (!elog->buffer)
 			return -EIO;
 
-		opal_rc = opal_read_elog(__pa(elog->buffer),
+		opal_rc = opal_read_elog((uint64_t)stack_pa(elog->buffer),
 					 elog->size, elog->id);
 		if (opal_rc != OPAL_SUCCESS) {
 			pr_err_ratelimited("ELOG: log read failed for log-id=%llx\n",
@@ -212,8 +212,8 @@ static void create_elog_obj(uint64_t id, size_t size, uint64_t type)
 	elog->buffer = kzalloc(elog->size, GFP_KERNEL);
 
 	if (elog->buffer) {
-		rc = opal_read_elog(__pa(elog->buffer),
-					 elog->size, elog->id);
+		rc = opal_read_elog((uint64_t)stack_pa(elog->buffer),
+				    elog->size, elog->id);
 		if (rc != OPAL_SUCCESS) {
 			pr_err("ELOG: log read failed for log-id=%llx\n",
 			       elog->id);
@@ -270,7 +270,9 @@ static irqreturn_t elog_event(int irq, void *data)
 	char name[2+16+1];
 	struct kobject *kobj;
 
-	rc = opal_get_elog_size(&id, &size, &type);
+	rc = opal_get_elog_size(stack_pa(&id),
+				stack_pa(&size),
+				stack_pa(&type));
 	if (rc != OPAL_SUCCESS) {
 		pr_err("ELOG: OPAL log info read failed\n");
 		return IRQ_HANDLED;
diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
index 964f464b1b0e..d4bdf4540c1f 100644
--- a/arch/powerpc/platforms/powernv/opal-fadump.c
+++ b/arch/powerpc/platforms/powernv/opal-fadump.c
@@ -47,7 +47,7 @@ void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
 	if (!prop)
 		return;
 
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, stack_pa(&addr));
 	if ((ret != OPAL_SUCCESS) || !addr) {
 		pr_debug("Could not get Kernel metadata (%lld)\n", ret);
 		return;
@@ -63,7 +63,7 @@ void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
 	if (be16_to_cpu(opal_fdm_active->registered_regions) == 0)
 		return;
 
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM, &addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM, stack_pa(&addr));
 	if ((ret != OPAL_SUCCESS) || !addr) {
 		pr_err("Failed to get boot memory tag (%lld)\n", ret);
 		return;
@@ -607,7 +607,7 @@ static void opal_fadump_trigger(struct fadump_crash_info_header *fdh,
 	 */
 	fdh->crashing_cpu = (u32)mfspr(SPRN_PIR);
 
-	rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, msg);
+	rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, stack_pa(msg));
 	if (rc == OPAL_UNSUPPORTED) {
 		pr_emerg("Reboot type %d not supported.\n",
 			 OPAL_REBOOT_MPIPL);
@@ -690,7 +690,7 @@ void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
 	if (!prop)
 		return;
 
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &be_addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, stack_pa(&be_addr));
 	if ((ret != OPAL_SUCCESS) || !be_addr) {
 		pr_err("Failed to get Kernel metadata (%lld)\n", ret);
 		return;
@@ -712,8 +712,8 @@ void __init opal_fadump_dt_scan(struct fw_dump *fadump_conf, u64 node)
 		return;
 	}
 
-	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &be_addr);
+	ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, stack_pa(&be_addr));
 	if (be_addr) {
 		addr = be64_to_cpu(be_addr);
 		pr_debug("CPU metadata addr: %llx\n", addr);
 		opal_cpu_metadata = __va(addr);
diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
index d5ea04e8e4c5..fb989707ce94 100644
--- a/arch/powerpc/platforms/powernv/opal-flash.c
+++ b/arch/powerpc/platforms/powernv/opal-flash.c
@@ -134,7 +134,8 @@ static inline void opal_flash_validate(void)
 	__be32 size = cpu_to_be32(validate_flash_data.buf_size);
 	__be32 result;
 
-	ret = opal_validate_flash(__pa(buf), &size, &result);
+	ret = opal_validate_flash((uint64_t)stack_pa(buf), stack_pa(&size),
+				  stack_pa(&result));
 
 	validate_flash_data.status = ret;
 	validate_flash_data.buf_size = be32_to_cpu(size);
@@ -290,7 +291,7 @@ static int opal_flash_update(int op)
 		goto invalid_img;
 
 	/* First entry address */
-	addr = __pa(list);
+	addr = (unsigned long)stack_pa(list);
 
 flash:
 	rc = opal_update_flash(addr);
diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c b/arch/powerpc/platforms/powernv/opal-hmi.c
index f0c1830deb51..a7df32dfd090 100644
--- a/arch/powerpc/platforms/powernv/opal-hmi.c
+++ b/arch/powerpc/platforms/powernv/opal-hmi.c
@@ -303,7 +303,8 @@ static void hmi_event_handler(struct work_struct *work)
 
 	if (unrecoverable) {
 		/* Pull all HMI events from OPAL before we panic. */
-		while (opal_get_msg(__pa(&msg), sizeof(msg)) == OPAL_SUCCESS) {
+		while (opal_get_msg((uint64_t)stack_pa(&msg),
+				    sizeof(msg)) == OPAL_SUCCESS) {
 			u32 type;
 
 			type = be32_to_cpu(msg.msg_type);
diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
index d55652b5f6fa..0af8e517884c 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -60,7 +60,7 @@ void opal_handle_events(void)
 		cond_resched();
 	}
 	last_outstanding_events = 0;
-	if (opal_poll_events(&events) != OPAL_SUCCESS)
+	if (opal_poll_events(stack_pa(&events)) != OPAL_SUCCESS)
 		return;
 	e = be64_to_cpu(events) & opal_event_irqchip.mask;
 	if (e)
@@ -123,7 +123,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
 {
 	__be64 events;
 
-	opal_handle_interrupt(virq_to_hw(irq), &events);
+	opal_handle_interrupt(virq_to_hw(irq), stack_pa(&events));
 	last_outstanding_events = be64_to_cpu(events);
 	if (opal_have_pending_events())
 		opal_wake_poller();
diff --git a/arch/powerpc/platforms/powernv/opal-lpc.c b/arch/powerpc/platforms/powernv/opal-lpc.c
index d129d6d45a50..01114ab629dc 100644
--- a/arch/powerpc/platforms/powernv/opal-lpc.c
+++ b/arch/powerpc/platforms/powernv/opal-lpc.c
@@ -28,7 +28,7 @@ static u8 opal_lpc_inb(unsigned long port)
 
 	if (opal_lpc_chip_id < 0 || port > 0xffff)
 		return 0xff;
-	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, &data, 1);
+	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, stack_pa(&data), 1);
 	return rc ? 0xff : be32_to_cpu(data);
 }
 
@@ -41,7 +41,7 @@ static __le16 __opal_lpc_inw(unsigned long port)
 		return 0xffff;
 	if (port & 1)
 		return (__le16)opal_lpc_inb(port) << 8 | opal_lpc_inb(port + 1);
-	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, &data, 2);
+	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, stack_pa(&data), 2);
 	return rc ? 0xffff : be32_to_cpu(data);
 }
 static u16 opal_lpc_inw(unsigned long port)
@@ -61,7 +61,7 @@ static __le32 __opal_lpc_inl(unsigned long port)
 		       (__le32)opal_lpc_inb(port + 1) << 16 |
 		       (__le32)opal_lpc_inb(port + 2) <<  8 |
 			       opal_lpc_inb(port + 3);
-	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, &data, 4);
+	rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port, stack_pa(&data), 4);
 	return rc ? 0xffffffff : be32_to_cpu(data);
 }
 
@@ -208,7 +208,7 @@ static ssize_t lpc_debug_read(struct file *filp, char __user *ubuf,
 				len = 2;
 		}
 		rc = opal_lpc_read(opal_lpc_chip_id, lpc->lpc_type, pos,
-				   &data, len);
+				   stack_pa(&data), len);
 		if (rc)
 			return -ENXIO;
 
diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c b/arch/powerpc/platforms/powernv/opal-nvram.c
index 380bc2d7ebbf..d92e5070baf2 100644
--- a/arch/powerpc/platforms/powernv/opal-nvram.c
+++ b/arch/powerpc/platforms/powernv/opal-nvram.c
@@ -33,7 +33,7 @@ static ssize_t opal_nvram_read(char *buf, size_t count, loff_t *index)
 	off = *index;
 	if ((off + count) > nvram_size)
 		count = nvram_size - off;
-	rc = opal_read_nvram(__pa(buf), count, off);
+	rc = opal_read_nvram((uint64_t)stack_pa(buf), count, off);
 	if (rc != OPAL_SUCCESS)
 		return -EIO;
 	*index += count;
@@ -56,7 +56,7 @@ static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
 		count = nvram_size - off;
 
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
-		rc = opal_write_nvram(__pa(buf), count, off);
+		rc = opal_write_nvram((uint64_t)stack_pa(buf), count, off);
 		if (rc == OPAL_BUSY_EVENT) {
 			if (in_interrupt() || irqs_disabled())
 				mdelay(OPAL_BUSY_DELAY_MS);
diff --git a/arch/powerpc/platforms/powernv/opal-power.c b/arch/powerpc/platforms/powernv/opal-power.c
index db99ffcb7b82..6927bcd3630e 100644
--- a/arch/powerpc/platforms/powernv/opal-power.c
+++ b/arch/powerpc/platforms/powernv/opal-power.c
@@ -31,7 +31,7 @@ static bool detect_epow(void)
 	* to OPAL. OPAL returns EPOW info along with classes present.
 	*/
 	epow_classes = cpu_to_be16(OPAL_SYSEPOW_MAX);
-	rc = opal_get_epow_status(opal_epow_status, &epow_classes);
+	rc = opal_get_epow_status(stack_pa(opal_epow_status), stack_pa(&epow_classes));
 	if (rc != OPAL_SUCCESS) {
 		pr_err("Failed to get EPOW event information\n");
 		return false;
@@ -59,7 +59,7 @@ static bool __init poweroff_pending(void)
 	__be64 opal_dpo_timeout;
 
 	/* Check for DPO event */
-	rc = opal_get_dpo_status(&opal_dpo_timeout);
+	rc = opal_get_dpo_status(stack_pa(&opal_dpo_timeout));
 	if (rc == OPAL_SUCCESS) {
 		pr_info("Existing DPO event detected.\n");
 		return true;
diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c b/arch/powerpc/platforms/powernv/opal-powercap.c
index 7bfe4cbeb35a..63e0e4427aea 100644
--- a/arch/powerpc/platforms/powernv/opal-powercap.c
+++ b/arch/powerpc/platforms/powernv/opal-powercap.c
@@ -46,7 +46,7 @@ static ssize_t powercap_show(struct kobject *kobj, struct kobj_attribute *attr,
 	if (ret)
 		goto out_token;
 
-	ret = opal_get_powercap(pcap_attr->handle, token, (u32 *)__pa(&pcap));
+	ret = opal_get_powercap(pcap_attr->handle, token, (u32 *)stack_pa(&pcap));
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
 		ret = opal_async_wait_response(token, &msg);
diff --git a/arch/powerpc/platforms/powernv/opal-prd.c b/arch/powerpc/platforms/powernv/opal-prd.c
index 113bdb151f68..649e8510ec00 100644
--- a/arch/powerpc/platforms/powernv/opal-prd.c
+++ b/arch/powerpc/platforms/powernv/opal-prd.c
@@ -234,7 +234,7 @@ static ssize_t opal_prd_write(struct file *file, const char __user *buf,
 	if (IS_ERR(msg))
 		return PTR_ERR(msg);
 
-	rc = opal_prd_msg(msg);
+	rc = opal_prd_msg(stack_pa(msg));
 	if (rc) {
 		pr_warn("write: opal_prd_msg returned %d\n", rc);
 		size = -EIO;
@@ -252,7 +252,7 @@ static int opal_prd_release(struct inode *inode, struct file *file)
 	msg.size = cpu_to_be16(sizeof(msg));
 	msg.type = OPAL_PRD_MSG_TYPE_FINI;
 
-	opal_prd_msg((struct opal_prd_msg *)&msg);
+	opal_prd_msg((struct opal_prd_msg *)stack_pa(&msg));
 
 	atomic_xchg(&prd_usage, 0);
 
@@ -281,7 +281,7 @@ static long opal_prd_ioctl(struct file *file, unsigned int cmd,
 			return -EFAULT;
 
 		scom.rc = opal_xscom_read(scom.chip, scom.addr,
-				(__be64 *)&scom.data);
+					  (__be64 *)stack_pa(&scom.data));
 		scom.data = be64_to_cpu(scom.data);
 		pr_devel("ioctl SCOM_READ: chip %llx addr %016llx data %016llx rc %lld\n",
 				scom.chip, scom.addr, scom.data, scom.rc);
diff --git a/arch/powerpc/platforms/powernv/opal-psr.c b/arch/powerpc/platforms/powernv/opal-psr.c
index 6441e17b6996..c37257b1ffe4 100644
--- a/arch/powerpc/platforms/powernv/opal-psr.c
+++ b/arch/powerpc/platforms/powernv/opal-psr.c
@@ -40,7 +40,7 @@ static ssize_t psr_show(struct kobject *kobj, struct kobj_attribute *attr,
 		goto out_token;
 
 	ret = opal_get_power_shift_ratio(psr_attr->handle, token,
-					    (u32 *)__pa(&psr));
+					    (u32 *)stack_pa(&psr));
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
 		ret = opal_async_wait_response(token, &msg);
diff --git a/arch/powerpc/platforms/powernv/opal-rtc.c b/arch/powerpc/platforms/powernv/opal-rtc.c
index a9bcf9217e64..891651455066 100644
--- a/arch/powerpc/platforms/powernv/opal-rtc.c
+++ b/arch/powerpc/platforms/powernv/opal-rtc.c
@@ -43,7 +43,7 @@ time64_t __init opal_get_boot_time(void)
 		return 0;
 
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
-		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
+		rc = opal_rtc_read(stack_pa(&__y_m_d), stack_pa(&__h_m_s_ms));
 		if (rc == OPAL_BUSY_EVENT) {
 			mdelay(OPAL_BUSY_DELAY_MS);
 			opal_poll_events(NULL);
diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c b/arch/powerpc/platforms/powernv/opal-secvar.c
index 14133e120bdd..a44df58d565d 100644
--- a/arch/powerpc/platforms/powernv/opal-secvar.c
+++ b/arch/powerpc/platforms/powernv/opal-secvar.c
@@ -64,7 +64,8 @@ static int opal_get_variable(const char *key, uint64_t ksize,
 
 	*dsize = cpu_to_be64(*dsize);
 
-	rc = opal_secvar_get(key, ksize, data, dsize);
+	rc = opal_secvar_get(stack_pa(key), ksize, stack_pa(data),
+			     stack_pa(dsize));
 
 	*dsize = be64_to_cpu(*dsize);
 
@@ -81,7 +82,8 @@ static int opal_get_next_variable(const char *key, uint64_t *keylen,
 
 	*keylen = cpu_to_be64(*keylen);
 
-	rc = opal_secvar_get_next(key, keylen, keybufsize);
+	rc = opal_secvar_get_next(stack_pa(key), stack_pa(keylen),
+				  keybufsize);
 
 	*keylen = be64_to_cpu(*keylen);
 
@@ -96,7 +98,8 @@ static int opal_set_variable(const char *key, uint64_t ksize, u8 *data,
 	if (!key || !data)
 		return -EINVAL;
 
-	rc = opal_secvar_enqueue_update(key, ksize, data, dsize);
+	rc = opal_secvar_enqueue_update(stack_pa(key), ksize, stack_pa(data),
+					dsize);
 
 	return opal_status_to_err(rc);
 }
diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c b/arch/powerpc/platforms/powernv/opal-sensor.c
index 3192c614a1e1..ff5f78bb419b 100644
--- a/arch/powerpc/platforms/powernv/opal-sensor.c
+++ b/arch/powerpc/platforms/powernv/opal-sensor.c
@@ -25,7 +25,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32 *sensor_data)
 	if (token < 0)
 		return token;
 
-	ret = opal_sensor_read(sensor_hndl, token, &data);
+	ret = opal_sensor_read(sensor_hndl, token, stack_pa(&data));
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
 		ret = opal_async_wait_response(token, &msg);
@@ -78,7 +78,7 @@ int opal_get_sensor_data_u64(u32 sensor_hndl, u64 *sensor_data)
 	if (token < 0)
 		return token;
 
-	ret = opal_sensor_read_u64(sensor_hndl, token, &data);
+	ret = opal_sensor_read_u64(sensor_hndl, token, stack_pa(&data));
 	switch (ret) {
 	case OPAL_ASYNC_COMPLETION:
 		ret = opal_async_wait_response(token, &msg);
diff --git a/arch/powerpc/platforms/powernv/opal-sysparam.c b/arch/powerpc/platforms/powernv/opal-sysparam.c
index a12312afe4ef..3882b31a9e61 100644
--- a/arch/powerpc/platforms/powernv/opal-sysparam.c
+++ b/arch/powerpc/platforms/powernv/opal-sysparam.c
@@ -41,7 +41,7 @@ static ssize_t opal_get_sys_param(u32 param_id, u32 length, void *buffer)
 		goto out;
 	}
 
-	ret = opal_get_param(token, param_id, (u64)buffer, length);
+	ret = opal_get_param(token, param_id, (u64)stack_pa(buffer), length);
 	if (ret != OPAL_ASYNC_COMPLETION) {
 		ret = opal_error_code(ret);
 		goto out_token;
@@ -76,7 +76,7 @@ static int opal_set_sys_param(u32 param_id, u32 length, void *buffer)
 		goto out;
 	}
 
-	ret = opal_set_param(token, param_id, (u64)buffer, length);
+	ret = opal_set_param(token, param_id, (u64)stack_pa(buffer), length);
 
 	if (ret != OPAL_ASYNC_COMPLETION) {
 		ret = opal_error_code(ret);
diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c b/arch/powerpc/platforms/powernv/opal-xscom.c
index 6b4eed2ef4fa..b318e2ef4ba2 100644
--- a/arch/powerpc/platforms/powernv/opal-xscom.c
+++ b/arch/powerpc/platforms/powernv/opal-xscom.c
@@ -58,7 +58,7 @@ static int opal_scom_read(uint32_t chip, uint64_t addr, u64 reg, u64 *value)
 	__be64 v;
 
 	reg = opal_scom_unmangle(addr + reg);
-	rc = opal_xscom_read(chip, reg, (__be64 *)__pa(&v));
+	rc = opal_xscom_read(chip, reg, (__be64 *)stack_pa(&v));
 	if (rc) {
 		*value = 0xfffffffffffffffful;
 		return -EIO;
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index cdf3838f08d3..ada336d77e64 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -357,7 +357,7 @@ static void opal_handle_message(void)
 	s64 ret;
 	u32 type;
 
-	ret = opal_get_msg(__pa(opal_msg), opal_msg_size);
+	ret = opal_get_msg((uint64_t)stack_pa(opal_msg), opal_msg_size);
 	/* No opal message pending. */
 	if (ret == OPAL_RESOURCE)
 		return;
@@ -431,11 +431,11 @@ int opal_get_chars(uint32_t vtermno, char *buf, int count)
 
 	if (!opal.entry)
 		return -ENODEV;
-	opal_poll_events(&evt);
+	opal_poll_events(stack_pa(&evt));
 	if ((be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_INPUT) == 0)
 		return 0;
 	len = cpu_to_be64(count);
-	rc = opal_console_read(vtermno, &len, buf);
+	rc = opal_console_read(vtermno, stack_pa(&len), stack_pa(buf));
 	if (rc == OPAL_SUCCESS)
 		return be64_to_cpu(len);
 	return 0;
@@ -453,7 +453,7 @@ static int __opal_put_chars(uint32_t vtermno, const char *data, int total_len, b
 
 	if (atomic)
 		spin_lock_irqsave(&opal_write_lock, flags);
-	rc = opal_console_write_buffer_space(vtermno, &olen);
+	rc = opal_console_write_buffer_space(vtermno, stack_pa(&olen));
 	if (rc || be64_to_cpu(olen) < total_len) {
 		/* Closed -> drop characters */
 		if (rc)
@@ -465,7 +465,7 @@ static int __opal_put_chars(uint32_t vtermno, const char *data, int total_len, b
 
 	/* Should not get a partial write here because space is available. */
 	olen = cpu_to_be64(total_len);
-	rc = opal_console_write(vtermno, &olen, data);
+	rc = opal_console_write(vtermno, stack_pa(&olen), stack_pa((void *)data));
 	if (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
 		if (rc == OPAL_BUSY_EVENT)
 			opal_poll_events(NULL);
@@ -527,7 +527,7 @@ static s64 __opal_flush_console(uint32_t vtermno)
 		 */
 		WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
 
-		opal_poll_events(&evt);
+		opal_poll_events(stack_pa(&evt));
 		if (!(be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT))
 			return OPAL_SUCCESS;
 		return OPAL_BUSY;
@@ -647,7 +647,7 @@ void __noreturn pnv_platform_error_reboot(struct pt_regs *regs, const char *msg)
 	 * Don't bother to shut things down because this will
 	 * xstop the system.
 	 */
-	if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, msg)
+	if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, stack_pa((void *)msg))
 						== OPAL_UNSUPPORTED) {
 		pr_emerg("Reboot type %d not supported for %s\n",
 				OPAL_REBOOT_PLATFORM_ERROR, msg);
@@ -720,7 +720,7 @@ int opal_hmi_exception_early2(struct pt_regs *regs)
 	 * Check 64-bit flag mask to find out if an event was generated,
 	 * and whether TB is still valid or not etc.
 	 */
-	rc = opal_handle_hmi2(&out_flags);
+	rc = opal_handle_hmi2(stack_pa(&out_flags));
 	if (rc != OPAL_SUCCESS)
 		return 0;
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 5c144c05cbfd..4d85e8253f94 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -628,7 +628,8 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 
 	/* Check the master PE */
 	rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
-					&state, &pcierr, NULL);
+					stack_pa(&state),
+					stack_pa(&pcierr), NULL);
 	if (rc != OPAL_SUCCESS) {
 		pr_warn("%s: Failure %lld getting "
 			"PHB#%x-PE#%x state\n",
@@ -644,8 +645,8 @@ static int pnv_ioda_get_pe_state(struct pnv_phb *phb, int pe_no)
 	list_for_each_entry(slave, &pe->slaves, list) {
 		rc = opal_pci_eeh_freeze_status(phb->opal_id,
 						slave->pe_number,
-						&fstate,
-						&pcierr,
+						stack_pa(&fstate),
+						stack_pa(&pcierr),
 						NULL);
 		if (rc != OPAL_SUCCESS) {
 			pr_warn("%s: Failure %lld getting "
@@ -2061,7 +2062,7 @@ static int __pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 		__be64 addr64;
 
 		rc = opal_get_msi_64(phb->opal_id, pe->mve_number, xive_num, 1,
-				     &addr64, &data);
+				     stack_pa(&addr64), stack_pa(&data));
 		if (rc) {
 			pr_warn("%s: OPAL error %d getting 64-bit MSI data\n",
 				pci_name(dev), rc);
@@ -2073,7 +2074,7 @@ static int __pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev,
 		__be32 addr32;
 
 		rc = opal_get_msi_32(phb->opal_id, pe->mve_number, xive_num, 1,
-				     &addr32, &data);
+				     stack_pa(&addr32), stack_pa(&data));
 		if (rc) {
 			pr_warn("%s: OPAL error %d getting 32-bit MSI data\n",
 				pci_name(dev), rc);
@@ -2415,7 +2416,8 @@ static int pnv_pci_diag_data_set(void *data, u64 val)
 	s64 ret;
 
 	/* Retrieve the diag data from firmware */
-	ret = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag_data,
+	ret = opal_pci_get_phb_diag_data2(phb->opal_id,
+					  stack_pa(phb->diag_data),
 					  phb->diag_data_size);
 	if (ret != OPAL_SUCCESS)
 		return -EIO;
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 233a50e65fce..0c21b5aa24f5 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -84,7 +84,7 @@ int pnv_pci_get_device_tree(uint32_t phandle, void *buf, uint64_t len)
 	if (!opal_check_token(OPAL_GET_DEVICE_TREE))
 		return -ENXIO;
 
-	rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
+	rc = opal_get_device_tree(phandle, (uint64_t)stack_pa(buf), len);
 	if (rc < OPAL_SUCCESS)
 		return -EIO;
 
@@ -99,7 +99,7 @@ int pnv_pci_get_presence_state(uint64_t id, uint8_t *state)
 	if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
 		return -ENXIO;
 
-	rc = opal_pci_get_presence_state(id, (uint64_t)state);
+	rc = opal_pci_get_presence_state(id, (uint64_t)stack_pa(state));
 	if (rc != OPAL_SUCCESS)
 		return -EIO;
 
@@ -114,7 +114,7 @@ int pnv_pci_get_power_state(uint64_t id, uint8_t *state)
 	if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
 		return -ENXIO;
 
-	rc = opal_pci_get_power_state(id, (uint64_t)state);
+	rc = opal_pci_get_power_state(id, (uint64_t)stack_pa(state));
 	if (rc != OPAL_SUCCESS)
 		return -EIO;
 
@@ -135,7 +135,7 @@ int pnv_pci_set_power_state(uint64_t id, uint8_t state, struct opal_msg *msg)
 	if (unlikely(token < 0))
 		return token;
 
-	rc = opal_pci_set_power_state(token, id, (uint64_t)&state);
+	rc = opal_pci_set_power_state(token, id, (uint64_t)stack_pa(&state));
 	if (rc == OPAL_SUCCESS) {
 		ret = 0;
 		goto exit;
@@ -493,7 +493,8 @@ static void pnv_pci_handle_eeh_config(struct pnv_phb *phb, u32 pe_no)
 	spin_lock_irqsave(&phb->lock, flags);
 
 	/* Fetch PHB diag-data */
-	rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag_data,
+	rc = opal_pci_get_phb_diag_data2(phb->opal_id,
+					 stack_pa(phb->diag_data),
 					 phb->diag_data_size);
 	has_diag = (rc == OPAL_SUCCESS);
 
@@ -554,8 +555,8 @@ static void pnv_pci_config_check_eeh(struct pci_dn *pdn)
 	} else {
 		rc = opal_pci_eeh_freeze_status(phb->opal_id,
 						pe_no,
-						&fstate,
-						&pcierr,
+						stack_pa(&fstate),
+						stack_pa(&pcierr),
 						NULL);
 		if (rc) {
 			pr_warn("%s: Failure %lld getting PHB#%x-PE#%x state\n",
@@ -592,20 +593,22 @@ int pnv_pci_cfg_read(struct pci_dn *pdn,
 	switch (size) {
 	case 1: {
 		u8 v8;
-		rc = opal_pci_config_read_byte(phb->opal_id, bdfn, where, &v8);
+		rc = opal_pci_config_read_byte(phb->opal_id, bdfn, where,
+					       stack_pa(&v8));
 		*val = (rc == OPAL_SUCCESS) ? v8 : 0xff;
 		break;
 	}
 	case 2: {
 		__be16 v16;
 		rc = opal_pci_config_read_half_word(phb->opal_id, bdfn, where,
-						   &v16);
+						    stack_pa(&v16));
 		*val = (rc == OPAL_SUCCESS) ? be16_to_cpu(v16) : 0xffff;
 		break;
 	}
 	case 4: {
 		__be32 v32;
-		rc = opal_pci_config_read_word(phb->opal_id, bdfn, where, &v32);
+		rc = opal_pci_config_read_word(phb->opal_id, bdfn, where,
+					       stack_pa(&v32));
 		*val = (rc == OPAL_SUCCESS) ? be32_to_cpu(v32) : 0xffffffff;
 		break;
 	}
@@ -765,7 +768,7 @@ int pnv_pci_set_tunnel_bar(struct pci_dev *dev, u64 addr, int enable)
 		return -ENXIO;
 
 	mutex_lock(&tunnel_mutex);
-	rc = opal_pci_get_pbcq_tunnel_bar(phb->opal_id, &val);
+	rc = opal_pci_get_pbcq_tunnel_bar(phb->opal_id, stack_pa(&val));
 	if (rc != OPAL_SUCCESS) {
 		rc = -EIO;
 		goto out;
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 61ab2d38ff4b..aae6ad04c65f 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -407,7 +407,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 
 		for (;;) {
 			rc = opal_query_cpu_status(get_hard_smp_processor_id(i),
-						   &status);
+						   stack_pa(&status));
 			if (rc != OPAL_SUCCESS || status != OPAL_THREAD_STARTED)
 				break;
 			barrier();
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 9e1a25398f98..2f70e0bf9873 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -86,7 +86,7 @@ static int pnv_smp_kick_cpu(int nr)
 	 * first time. OPAL v3 allows us to query OPAL to know if it
 	 * has the CPUs, so we do that
 	 */
-	rc = opal_query_cpu_status(pcpu, &status);
+	rc = opal_query_cpu_status(pcpu, stack_pa(&status));
 	if (rc != OPAL_SUCCESS) {
 		pr_warn("OPAL Error %ld querying CPU %d state\n", rc, nr);
 		return -ENODEV;
diff --git a/arch/powerpc/sysdev/xics/icp-opal.c b/arch/powerpc/sysdev/xics/icp-opal.c
index 4dae624b9f2f..a79b98349a1e 100644
--- a/arch/powerpc/sysdev/xics/icp-opal.c
+++ b/arch/powerpc/sysdev/xics/icp-opal.c
@@ -53,7 +53,7 @@ static unsigned int icp_opal_get_xirr(void)
 		return kvm_xirr;
 
 	/* Then ask OPAL */
-	rc = opal_int_get_xirr(&hw_xirr, false);
+	rc = opal_int_get_xirr(stack_pa(&hw_xirr), false);
 	if (rc < 0)
 		return 0;
 	return be32_to_cpu(hw_xirr);
diff --git a/arch/powerpc/sysdev/xics/ics-opal.c b/arch/powerpc/sysdev/xics/ics-opal.c
index 6cfbb4fac7fb..5bf54470b35d 100644
--- a/arch/powerpc/sysdev/xics/ics-opal.c
+++ b/arch/powerpc/sysdev/xics/ics-opal.c
@@ -105,7 +105,7 @@ static int ics_opal_set_affinity(struct irq_data *d,
 	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
 		return -1;
 
-	rc = opal_get_xive(hw_irq, &oserver, &priority);
+	rc = opal_get_xive(hw_irq, stack_pa(&oserver), stack_pa(&priority));
 	if (rc != OPAL_SUCCESS) {
 		pr_err("%s: opal_get_xive(irq=%d [hw 0x%x]) error %lld\n",
 		       __func__, d->irq, hw_irq, rc);
@@ -160,7 +160,7 @@ static int ics_opal_check(struct ics *ics, unsigned int hw_irq)
 		return -EINVAL;
 
 	/* Check if HAL knows about this interrupt */
-	rc = opal_get_xive(hw_irq, &server, &priority);
+	rc = opal_get_xive(hw_irq, stack_pa(&server), stack_pa(&priority));
 	if (rc != OPAL_SUCCESS)
 		return -ENXIO;
 
@@ -174,7 +174,7 @@ static void ics_opal_mask_unknown(struct ics *ics, unsigned long vec)
 	int8_t priority;
 
 	/* Check if HAL knows about this interrupt */
-	rc = opal_get_xive(vec, &server, &priority);
+	rc = opal_get_xive(vec, stack_pa(&server), stack_pa(&priority));
 	if (rc != OPAL_SUCCESS)
 		return;
 
@@ -188,7 +188,7 @@ static long ics_opal_get_server(struct ics *ics, unsigned long vec)
 	int8_t priority;
 
 	/* Check if HAL knows about this interrupt */
-	rc = opal_get_xive(vec, &server, &priority);
+	rc = opal_get_xive(vec, stack_pa(&server), stack_pa(&priority));
 	if (rc != OPAL_SUCCESS)
 		return -1;
 	return ics_opal_unmangle_server(be16_to_cpu(server));
diff --git a/arch/powerpc/sysdev/xive/native.c b/arch/powerpc/sysdev/xive/native.c
index 3925825954bc..a2082ee866ca 100644
--- a/arch/powerpc/sysdev/xive/native.c
+++ b/arch/powerpc/sysdev/xive/native.c
@@ -52,8 +52,11 @@ int xive_native_populate_irq_data(u32 hw_irq, struct xive_irq_data *data)
 
 	memset(data, 0, sizeof(*data));
 
-	rc = opal_xive_get_irq_info(hw_irq, &flags, &eoi_page, &trig_page,
-				    &esb_shift, &src_chip);
+	rc = opal_xive_get_irq_info(hw_irq, stack_pa(&flags),
+				    stack_pa(&eoi_page),
+				    stack_pa(&trig_page),
+				    stack_pa(&esb_shift),
+				    stack_pa(&src_chip));
 	if (rc) {
 		pr_err("opal_xive_get_irq_info(0x%x) returned %lld\n",
 		       hw_irq, rc);
@@ -117,7 +120,8 @@ static int xive_native_get_irq_config(u32 hw_irq, u32 *target, u8 *prio,
 	__be64 vp;
 	__be32 lirq;
 
-	rc = opal_xive_get_irq_config(hw_irq, &vp, prio, &lirq);
+	rc = opal_xive_get_irq_config(hw_irq, stack_pa(&vp), stack_pa(prio),
+				      stack_pa(&lirq));
 
 	*target = be64_to_cpu(vp);
 	*sw_irq = be32_to_cpu(lirq);
@@ -150,8 +154,8 @@ int xive_native_configure_queue(u32 vp_id, struct xive_q *q, u8 prio,
 	q->toggle = 0;
 
 	rc = opal_xive_get_queue_info(vp_id, prio, NULL, NULL,
-				      &qeoi_page_be,
-				      &esc_irq_be,
+				      stack_pa(&qeoi_page_be),
+				      stack_pa(&esc_irq_be),
 				      NULL);
 	if (rc) {
 		vp_err(vp_id, "Failed to get queue %d info : %lld\n", prio, rc);
@@ -416,7 +420,8 @@ static void xive_native_setup_cpu(unsigned int cpu, struct xive_cpu *xc)
 	}
 
 	/* Grab it's CAM value */
-	rc = opal_xive_get_vp_info(vp, NULL, &vp_cam_be, NULL, NULL);
+	rc = opal_xive_get_vp_info(vp, NULL, stack_pa(&vp_cam_be), NULL,
+				   NULL);
 	if (rc) {
 		pr_err("Failed to get pool VP info CPU %d\n", cpu);
 		return;
@@ -756,7 +761,8 @@ int xive_native_get_vp_info(u32 vp_id, u32 *out_cam_id, u32 *out_chip_id)
 	__be32 vp_chip_id_be;
 	s64 rc;
 
-	rc = opal_xive_get_vp_info(vp_id, NULL, &vp_cam_be, NULL, &vp_chip_id_be);
+	rc = opal_xive_get_vp_info(vp_id, NULL, stack_pa(&vp_cam_be), NULL,
+				   stack_pa(&vp_chip_id_be));
 	if (rc) {
 		vp_err(vp_id, "Failed to get VP info : %lld\n", rc);
 		return -EIO;
@@ -794,8 +800,11 @@ int xive_native_get_queue_info(u32 vp_id, u32 prio,
 	__be64 qflags;
 	s64 rc;
 
-	rc = opal_xive_get_queue_info(vp_id, prio, &qpage, &qsize,
-				      &qeoi_page, &escalate_irq, &qflags);
+	rc = opal_xive_get_queue_info(vp_id, prio, stack_pa(&qpage),
+				      stack_pa(&qsize),
+				      stack_pa(&qeoi_page),
+				      stack_pa(&escalate_irq),
+				      stack_pa(&qflags));
 	if (rc) {
 		vp_err(vp_id, "failed to get queue %d info : %lld\n", prio, rc);
 		return -EIO;
@@ -822,8 +831,8 @@ int xive_native_get_queue_state(u32 vp_id, u32 prio, u32 *qtoggle, u32 *qindex)
 	__be32 opal_qindex;
 	s64 rc;
 
-	rc = opal_xive_get_queue_state(vp_id, prio, &opal_qtoggle,
-				       &opal_qindex);
+	rc = opal_xive_get_queue_state(vp_id, prio, stack_pa(&opal_qtoggle),
+				       stack_pa(&opal_qindex));
 	if (rc) {
 		vp_err(vp_id, "failed to get queue %d state : %lld\n", prio, rc);
 		return -EIO;
@@ -864,7 +873,7 @@ int xive_native_get_vp_state(u32 vp_id, u64 *out_state)
 	__be64 state;
 	s64 rc;
 
-	rc = opal_xive_get_vp_state(vp_id, &state);
+	rc = opal_xive_get_vp_state(vp_id, stack_pa(&state));
 	if (rc) {
 		vp_err(vp_id, "failed to get vp state : %lld\n", rc);
 		return -EIO;
diff --git a/drivers/char/ipmi/ipmi_powernv.c b/drivers/char/ipmi/ipmi_powernv.c
index da22a8cbe68e..55032e205e8e 100644
--- a/drivers/char/ipmi/ipmi_powernv.c
+++ b/drivers/char/ipmi/ipmi_powernv.c
@@ -91,7 +91,7 @@ static void ipmi_powernv_send(void *send_info, struct ipmi_smi_msg *msg)
 
 	pr_devel("%s: opal_ipmi_send(0x%llx, %p, %ld)\n", __func__,
 			smi->interface_id, opal_msg, size);
-	rc = opal_ipmi_send(smi->interface_id, opal_msg, size);
+	rc = opal_ipmi_send(smi->interface_id, stack_pa(opal_msg), size);
 	pr_devel("%s:  -> %d\n", __func__, rc);
 
 	if (!rc) {
@@ -132,8 +132,8 @@ static int ipmi_powernv_recv(struct ipmi_smi_powernv *smi)
 	size = cpu_to_be64(sizeof(*opal_msg) + IPMI_MAX_MSG_LENGTH);
 
 	rc = opal_ipmi_recv(smi->interface_id,
-			opal_msg,
-			&size);
+			    stack_pa(opal_msg),
+			    stack_pa(&size));
 	size = be64_to_cpu(size);
 	pr_devel("%s:   -> %d (size %lld)\n", __func__,
 			rc, rc == 0 ? size : 0);
diff --git a/drivers/char/powernv-op-panel.c b/drivers/char/powernv-op-panel.c
index 3c99696b145e..10588093e2e2 100644
--- a/drivers/char/powernv-op-panel.c
+++ b/drivers/char/powernv-op-panel.c
@@ -60,7 +60,7 @@ static int __op_panel_update_display(void)
 		return token;
 	}
 
-	rc = opal_write_oppanel_async(token, oppanel_lines, num_lines);
+	rc = opal_write_oppanel_async(token, stack_pa(oppanel_lines), num_lines);
 	switch (rc) {
 	case OPAL_ASYNC_COMPLETION:
 		rc = opal_async_wait_response(token, &msg);
diff --git a/drivers/i2c/busses/i2c-opal.c b/drivers/i2c/busses/i2c-opal.c
index 9f773b4f5ed8..d1d1fb3a55ba 100644
--- a/drivers/i2c/busses/i2c-opal.c
+++ b/drivers/i2c/busses/i2c-opal.c
@@ -49,7 +49,7 @@ static int i2c_opal_send_request(u32 bus_id, struct opal_i2c_request *req)
 		return token;
 	}
 
-	rc = opal_i2c_request(token, bus_id, req);
+	rc = opal_i2c_request(token, bus_id, stack_pa(req));
 	if (rc != OPAL_ASYNC_COMPLETION) {
 		rc = i2c_opal_translate_error(rc);
 		goto exit;
diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c
index 743e2cdd0891..b65bfdf6fa18 100644
--- a/drivers/leds/leds-powernv.c
+++ b/drivers/leds/leds-powernv.c
@@ -99,7 +99,7 @@ static int powernv_led_set(struct powernv_led_data *powernv_led,
 	}
 
 	rc = opal_leds_set_ind(token, powernv_led->loc_code,
-			       led_mask, led_value, &max_type);
+			       led_mask, led_value, stack_pa(&max_type));
 	if (rc != OPAL_ASYNC_COMPLETION) {
 		dev_err(dev, "%s: OPAL set LED call failed for %s [rc=%d]\n",
 			__func__, powernv_led->loc_code, rc);
@@ -142,7 +142,9 @@ static enum led_brightness powernv_led_get(struct powernv_led_data *powernv_led)
 	max_type = powernv_led_common->max_led_type;
 
 	rc = opal_leds_get_ind(powernv_led->loc_code,
-			       &mask, &value, &max_type);
+			       stack_pa(&mask),
+			       stack_pa(&value),
+			       stack_pa(&max_type));
 	if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
 		dev_err(dev, "%s: OPAL get led call failed [rc=%d]\n",
 			__func__, rc);
diff --git a/drivers/mtd/devices/powernv_flash.c b/drivers/mtd/devices/powernv_flash.c
index 36e060386e59..a2d0e61d0afe 100644
--- a/drivers/mtd/devices/powernv_flash.c
+++ b/drivers/mtd/devices/powernv_flash.c
@@ -66,10 +66,10 @@ static int powernv_flash_async_op(struct mtd_info *mtd, enum flash_op op,
 
 	switch (op) {
 	case FLASH_OP_READ:
-		rc = opal_flash_read(info->id, offset, __pa(buf), len, token);
+		rc = opal_flash_read(info->id, offset, (uint64_t)stack_pa(buf), len, token);
 		break;
 	case FLASH_OP_WRITE:
-		rc = opal_flash_write(info->id, offset, __pa(buf), len, token);
+		rc = opal_flash_write(info->id, offset, (uint64_t)stack_pa(buf), len, token);
 		break;
 	case FLASH_OP_ERASE:
 		rc = opal_flash_erase(info->id, offset, len, token);
diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
index ad41aaf8a17f..9e627fb7115a 100644
--- a/drivers/rtc/rtc-opal.c
+++ b/drivers/rtc/rtc-opal.c
@@ -53,7 +53,7 @@ static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
 	__be64 __h_m_s_ms;
 
 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
-		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
+		rc = opal_rtc_read(stack_pa(&__y_m_d), stack_pa(&__h_m_s_ms));
 		if (rc == OPAL_BUSY_EVENT) {
 			msleep(OPAL_BUSY_DELAY_MS);
 			opal_poll_events(NULL);
@@ -127,7 +127,7 @@ static int opal_get_tpo_time(struct device *dev, struct rtc_wkalrm *alarm)
 		return token;
 	}
 
-	rc = opal_tpo_read(token, &__y_m_d, &__h_m);
+	rc = opal_tpo_read(token, stack_pa(&__y_m_d), stack_pa(&__h_m));
 	if (rc != OPAL_ASYNC_COMPLETION) {
 		rc = -EIO;
 		goto exit;
-- 
2.38.1


* [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
                   ` (3 preceding siblings ...)
  2022-11-04 17:27 ` [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-08 16:17   ` Christophe Leroy
  2022-11-04 17:27 ` [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK Andrew Donnellan
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-hardening, cmr

When we go into idle, we must disable the MMU. Currently, we can still
access the stack once the MMU is disabled, because the stack is in the
linear map.

Once we enable CONFIG_VMAP_STACK, the normal stack pointer will be in the
vmalloc area. To cope with this, manually convert the stack pointer to a
physical address using stack_pa() before going into idle, and restore the
original pointer on the way back out.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>

---

This currently doesn't boot on my POWER9. I'm also going to clean this up
to use the helpers from earlier in this series.
---
 arch/powerpc/platforms/powernv/idle.c | 47 +++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 841cb7f31f4f..6430fb488981 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -22,6 +22,7 @@
 #include <asm/smp.h>
 #include <asm/runlatch.h>
 #include <asm/dbell.h>
+#include <asm/reg.h>
 
 #include "powernv.h"
 #include "subcore.h"
@@ -509,6 +510,11 @@ static unsigned long power7_offline(void)
 {
 	unsigned long srr1;
 
+#ifdef CONFIG_VMAP_STACK
+	unsigned long ksp_ea = current_stack_pointer;
+	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
+#endif
+
 	mtmsr(MSR_IDLE);
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -543,6 +549,9 @@ static unsigned long power7_offline(void)
 		srr1 = idle_kvm_start_guest(srr1);
 #endif
 
+#ifdef CONFIG_VMAP_STACK
+	current_stack_pointer = ksp_ea;
+#endif
 	mtmsr(MSR_KERNEL);
 
 	return srr1;
@@ -552,14 +561,24 @@ static unsigned long power7_offline(void)
 void power7_idle_type(unsigned long type)
 {
 	unsigned long srr1;
+#ifdef CONFIG_VMAP_STACK
+	unsigned long ksp_ea;
+#endif
 
 	if (!prep_irq_for_idle_irqsoff())
 		return;
 
+#ifdef CONFIG_VMAP_STACK
+	ksp_ea = current_stack_pointer;
+	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
+#endif
 	mtmsr(MSR_IDLE);
 	__ppc64_runlatch_off();
 	srr1 = power7_idle_insn(type);
 	__ppc64_runlatch_on();
+#ifdef CONFIG_VMAP_STACK
+	current_stack_pointer = ksp_ea;
+#endif
 	mtmsr(MSR_KERNEL);
 
 	fini_irq_for_idle_irqsoff();
@@ -615,6 +634,9 @@ static unsigned long power9_idle_stop(unsigned long psscr)
 	unsigned long mmcra = 0;
 	struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
 	bool sprs_saved = false;
+#ifdef CONFIG_VMAP_STACK
+	unsigned long ksp_ea;
+#endif
 
 	if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
 		/* EC=ESL=0 case */
@@ -633,7 +655,7 @@ static unsigned long power9_idle_stop(unsigned long psscr)
 		 */
 		BUG_ON((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS);
 
-		goto out;
+		goto out_noloss;
 	}
 
 	/* EC=ESL=1 case */
@@ -688,6 +710,10 @@ static unsigned long power9_idle_stop(unsigned long psscr)
 	sprs.iamr	= mfspr(SPRN_IAMR);
 	sprs.uamor	= mfspr(SPRN_UAMOR);
 
+#ifdef CONFIG_VMAP_STACK
+	ksp_ea = current_stack_pointer;
+	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
+#endif
 	srr1 = isa300_idle_stop_mayloss(psscr);		/* go idle */
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -797,6 +823,10 @@ static unsigned long power9_idle_stop(unsigned long psscr)
 		__slb_restore_bolted_realmode();
 
 out:
+#ifdef CONFIG_VMAP_STACK
+	current_stack_pointer = ksp_ea;
+#endif
+out_noloss:
 	mtmsr(MSR_KERNEL);
 
 	return srr1;
@@ -898,6 +928,9 @@ static unsigned long power10_idle_stop(unsigned long psscr)
 	unsigned long pls;
 //	struct p10_sprs sprs = {}; /* avoid false used-uninitialised */
 	bool sprs_saved = false;
+#ifdef CONFIG_VMAP_STACK
+	unsigned long ksp_ea;
+#endif
 
 	if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
 		/* EC=ESL=0 case */
@@ -916,7 +949,7 @@ static unsigned long power10_idle_stop(unsigned long psscr)
 		 */
 		BUG_ON((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS);
 
-		goto out;
+		goto out_noloss;
 	}
 
 	/* EC=ESL=1 case */
@@ -927,7 +960,11 @@ static unsigned long power10_idle_stop(unsigned long psscr)
 
 		atomic_start_thread_idle();
 	}
-
+#ifdef CONFIG_VMAP_STACK
+	ksp_ea = current_stack_pointer;
+	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
+#endif /* CONFIG_VMAP_STACK */
+	mtmsr(MSR_IDLE);
 	srr1 = isa300_idle_stop_mayloss(psscr);		/* go idle */
 
 	psscr = mfspr(SPRN_PSSCR);
@@ -982,6 +1019,10 @@ static unsigned long power10_idle_stop(unsigned long psscr)
 		__slb_restore_bolted_realmode();
 
 out:
+#ifdef CONFIG_VMAP_STACK
+	current_stack_pointer = ksp_ea;
+#endif /* CONFIG_VMAP_STACK */
+out_noloss:
 	mtmsr(MSR_KERNEL);
 
 	return srr1;
-- 
2.38.1


* [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK
  2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
                   ` (4 preceding siblings ...)
  2022-11-04 17:27 ` [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address Andrew Donnellan
@ 2022-11-04 17:27 ` Andrew Donnellan
  2022-11-05 17:07   ` Christophe Leroy
  5 siblings, 1 reply; 17+ messages in thread
From: Andrew Donnellan @ 2022-11-04 17:27 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-hardening, cmr

Enable CONFIG_VMAP_STACK for book3s64.

To do this, we need to make some slight adjustments to set the stack SLB
entry up for vmalloc rather than linear.

For now, only enable if KVM_BOOK3S_64_HV is disabled (there's some real mode
handlers we need to fix there).

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
---
 arch/powerpc/kernel/process.c          |  4 ++++
 arch/powerpc/mm/book3s64/slb.c         | 11 +++++++++--
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 07917726c629..cadf2db5a2a8 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1685,7 +1685,11 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
 {
 #ifdef CONFIG_PPC_64S_HASH_MMU
 	unsigned long sp_vsid;
+#ifdef CONFIG_VMAP_STACK
+	unsigned long llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
+#else /* CONFIG_VMAP_STACK */
 	unsigned long llp = mmu_psize_defs[mmu_linear_psize].sllp;
+#endif /* CONFIG_VMAP_STACK */
 
 	if (radix_enabled())
 		return;
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6956f637a38c..0e21f0eaa7bb 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -541,7 +541,7 @@ void slb_set_size(u16 size)
 void slb_initialize(void)
 {
 	unsigned long linear_llp, vmalloc_llp, io_llp;
-	unsigned long lflags;
+	unsigned long lflags, kstack_flags;
 	static int slb_encoding_inited;
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
 	unsigned long vmemmap_llp;
@@ -582,11 +582,18 @@ void slb_initialize(void)
 	 * get_paca()->kstack hasn't been initialized yet.
 	 * For secondary cpus, we need to bolt the kernel stack entry now.
 	 */
+
+#ifdef CONFIG_VMAP_STACK
+	kstack_flags = SLB_VSID_KERNEL | vmalloc_llp;
+#else
+	kstack_flags = SLB_VSID_KERNEL | linear_llp;
+#endif
 	slb_shadow_clear(KSTACK_INDEX);
 	if (raw_smp_processor_id() != boot_cpuid &&
 	    (get_paca()->kstack & slb_esid_mask(mmu_kernel_ssize)) > PAGE_OFFSET)
 		create_shadowed_slbe(get_paca()->kstack,
-				     mmu_kernel_ssize, lflags, KSTACK_INDEX);
+				     mmu_kernel_ssize, kstack_flags,
+				     KSTACK_INDEX);
 
 	asm volatile("isync":::"memory");
 }
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 0c4eed9aea80..998317257797 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -104,6 +104,7 @@ config PPC_BOOK3S_64
 	select IRQ_WORK
 	select PPC_64S_HASH_MMU if !PPC_RADIX_MMU
 	select KASAN_VMALLOC if KASAN
+	select HAVE_ARCH_VMAP_STACK if KVM_BOOK3S_64_HV = n
 
 config PPC_BOOK3E_64
 	bool "Embedded processors"
-- 
2.38.1


* Re: [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE
  2022-11-04 17:27 ` [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE Andrew Donnellan
@ 2022-11-04 17:51   ` Christophe Leroy
  2023-04-26  7:03     ` Andrew Donnellan
  0 siblings, 1 reply; 17+ messages in thread
From: Christophe Leroy @ 2022-11-04 17:51 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> When CONFIG_VMAP_STACK is enabled, we set THREAD_SIZE to be at least the
> size of a page.
> 
> There's a few bits of assembly in the book3s64 code that use THREAD_SIZE in
> immediate mode instructions, which can only take an operand of up to 16
> bits signed, which isn't quite large enough.
> 
> Fix these spots to use a scratch register or use two immediate mode
> instructions instead, so we can later enable VMAP_STACK.
> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/asm-compat.h   | 2 ++
>   arch/powerpc/kernel/entry_64.S          | 4 +++-
>   arch/powerpc/kernel/irq.c               | 8 ++++++--
>   arch/powerpc/kernel/misc_64.S           | 4 +++-
>   arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 ++-
>   5 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
> index 2bc53c646ccd..30dd7813bf3b 100644
> --- a/arch/powerpc/include/asm/asm-compat.h
> +++ b/arch/powerpc/include/asm/asm-compat.h
> @@ -11,6 +11,7 @@
>   #define PPC_LL		stringify_in_c(ld)
>   #define PPC_STL		stringify_in_c(std)
>   #define PPC_STLU	stringify_in_c(stdu)
> +#define PPC_STLUX	stringify_in_c(stdux)
>   #define PPC_LCMPI	stringify_in_c(cmpdi)
>   #define PPC_LCMPLI	stringify_in_c(cmpldi)
>   #define PPC_LCMP	stringify_in_c(cmpd)
> @@ -45,6 +46,7 @@
>   #define PPC_LL		stringify_in_c(lwz)
>   #define PPC_STL		stringify_in_c(stw)
>   #define PPC_STLU	stringify_in_c(stwu)
> +#define PPC_STLUX	stringify_in_c(stwux)
>   #define PPC_LCMPI	stringify_in_c(cmpwi)
>   #define PPC_LCMPLI	stringify_in_c(cmplwi)
>   #define PPC_LCMP	stringify_in_c(cmpw)
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 3e2e37e6ecab..af25db6e0205 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -238,7 +238,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>   	/* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
>   	   because we don't need to leave the 288-byte ABI gap at the
>   	   top of the kernel stack. */
> -	addi	r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE
> +	li	r9,0
> +	ori	r9,r9,THREAD_SIZE-SWITCH_FRAME_SIZE
> +	add	r7,r7,r9

So you assume THREAD_SIZE is never more than 64k? Is that a valid
assumption?

What about the below instead:

	addis	r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE@ha
	addi	r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE@l
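
To see why the @ha/@l pair is needed rather than a single addi: addi
sign-extends its 16-bit immediate, so any offset above 32767 needs the
upper half adjusted to compensate. The standalone C sketch below emulates
the two-instruction sequence. The helper names (ha16, lo16, add_ha_lo) are
made up for illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half as the assembler's @ha operator computes it: the +0x8000
 * compensates for addi sign-extending the low half. */
static uint16_t ha16(uint32_t v)
{
	return (uint16_t)((v + 0x8000u) >> 16);
}

/* Low half as addi sees it (@l): a sign-extended 16-bit immediate. */
static int16_t lo16(uint32_t v)
{
	return (int16_t)(v & 0xffffu);
}

/* Emulates "addis rX,rX,v@ha ; addi rX,rX,v@l" on a 64-bit register. */
static uint64_t add_ha_lo(uint64_t reg, uint32_t v)
{
	reg += (uint64_t)((int64_t)(int16_t)ha16(v) * 65536);	/* addis */
	reg += (uint64_t)(int64_t)lo16(v);			/* addi  */
	return reg;
}
```

For example, an offset of 0xfe80 splits into @ha = 1 and @l = -384, and
65536 - 384 gives back 0xfe80, a value a single addi cannot encode.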

>   
>   	/*
>   	 * PMU interrupts in radix may come in here. They will use r1, not
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 9ede61a5a469..098cf6adceec 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -204,7 +204,9 @@ static __always_inline void call_do_softirq(const void *sp)
>   {
>   	/* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
>   	asm volatile (
> -		 PPC_STLU "	%%r1, %[offset](%[sp])	;"
> +		"li		%%r0, 0			;"
> +		"ori		%%r0, %%r0, %[offset]	;"

Same here: you assume offset is at most 64k. Is that correct?

What about:
		lis		r0, offset@h
		ori		r0, r0, offset@l

> +		 PPC_STLUX "	%%r1, %[sp], %%r0	;"
>   		"mr		%%r1, %[sp]		;"
>   		"bl		%[callee]		;"
>   		 PPC_LL "	%%r1, 0(%%r1)		;"
> @@ -256,7 +258,9 @@ static __always_inline void call_do_irq(struct pt_regs *regs, void *sp)
>   
>   	/* Temporarily switch r1 to sp, call __do_irq() then restore r1. */
>   	asm volatile (
> -		 PPC_STLU "	%%r1, %[offset](%[sp])	;"
> +		"li		%%r0, 0			;"
> +		"ori		%%r0, %%r0, %[offset]	;"
> +		 PPC_STLUX "	%%r1, %[sp], %%r0	;"

Same

>   		"mr		%%r4, %%r1		;"
>   		"mr		%%r1, %[sp]		;"
>   		"bl		%[callee]		;"
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 36184cada00b..ff71b98500a3 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -384,7 +384,9 @@ _GLOBAL(kexec_sequence)
>   	std	r0,16(r1)
>   
>   	/* switch stacks to newstack -- &kexec_stack.stack */
> -	stdu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
> +	li	r0,0
> +	ori	r0,r0,THREAD_SIZE-STACK_FRAME_OVERHEAD
> +	stdux	r1,r3,r0

Same

>   	mr	r1,r3
>   
>   	li	r0,0
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 37f50861dd98..d05e3d324f4d 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -2686,7 +2686,8 @@ kvmppc_bad_host_intr:
>   	mr	r9, r1
>   	std	r1, PACAR1(r13)
>   	ld	r1, PACAEMERGSP(r13)
> -	subi	r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE
> +	subi	r1, r1, THREAD_SIZE/2
> +	subi	r1, r1, INT_FRAME_SIZE

Same. What about:

	subis	r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE@ha
	subi	r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE@l

>   	std	r9, 0(r1)
>   	std	r0, GPR0(r1)
>   	std	r9, GPR1(r1)

* Re: [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path
  2022-11-04 17:27 ` [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path Andrew Donnellan
@ 2022-11-04 18:00   ` Christophe Leroy
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-04 18:00 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> When we enter and return from an OPAL call, there's three pieces of state
> we have to save and restore: the stack pointer, the PACA pointer, and the
> MSR. However, there's only two registers that OPAL is guaranteed to
> preserve for us (r1 for the stack pointer and r13 for the PACA), so the MSR
> gets saved on the stack.
> 
> This becomes problematic when we enable VMAP_STACK, as we need to re-enable
> translation in order to access the virtually mapped stack... and to
> re-enable translation, we need to restore the MSR.

Do you really need to restore the MSR? Can't you just set MSR_DR to access
the stack, then restore the MSR? Or maybe you don't want to do it in two
steps for performance reasons?

> 
> Keep the MSR in r13, and instead store the PACA pointer on the stack - we
> can restore the MSR first, then restore the PACA into r13.
> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> ---
>   .../powerpc/platforms/powernv/opal-wrappers.S | 43 +++++++++++--------
>   1 file changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
> index 0ed95f753416..d692869ee0ce 100644
> --- a/arch/powerpc/platforms/powernv/opal-wrappers.S
> +++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
> @@ -23,40 +23,49 @@
>   _GLOBAL_TOC(__opal_call)
>   	mflr	r0
>   	std	r0,PPC_LR_STKOFF(r1)
> -	ld	r12,STK_PARAM(R12)(r1)
> -	li	r0,MSR_IR|MSR_DR|MSR_LE
> -	andc	r12,r12,r0
>   	LOAD_REG_ADDR(r11, opal_return)
>   	mtlr	r11
>   	LOAD_REG_ADDR(r11, opal)
>   	ld	r2,0(r11)
>   	ld	r11,8(r11)
>   	mtspr	SPRN_HSRR0,r11
> -	mtspr	SPRN_HSRR1,r12
> +
>   	/* set token to r0 */
>   	ld	r0,STK_PARAM(R11)(r1)
> +
> +	/*
> +	 * We need to keep the MSR value in a register that is preserved by
> +	 * OPAL, so that we don't need to access the stack before we restore
> +	 * the MSR, as the stack may be vmalloced and thus require MMU.
> +	 *
> +	 * Move the PACA from R13 into the stack red zone, and put MSR in R13.
> +	 */
> +	std	r13,-8(r1)
> +	ld	r13,STK_PARAM(R12)(r1)
> +
> +	/* Switch off MMU, LE */
> +	li	r11,MSR_IR|MSR_DR|MSR_LE
> +	andc	r11,r13,r11
> +
> +	mtspr	SPRN_HSRR1,r11
>   	hrfid
>   opal_return:
>   	/*
>   	 * Restore MSR on OPAL return. The MSR is set to big-endian.
>   	 */
>   #ifdef __BIG_ENDIAN__
> -	ld	r11,STK_PARAM(R12)(r1)
> -	mtmsrd	r11
> +	mtmsrd	r13
>   #else
>   	/* Endian can only be switched with rfi, must byte reverse MSR load */
> -	.short 0x4039	 /* li r10,STK_PARAM(R12)		*/
> -	.byte (STK_PARAM(R12) >> 8) & 0xff
> -	.byte STK_PARAM(R12) & 0xff
> -
> -	.long 0x280c6a7d /* ldbrx r11,r10,r1			*/
> -	.long 0x05009f42 /* bcl 20,31,$+4			*/
> -	.long 0xa602487d /* mflr r10				*/
> -	.long 0x14004a39 /* addi r10,r10,20			*/
> -	.long 0xa64b5a7d /* mthsrr0 r10				*/
> -	.long 0xa64b7b7d /* mthsrr1 r11				*/
> -	.long 0x2402004c /* hrfid				*/
> +	.long 0x05009f42 /* bcl 20,31,$+4   (LR <- next insn addr)	*/
> +	.long 0xa602487d /* mflr r10					*/
> +	.long 0x14004a39 /* addi r10,r10,20 (r10 <- addr after #endif)	*/
> +	.long 0xa64b5a7d /* mthsrr0 r10	    (new NIP)			*/
> +	.long 0xa64bbb7d /* mthsrr1 r13	    (new MSR)			*/
> +	.long 0x2402004c /* hrfid					*/
>   #endif
> +	/* Restore PACA */
> +	ld	r13,-8(r1)
>   	LOAD_PACA_TOC()
>   	ld	r0,PPC_LR_STKOFF(r1)
>   	mtlr	r0

* Re: [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers
  2022-11-04 17:27 ` [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers Andrew Donnellan
@ 2022-11-05  8:00   ` Christophe Leroy
  2022-11-05 19:28     ` Christophe Leroy
  2022-11-07 12:38     ` Nicholas Piggin
  0 siblings, 2 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-05  8:00 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> powerpc unfortunately has too many places where we run stuff in real mode.
> 
> With CONFIG_VMAP_STACK enabled, this means we need to be able to swap the
> stack pointer to use the linear mapping when we enter a real mode section,
> and back afterwards.
> 
> Store the top bits of the stack pointer in both the linear map and the
> vmalloc space in the PACA, and add some helper macros/functions to swap
> between them.

That may work when the page size is 64k, because the stack is on a single
page, but I doubt it works with 4k pages, because vmalloc may allocate
non-contiguous pages.
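
A small standalone C sketch of the concern, under made-up assumptions: a
hypothetical 16K stack backed by four 4K pages whose physical frames
(demo_frames) are deliberately scattered. All names and addresses here are
illustrative and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12				/* 4K pages */
#define DEMO_PAGE_SIZE		(UINT64_C(1) << DEMO_PAGE_SHIFT)
#define DEMO_THREAD_SIZE	(4 * DEMO_PAGE_SIZE)		/* 16K stack */

/* Hypothetical physical frames backing one vmapped stack, not contiguous. */
static const uint64_t demo_frames[4] = { 0x10000, 0x53000, 0x22000, 0x91000 };

/* Per-page translation: look up the frame of the page holding the offset. */
static uint64_t demo_per_page_pa(uint64_t stack_off)
{
	return demo_frames[stack_off >> DEMO_PAGE_SHIFT] |
	       (stack_off & (DEMO_PAGE_SIZE - 1));
}

/* Single-base translation: one saved base OR'ed with the offset in the stack. */
static uint64_t demo_single_base_pa(uint64_t stack_off)
{
	return demo_frames[0] | (stack_off & (DEMO_THREAD_SIZE - 1));
}
```

Within the first page both translations agree, but at offset 0x1234 the
single-base form yields 0x11234 while the page actually lives at 0x53234.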

> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> 
> ---
> 
> Some of the helpers that are currently unused will be used in the next
> version of the series for the KVM real mode handling
> ---
>   arch/powerpc/include/asm/book3s/64/stack.h | 71 ++++++++++++++++++++++
>   arch/powerpc/include/asm/opal.h            |  1 +
>   arch/powerpc/include/asm/paca.h            |  4 ++
>   arch/powerpc/include/asm/processor.h       |  6 ++
>   arch/powerpc/kernel/asm-offsets.c          |  8 +++
>   arch/powerpc/kernel/entry_64.S             |  7 +++
>   arch/powerpc/kernel/process.c              |  4 ++
>   arch/powerpc/kernel/smp.c                  |  7 +++
>   arch/powerpc/xmon/xmon.c                   |  4 ++
>   9 files changed, 112 insertions(+)
>   create mode 100644 arch/powerpc/include/asm/book3s/64/stack.h
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/stack.h b/arch/powerpc/include/asm/book3s/64/stack.h
> new file mode 100644
> index 000000000000..6b31adb1a026
> --- /dev/null
> +++ b/arch/powerpc/include/asm/book3s/64/stack.h
> @@ -0,0 +1,71 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +// Helpers for VMAP_STACK on book3s64
> +// Copyright (C) 2022 IBM Corporation (Andrew Donnellan)
> +
> +#ifndef _ASM_POWERPC_BOOK3S_64_STACK_H
> +#define _ASM_POWERPC_BOOK3S_64_STACK_H
> +
> +#include <asm/thread_info.h>
> +
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +
> +#ifdef __ASSEMBLY__
> +// Switch the current stack pointer in r1 between a linear map address and a
> +// vmalloc address. Used when we need to go in and out of real mode with
> +// CONFIG_VMAP_STACK enabled.
> +//
> +// tmp: scratch register that can be clobbered
> +
> +#define SWAP_STACK_LINEAR(tmp)			\
> +	ld	tmp, PACAKSTACK_LINEAR_BASE(r13);	\
> +	andi.	r1, r1, THREAD_SIZE - 1;		\

Do you assume THREAD_SIZE is never more than 64k?

> +	or	r1, r1, tmp;

You can probably do better with the rldimi instruction.
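
For reference, the operation the macro performs is a bit-field insert: keep
the low THREAD_SHIFT bits of r1 and take everything above them from the
saved base. The C sketch below shows that operation with a hypothetical
helper name (demo_swap_base). It assumes the base is aligned to the stack
size and that the stack is contiguous in the target mapping. An rldimi with
a zero shift and a mask starting at bit 64 - THREAD_SHIFT should express the
same insert in one instruction, without the 16-bit immediate limit of andi.
and without touching CR0, but check that encoding against the ISA before
relying on it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Replace the upper bits of sp with those of base, keeping the low
 * `shift` bits (the offset within the stack).
 * Assumes base is aligned to 1 << shift and shift < 64.
 */
static uint64_t demo_swap_base(uint64_t base, uint64_t sp, unsigned int shift)
{
	uint64_t mask = (UINT64_C(1) << shift) - 1;

	return (base & ~mask) | (sp & mask);
}
```

Because the mask is computed from the shift rather than encoded as an
immediate, the same helper works for any stack size.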

> +#define SWAP_STACK_VMALLOC(tmp)			\
> +	ld	tmp, PACAKSTACK_VMALLOC_BASE(r13);	\
> +	andi.	r1, r1, THREAD_SIZE - 1;		\
> +	or	r1, r1, tmp;

Same

> +
> +#else // __ASSEMBLY__
> +
> +#include <asm/paca.h>
> +#include <asm/reg.h>
> +#include <linux/mm.h>
> +
> +#define stack_pa(ptr) (is_vmalloc_addr((ptr)) ? (void *)vmalloc_to_phys((void *)(ptr)) : (void *)ptr)
> +
> +static __always_inline void swap_stack_linear(void)
> +{
> +	current_stack_pointer = get_paca()->kstack_linear_base |	\
> +		(current_stack_pointer & (THREAD_SIZE - 1));

That looks hacky. I don't think you can just change current_stack_pointer on
the fly. You have to provide something similar to call_do_softirq() or
call_do_irq().

> +}
> +
> +static __always_inline void swap_stack_vmalloc(void)
> +{
> +	current_stack_pointer = get_paca()->kstack_vmalloc_base |	\
> +		(current_stack_pointer & (THREAD_SIZE - 1));

Same

> +}
> +
> +#endif // __ASSEMBLY__
> +
> +#else // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
> +
> +#define SWAP_STACK_LINEAR(tmp)
> +#define SWAP_STACK_VMALLOC(tmp)
> +
> +static __always_inline void *stack_pa(void *ptr)
> +{
> +	return ptr;
> +}
> +
> +static __always_inline void swap_stack_linear(void)
> +{
> +}
> +
> +static __always_inline void swap_stack_vmalloc(void)
> +{
> +}
> +
> +#endif // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
> +
> +#endif // _ASM_POWERPC_BOOK3S_64_STACK_H
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 726125a534de..0360360ad2cf 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -13,6 +13,7 @@
>   #ifndef __ASSEMBLY__
>   
>   #include <linux/notifier.h>
> +#include <asm/book3s/64/stack.h>
>   
>   /* We calculate number of sg entries based on PAGE_SIZE */
>   #define SG_ENTRIES_PER_NODE ((PAGE_SIZE - 16) / sizeof(struct opal_sg_entry))
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index 09f1790d0ae1..51d060036fa1 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -163,6 +163,10 @@ struct paca_struct {
>   	 */
>   	struct task_struct *__current;	/* Pointer to current */
>   	u64 kstack;			/* Saved Kernel stack addr */
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	u64 kstack_vmalloc_base;	/* Base address of stack in the vmalloc mapping */
> +	u64 kstack_linear_base;		/* Base address of stack in the linear mapping */
> +#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
>   	u64 saved_r1;			/* r1 save for RTAS calls or PM or EE=0 */
>   	u64 saved_msr;			/* MSR saved here by enter_rtas */
>   #ifdef CONFIG_PPC64
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 631802999d59..999078452aa4 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -132,6 +132,12 @@ struct debug_reg {
>   
>   struct thread_struct {
>   	unsigned long	ksp;		/* Kernel stack pointer */
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	// Kernel stack base addresses in vmalloc and linear mappings
> +	// Used for swapping to linear map in real mode code
> +	unsigned long	ksp_vmalloc_base;
> +	unsigned long	ksp_linear_base;
> +#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
>   
>   #ifdef CONFIG_PPC64
>   	unsigned long	ksp_vsid;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 4ce2a4aa3985..46ace958d3ce 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -99,6 +99,10 @@ int main(void)
>   #endif
>   
>   	OFFSET(KSP, thread_struct, ksp);
> +#ifdef CONFIG_VMAP_STACK
> +	OFFSET(KSP_VMALLOC_BASE, thread_struct, ksp_vmalloc_base);
> +	OFFSET(KSP_LINEAR_BASE, thread_struct, ksp_linear_base);
> +#endif /* CONFIG_VMAP_STACK */
>   	OFFSET(PT_REGS, thread_struct, regs);
>   #ifdef CONFIG_BOOKE
>   	OFFSET(THREAD_NORMSAVES, thread_struct, normsave[0]);
> @@ -181,6 +185,10 @@ int main(void)
>   	OFFSET(PACAPACAINDEX, paca_struct, paca_index);
>   	OFFSET(PACAPROCSTART, paca_struct, cpu_start);
>   	OFFSET(PACAKSAVE, paca_struct, kstack);
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	OFFSET(PACAKSTACK_VMALLOC_BASE, paca_struct, kstack_vmalloc_base);
> +	OFFSET(PACAKSTACK_LINEAR_BASE, paca_struct, kstack_linear_base);
> +#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
>   	OFFSET(PACACURRENT, paca_struct, __current);
>   	DEFINE(PACA_THREAD_INFO, offsetof(struct paca_struct, __current) +
>   				 offsetof(struct task_struct, thread_info));
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index af25db6e0205..cd9e56b25934 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -253,6 +253,13 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>   	mr	r1,r8		/* start using new stack pointer */
>   	std	r7,PACAKSAVE(r13)
>   
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	ld	r8,KSP_LINEAR_BASE(r4)
> +	std	r8,PACAKSTACK_LINEAR_BASE(r13)
> +	ld	r8,KSP_VMALLOC_BASE(r4)
> +	std	r8,PACAKSTACK_VMALLOC_BASE(r13)

Do you only have r8 to play with? Otherwise I'd suggest performing the
two ld first, then the two std. Or maybe that doesn't matter on ppc64.

> +#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
> +
>   	ld	r6,_CCR(r1)
>   	mtcrf	0xFF,r6
>   
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 67da147fe34d..07917726c629 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1782,6 +1782,10 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>   	kregs = (struct pt_regs *) sp;
>   	sp -= STACK_FRAME_OVERHEAD;
>   	p->thread.ksp = sp;
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	p->thread.ksp_vmalloc_base = sp & ~(THREAD_SIZE - 1);
> +	p->thread.ksp_linear_base = (u64)__va(vmalloc_to_pfn((void *)sp) << PAGE_SHIFT);

What about:

	page_to_virt(vmalloc_to_page((void *)sp))

But is the linear base really what you want? Isn't it the physical address?
In that case you can do:

	page_to_phys(vmalloc_to_page((void *)sp))



> +#endif /* CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64 */
>   #ifdef CONFIG_HAVE_HW_BREAKPOINT
>   	for (i = 0; i < nr_wp_slots(); i++)
>   		p->thread.ptrace_bps[i] = NULL;
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 0da6e59161cd..466ccab5adb8 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -60,6 +60,7 @@
>   #include <asm/ftrace.h>
>   #include <asm/kup.h>
>   #include <asm/fadump.h>
> +#include <asm/book3s/64/stack.h>

Could we avoid including a book3s/64 header directly? Could it come via
a more generic one, maybe pgtable.h?

As you can see with

	git grep "asm/book3s/64" arch/powerpc/

There is no direct inclusion of book3s64 headers in generic C files.


>   
>   #ifdef DEBUG
>   #include <asm/udbg.h>
> @@ -1250,6 +1251,12 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
>   	paca_ptrs[cpu]->__current = idle;
>   	paca_ptrs[cpu]->kstack = (unsigned long)task_stack_page(idle) +
>   				 THREAD_SIZE - STACK_FRAME_OVERHEAD;
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	paca_ptrs[cpu]->kstack_linear_base = is_vmalloc_addr((void *)paca_ptrs[cpu]->kstack) ?
> +		vmalloc_to_phys((void *)(paca_ptrs[cpu]->kstack)) :
> +		paca_ptrs[cpu]->kstack;
> +	paca_ptrs[cpu]->kstack_vmalloc_base = paca_ptrs[cpu]->kstack & (THREAD_SIZE - 1);
> +#endif // CONFIG_VMAP_STACK && CONFIG_PPC_BOOK3S_64
>   #endif
>   	task_thread_info(idle)->cpu = cpu;
>   	secondary_current = current_set[cpu] = idle;
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index f51c882bf902..236287c4a231 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -2697,6 +2697,10 @@ static void dump_one_paca(int cpu)
>   	DUMP(p, __current, "%-*px");
>   	DUMP(p, kstack, "%#-*llx");
>   	printf(" %-*s = 0x%016llx\n", 25, "kstack_base", p->kstack & ~(THREAD_SIZE - 1));
> +#if defined(CONFIG_VMAP_STACK) && defined(CONFIG_PPC_BOOK3S_64)
> +	DUMP(p, kstack_linear_base, "%#-*llx");
> +	DUMP(p, kstack_vmalloc_base, "%#-*llx");
> +#endif
>   #ifdef CONFIG_STACKPROTECTOR
>   	DUMP(p, canary, "%#-*lx");
>   #endif


* Re: [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK
  2022-11-04 17:27 ` [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK Andrew Donnellan
@ 2022-11-05 17:07   ` Christophe Leroy
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-05 17:07 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



Le 04/11/2022 à 18:27, Andrew Donnellan a écrit :
> Enable CONFIG_VMAP_STACK for book3s64.
> 
> To do this, we need to make some slight adjustments to set the stack SLB
> entry up for vmalloc rather than linear.
> 
> For now, only enable if KVM_BOOK3S_64_HV is disabled (there's some real mode
> handlers we need to fix there).

One point is missing: with VMAP_STACK, a stack overflow will generate a
page fault. You have to handle it at interrupt entry, before going back
to virtual mode, otherwise it will fault forever.

See how it is done in the EXCEPTION_PROLOG_1 macro in
arch/powerpc/kernel/head_32.h.
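
For reference, here is a minimal user-space model of the kind of check done on 32-bit. As far as I understand it, stacks are aligned to twice THREAD_SIZE when VMAP_STACK is enabled, so a single address bit separates a valid stack pointer from one that has run below the stack base. The constant values and the function name below are assumptions for illustration, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define THREAD_SHIFT        14                     /* 16 KiB stacks */
#define THREAD_SIZE         (UINT64_C(1) << THREAD_SHIFT)
#define THREAD_ALIGN_SHIFT  (THREAD_SHIFT + 1)     /* base aligned to 2 * THREAD_SIZE */

/*
 * With the stack base aligned to 2 * THREAD_SIZE, every valid stack
 * pointer has bit THREAD_SHIFT clear. Once sp drops below the base,
 * the borrow sets that bit, so a single bit test detects the overflow
 * without touching memory.
 */
static bool vmap_stack_overflowed(uint64_t sp)
{
	return (sp >> THREAD_SHIFT) & 1;
}
```

The point of doing it this way is that the test needs no memory access, so it can run at interrupt entry before the (possibly unmapped) stack is used.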

> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> ---
>   arch/powerpc/kernel/process.c          |  4 ++++
>   arch/powerpc/mm/book3s64/slb.c         | 11 +++++++++--
>   arch/powerpc/platforms/Kconfig.cputype |  1 +
>   3 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 07917726c629..cadf2db5a2a8 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -1685,7 +1685,11 @@ static void setup_ksp_vsid(struct task_struct *p, unsigned long sp)
>   {
>   #ifdef CONFIG_PPC_64S_HASH_MMU
>   	unsigned long sp_vsid;
> +#ifdef CONFIG_VMAP_STACK
> +	unsigned long llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
> +#else /* CONFIG_VMAP_STACK */
>   	unsigned long llp = mmu_psize_defs[mmu_linear_psize].sllp;
> +#endif /* CONFIG_VMAP_STACK */

I think you could use IS_ENABLED() instead of an ifdef:

	unsigned long llp;

	if (IS_ENABLED(CONFIG_VMAP_STACK))
		llp = mmu_psize_defs[mmu_vmalloc_psize].sllp;
	else
		llp = mmu_psize_defs[mmu_linear_psize].sllp;

>   
>   	if (radix_enabled())
>   		return;
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index 6956f637a38c..0e21f0eaa7bb 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -541,7 +541,7 @@ void slb_set_size(u16 size)
>   void slb_initialize(void)
>   {
>   	unsigned long linear_llp, vmalloc_llp, io_llp;
> -	unsigned long lflags;
> +	unsigned long lflags, kstack_flags;
>   	static int slb_encoding_inited;
>   #ifdef CONFIG_SPARSEMEM_VMEMMAP
>   	unsigned long vmemmap_llp;
> @@ -582,11 +582,18 @@ void slb_initialize(void)
>   	 * get_paca()->kstack hasn't been initialized yet.
>   	 * For secondary cpus, we need to bolt the kernel stack entry now.
>   	 */
> +
> +#ifdef CONFIG_VMAP_STACK
> +	kstack_flags = SLB_VSID_KERNEL | vmalloc_llp;
> +#else
> +	kstack_flags = SLB_VSID_KERNEL | linear_llp;
> +#endif

Same here, this should be:

	if (IS_ENABLED(CONFIG_VMAP_STACK))
		kstack_flags = SLB_VSID_KERNEL | vmalloc_llp;
	else
		kstack_flags = SLB_VSID_KERNEL | linear_llp;

>   	slb_shadow_clear(KSTACK_INDEX);
>   	if (raw_smp_processor_id() != boot_cpuid &&
>   	    (get_paca()->kstack & slb_esid_mask(mmu_kernel_ssize)) > PAGE_OFFSET)
>   		create_shadowed_slbe(get_paca()->kstack,
> -				     mmu_kernel_ssize, lflags, KSTACK_INDEX);
> +				     mmu_kernel_ssize, kstack_flags,
> +				     KSTACK_INDEX);
>   
>   	asm volatile("isync":::"memory");
>   }
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 0c4eed9aea80..998317257797 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -104,6 +104,7 @@ config PPC_BOOK3S_64
>   	select IRQ_WORK
>   	select PPC_64S_HASH_MMU if !PPC_RADIX_MMU
>   	select KASAN_VMALLOC if KASAN
> +	select HAVE_ARCH_VMAP_STACK if KVM_BOOK3S_64_HV = n

Is that any different from the following?

	select HAVE_ARCH_VMAP_STACK if !KVM_BOOK3S_64_HV

>   
>   config PPC_BOOK3E_64
>   	bool "Embedded processors"


* Re: [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers
  2022-11-05  8:00   ` Christophe Leroy
@ 2022-11-05 19:28     ` Christophe Leroy
  2022-11-07 12:38     ` Nicholas Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-05 19:28 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: Nick Piggin, linux-hardening, cmr



Le 05/11/2022 à 09:00, Christophe Leroy a écrit :
> 
> 
> Le 04/11/2022 à 18:27, Andrew Donnellan a écrit :
>> powerpc unfortunately has too many places where we run stuff in real mode.
>>
>> With CONFIG_VMAP_STACK enabled, this means we need to be able to swap the
>> stack pointer to use the linear mapping when we enter a real mode section,
>> and back afterwards.
>>
>> Store the top bits of the stack pointer in both the linear map and the
>> vmalloc space in the PACA, and add some helper macros/functions to swap
>> between them.
> 
> That may work when the page size is 64k because the stack fits in a
> single page, but I doubt it works with 4k pages, because vmalloc may
> allocate non-contiguous pages.
> 
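
To make the concern concrete, here is a small user-space model (not kernel code) of a vmalloc'ed stack spanning four 4k pages. The toy page table, the frame numbers and the helper names are all made up for illustration. It shows that OR-ing a single linear base with the offset inside the stack only gives the right answer within the first page, whereas a per-page lookup is right everywhere.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (UINT64_C(1) << PAGE_SHIFT)
#define THREAD_SIZE  (4 * PAGE_SIZE)   /* assumed: 16 KiB stack on 4k pages */

/* Hypothetical physical frame numbers backing the four stack pages. */
static const uint64_t stack_pfn[4] = { 0x1234, 0x0042, 0x9000, 0x0777 };

/* Per-page translation, in the spirit of vmalloc_to_phys(). */
static uint64_t toy_vmalloc_to_phys(uint64_t offset_in_stack)
{
	uint64_t pfn = stack_pfn[offset_in_stack >> PAGE_SHIFT];

	return (pfn << PAGE_SHIFT) | (offset_in_stack & (PAGE_SIZE - 1));
}

/* Single-base translation: base | (sp & (THREAD_SIZE - 1)). */
static uint64_t toy_single_base_to_phys(uint64_t offset_in_stack)
{
	uint64_t linear_base = stack_pfn[0] << PAGE_SHIFT;

	return linear_base | (offset_in_stack & (THREAD_SIZE - 1));
}
```

For an offset in the second page, the two helpers disagree, because the second page is not physically adjacent to the first.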

[snip]

> 
>> +
>> +#else // __ASSEMBLY__
>> +
>> +#include <asm/paca.h>
>> +#include <asm/reg.h>
>> +#include <linux/mm.h>
>> +
>> +#define stack_pa(ptr) (is_vmalloc_addr((ptr)) ? (void *)vmalloc_to_phys((void *)(ptr)) : (void *)ptr)
>> +
>> +static __always_inline void swap_stack_linear(void)
>> +{
>> +	current_stack_pointer = get_paca()->kstack_linear_base |	\
>> +		(current_stack_pointer & (THREAD_SIZE - 1));
> 
> That looks hacky. I think you can't just change current_stack_pointer on
> the fly. You have to provide something similar to call_do_softirq() or
> call_do_irq()
> 

Maybe you can have a look at Nick's RFC for calling functions in real
mode:
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210212012041.392566-1-npiggin@gmail.com/

Christophe


* Re: [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args
  2022-11-04 17:27 ` [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args Andrew Donnellan
@ 2022-11-07  0:00   ` Russell Currey
  2022-11-08 16:21   ` Christophe Leroy
  1 sibling, 0 replies; 17+ messages in thread
From: Russell Currey @ 2022-11-07  0:00 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr

On Sat, 2022-11-05 at 04:27 +1100, Andrew Donnellan wrote:
> A number of OPAL calls take addresses as arguments (e.g. buffers with
> strings to print, etc). These addresses need to be physical
> addresses, as
> OPAL runs in real mode.
> 
> Since the hardware ignores the top two bits of the address in real
> mode,
> passing addresses in the kernel's linear map works fine even if we
> don't
> wrap them in __pa().
> 
> With VMAP_STACK, however, we're going to have to use
> vmalloc_to_phys() to
> convert addresses from the stack into an address that OPAL can use.
> 
> Introduce a new macro, stack_pa(), that uses vmalloc_to_phys() for
> addresses in the vmalloc area, and __pa() for linear map addresses.
> Add it
> to all the existing callsites where we pass pointers to OPAL.
> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_builtin.c          |  2 +-
>  arch/powerpc/platforms/powernv/eeh-powernv.c  | 20 ++++++-----
>  arch/powerpc/platforms/powernv/ocxl.c         |  3 +-
>  arch/powerpc/platforms/powernv/opal-core.c    |  4 +--
>  arch/powerpc/platforms/powernv/opal-dump.c    |  6 ++--
>  arch/powerpc/platforms/powernv/opal-elog.c    | 10 +++---
>  arch/powerpc/platforms/powernv/opal-fadump.c  | 12 +++----
>  arch/powerpc/platforms/powernv/opal-flash.c   |  5 +--
>  arch/powerpc/platforms/powernv/opal-hmi.c     |  3 +-
>  arch/powerpc/platforms/powernv/opal-irqchip.c |  4 +--
>  arch/powerpc/platforms/powernv/opal-lpc.c     |  8 ++---
>  arch/powerpc/platforms/powernv/opal-nvram.c   |  4 +--
>  arch/powerpc/platforms/powernv/opal-power.c   |  4 +--
>  .../powerpc/platforms/powernv/opal-powercap.c |  2 +-
>  arch/powerpc/platforms/powernv/opal-prd.c     |  6 ++--
>  arch/powerpc/platforms/powernv/opal-psr.c     |  2 +-
>  arch/powerpc/platforms/powernv/opal-rtc.c     |  2 +-
>  arch/powerpc/platforms/powernv/opal-secvar.c  |  9 +++--
>  arch/powerpc/platforms/powernv/opal-sensor.c  |  4 +--
>  .../powerpc/platforms/powernv/opal-sysparam.c |  4 +--
>  arch/powerpc/platforms/powernv/opal-xscom.c   |  2 +-
>  arch/powerpc/platforms/powernv/opal.c         | 16 ++++-----
>  arch/powerpc/platforms/powernv/pci-ioda.c     | 14 ++++----
>  arch/powerpc/platforms/powernv/pci.c          | 25 +++++++-------
>  arch/powerpc/platforms/powernv/setup.c        |  2 +-
>  arch/powerpc/platforms/powernv/smp.c          |  2 +-
>  arch/powerpc/sysdev/xics/icp-opal.c           |  2 +-
>  arch/powerpc/sysdev/xics/ics-opal.c           |  8 ++---
>  arch/powerpc/sysdev/xive/native.c             | 33 ++++++++++++-----
> --
>  drivers/char/ipmi/ipmi_powernv.c              |  6 ++--
>  drivers/char/powernv-op-panel.c               |  2 +-
>  drivers/i2c/busses/i2c-opal.c                 |  2 +-
>  drivers/leds/leds-powernv.c                   |  6 ++--
>  drivers/mtd/devices/powernv_flash.c           |  4 +--
>  drivers/rtc/rtc-opal.c                        |  4 +--
>  35 files changed, 135 insertions(+), 107 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c
> b/arch/powerpc/kvm/book3s_hv_builtin.c
> index da85f046377a..dba041d659d2 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -414,7 +414,7 @@ static long kvmppc_read_one_intr(bool *again)
>         xics_phys = local_paca->kvm_hstate.xics_phys;
>         rc = 0;
>         if (!xics_phys)
> -               rc = opal_int_get_xirr(&xirr, false);
> +               rc = opal_int_get_xirr(stack_pa(&xirr), false);
>         else
>                 xirr = __raw_rm_readl(xics_phys + XICS_XIRR);
>         if (rc < 0)
> diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c
> b/arch/powerpc/platforms/powernv/eeh-powernv.c
> index a83cb679dd59..f069aa28f969 100644
> --- a/arch/powerpc/platforms/powernv/eeh-powernv.c
> +++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
> @@ -517,7 +517,7 @@ static void pnv_eeh_get_phb_diag(struct eeh_pe
> *pe)
>         struct pnv_phb *phb = pe->phb->private_data;
>         s64 rc;
>  
> -       rc = opal_pci_get_phb_diag_data2(phb->opal_id, pe->data,
> +       rc = opal_pci_get_phb_diag_data2(phb->opal_id, stack_pa(pe-
> >data),
>                                          phb->diag_data_size);
>         if (rc != OPAL_SUCCESS)
>                 pr_warn("%s: Failure %lld getting PHB#%x diag-
> data\n",
> @@ -534,8 +534,8 @@ static int pnv_eeh_get_phb_state(struct eeh_pe
> *pe)
>  
>         rc = opal_pci_eeh_freeze_status(phb->opal_id,
>                                         pe->addr,
> -                                       &fstate,
> -                                       &pcierr,
> +                                       stack_pa(&fstate),
> +                                       stack_pa(&pcierr),
>                                         NULL);
>         if (rc != OPAL_SUCCESS) {
>                 pr_warn("%s: Failure %lld getting PHB#%x state\n",
> @@ -594,8 +594,8 @@ static int pnv_eeh_get_pe_state(struct eeh_pe
> *pe)
>         } else {
>                 rc = opal_pci_eeh_freeze_status(phb->opal_id,
>                                                 pe->addr,
> -                                               &fstate,
> -                                               &pcierr,
> +                                               stack_pa(&fstate),
> +                                               stack_pa(&pcierr),
>                                                 NULL);
>                 if (rc != OPAL_SUCCESS) {
>                         pr_warn("%s: Failure %lld getting PHB#%x-PE%x
> state\n",
> @@ -1287,7 +1287,8 @@ static void
> pnv_eeh_get_and_dump_hub_diag(struct pci_controller *hose)
>                 (struct OpalIoP7IOCErrorData*)phb->diag_data;
>         long rc;
>  
> -       rc = opal_pci_get_hub_diag_data(phb->hub_id, data,
> sizeof(*data));
> +       rc = opal_pci_get_hub_diag_data(phb->hub_id, stack_pa(data),
> +                                       sizeof(*data));
>         if (rc != OPAL_SUCCESS) {
>                 pr_warn("%s: Failed to get HUB#%llx diag-data
> (%ld)\n",
>                         __func__, phb->hub_id, rc);
> @@ -1432,7 +1433,9 @@ static int pnv_eeh_next_error(struct eeh_pe
> **pe)
>                         continue;
>  
>                 rc = opal_pci_next_error(phb->opal_id,
> -                                        &frozen_pe_no, &err_type,
> &severity);
> +                                        stack_pa(&frozen_pe_no),
> +                                        stack_pa(&err_type),
> +                                        stack_pa(&severity));
>                 if (rc != OPAL_SUCCESS) {
>                         pr_devel("%s: Invalid return value on "
>                                  "PHB#%x (0x%lx) from
> opal_pci_next_error",
> @@ -1511,7 +1514,8 @@ static int pnv_eeh_next_error(struct eeh_pe
> **pe)
>  
>                                 /* Dump PHB diag-data */
>                                 rc = opal_pci_get_phb_diag_data2(phb-
> >opal_id,
> -                                       phb->diag_data, phb-
> >diag_data_size);
> +                                                               
> stack_pa(phb->diag_data),
> +                                                                phb-
> >diag_data_size);
>                                 if (rc == OPAL_SUCCESS)
>                                         pnv_pci_dump_phb_diag_data(ho
> se,
>                                                         phb-
> >diag_data);
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c
> b/arch/powerpc/platforms/powernv/ocxl.c
> index 629067781cec..33d5b85df078 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -450,7 +450,8 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void
> *spa_mem, int PE_mask,
>                 return -ENOMEM;
>  
>         bdfn = (dev->bus->number << 8) | dev->devfn;
> -       rc = opal_npu_spa_setup(phb->opal_id, bdfn,
> virt_to_phys(spa_mem),
> +       rc = opal_npu_spa_setup(phb->opal_id, bdfn,
> +                               (uint64_t)stack_pa(spa_mem),
>                                 PE_mask);
>         if (rc) {
>                 dev_err(&dev->dev, "Can't setup Shared Process Area:
> %d\n", rc);
> diff --git a/arch/powerpc/platforms/powernv/opal-core.c
> b/arch/powerpc/platforms/powernv/opal-core.c
> index bb7657115f1d..6a4a1fd9ec33 100644
> --- a/arch/powerpc/platforms/powernv/opal-core.c
> +++ b/arch/powerpc/platforms/powernv/opal-core.c
> @@ -475,7 +475,7 @@ static void __init opalcore_config_init(void)
>         }
>  
>         /* Get OPAL metadata */
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_OPAL, &addr);
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_OPAL,
> stack_pa(&addr));
>         if ((ret != OPAL_SUCCESS) || !addr) {
>                 pr_err("Failed to get OPAL metadata (%d)\n", ret);
>                 goto error_out;
> @@ -486,7 +486,7 @@ static void __init opalcore_config_init(void)
>         opalc_metadata = __va(addr);
>  
>         /* Get OPAL CPU metadata */
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &addr);
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU,
> stack_pa(&addr));
>         if ((ret != OPAL_SUCCESS) || !addr) {
>                 pr_err("Failed to get OPAL CPU metadata (%d)\n",
> ret);
>                 goto error_out;
> diff --git a/arch/powerpc/platforms/powernv/opal-dump.c
> b/arch/powerpc/platforms/powernv/opal-dump.c
> index 16c5860f1372..9d48257988bc 100644
> --- a/arch/powerpc/platforms/powernv/opal-dump.c
> +++ b/arch/powerpc/platforms/powernv/opal-dump.c
> @@ -223,9 +223,9 @@ static int64_t dump_read_info(uint32_t *dump_id,
> uint32_t *dump_size, uint32_t *
>  
>         type = cpu_to_be32(0xffffffff);
>  
> -       rc = opal_dump_info2(&id, &size, &type);
> +       rc = opal_dump_info2(stack_pa(&id), stack_pa(&size),
> stack_pa(&type));
>         if (rc == OPAL_PARAMETER)
> -               rc = opal_dump_info(&id, &size);
> +               rc = opal_dump_info(stack_pa(&id), stack_pa(&size));
>  
>         if (rc) {
>                 pr_warn("%s: Failed to get dump info (%d)\n",
> @@ -262,7 +262,7 @@ static int64_t dump_read_data(struct dump_obj
> *dump)
>         }
>  
>         /* First entry address */
> -       addr = __pa(list);
> +       addr = (uint64_t)stack_pa(list);
>  
>         /* Fetch data */
>         rc = OPAL_BUSY_EVENT;
> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c
> b/arch/powerpc/platforms/powernv/opal-elog.c
> index 554fdd7f88b8..8750d7729e7c 100644
> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> @@ -169,7 +169,7 @@ static ssize_t raw_attr_read(struct file *filep,
> struct kobject *kobj,
>                 if (!elog->buffer)
>                         return -EIO;
>  
> -               opal_rc = opal_read_elog(__pa(elog->buffer),
> +               opal_rc = opal_read_elog((uint64_t)stack_pa(elog-
> >buffer),
>                                          elog->size, elog->id);
>                 if (opal_rc != OPAL_SUCCESS) {
>                         pr_err_ratelimited("ELOG: log read failed for
> log-id=%llx\n",
> @@ -212,8 +212,8 @@ static void create_elog_obj(uint64_t id, size_t
> size, uint64_t type)
>         elog->buffer = kzalloc(elog->size, GFP_KERNEL);
>  
>         if (elog->buffer) {
> -               rc = opal_read_elog(__pa(elog->buffer),
> -                                        elog->size, elog->id);
> +               rc = opal_read_elog((uint64_t)stack_pa(elog->buffer),
> +                                   elog->size, elog->id);
>                 if (rc != OPAL_SUCCESS) {
>                         pr_err("ELOG: log read failed for log-
> id=%llx\n",
>                                elog->id);
> @@ -270,7 +270,9 @@ static irqreturn_t elog_event(int irq, void
> *data)
>         char name[2+16+1];
>         struct kobject *kobj;
>  
> -       rc = opal_get_elog_size(&id, &size, &type);
> +       rc = opal_get_elog_size(stack_pa(&id),
> +                               stack_pa(&size),
> +                               stack_pa(&type));
>         if (rc != OPAL_SUCCESS) {
>                 pr_err("ELOG: OPAL log info read failed\n");
>                 return IRQ_HANDLED;
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c
> b/arch/powerpc/platforms/powernv/opal-fadump.c
> index 964f464b1b0e..d4bdf4540c1f 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -47,7 +47,7 @@ void __init opal_fadump_dt_scan(struct fw_dump
> *fadump_conf, u64 node)
>         if (!prop)
>                 return;
>  
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &addr);
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL,
> stack_pa(&addr));
>         if ((ret != OPAL_SUCCESS) || !addr) {
>                 pr_debug("Could not get Kernel metadata (%lld)\n",
> ret);
>                 return;
> @@ -63,7 +63,7 @@ void __init opal_fadump_dt_scan(struct fw_dump
> *fadump_conf, u64 node)
>         if (be16_to_cpu(opal_fdm_active->registered_regions) == 0)
>                 return;
>  
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM, &addr);
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_BOOT_MEM,
> stack_pa(&addr));
>         if ((ret != OPAL_SUCCESS) || !addr) {
>                 pr_err("Failed to get boot memory tag (%lld)\n",
> ret);
>                 return;
> @@ -607,7 +607,7 @@ static void opal_fadump_trigger(struct
> fadump_crash_info_header *fdh,
>          */
>         fdh->crashing_cpu = (u32)mfspr(SPRN_PIR);
>  
> -       rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, msg);
> +       rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, stack_pa(msg));

Hi Andrew, some compilers are cranky here:

/linux/arch/powerpc/platforms/powernv/opal-fadump.c:610:52: error:
passing 'const char *' to parameter of type 'void *' discards
qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
        rc = opal_cec_reboot2(OPAL_REBOOT_MPIPL, stack_pa(msg));
                                                          ^~~
/linux/arch/powerpc/include/asm/book3s/64/stack.h:56:45: note: passing
argument to parameter 'ptr' here
static __always_inline void *stack_pa(void *ptr)
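
One possible way to keep such callers warning-free, sketched here as a user-space model rather than a tested kernel change: let the helper take a `const void *`, so that both const and non-const pointers are accepted. The address range and the `toy_` names are placeholders, and the translation itself is stubbed out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder layout: pretend this range is the vmalloc area. */
#define TOY_VMALLOC_START  UINT64_C(0xc008000000000000)
#define TOY_VMALLOC_END    UINT64_C(0xc00a000000000000)

static bool toy_is_vmalloc_addr(const void *ptr)
{
	uint64_t addr = (uint64_t)(uintptr_t)ptr;

	return addr >= TOY_VMALLOC_START && addr < TOY_VMALLOC_END;
}

/* Stub standing in for vmalloc_to_phys(); a real one walks page tables. */
static uint64_t toy_vmalloc_to_phys(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr - TOY_VMALLOC_START;
}

/*
 * Taking const void * lets callers pass "const char *msg" without
 * discarding qualifiers; the conversion to a non-const result is
 * confined to this one helper.
 */
static inline void *toy_stack_pa(const void *ptr)
{
	if (toy_is_vmalloc_addr(ptr))
		return (void *)(uintptr_t)toy_vmalloc_to_phys(ptr);
	return (void *)(uintptr_t)ptr;
}
```

Whether returning a non-const pointer from a const argument is acceptable here is a design choice; the alternative would be a separate const variant of the helper.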


>         if (rc == OPAL_UNSUPPORTED) {
>                 pr_emerg("Reboot type %d not supported.\n",
>                          OPAL_REBOOT_MPIPL);
> @@ -690,7 +690,7 @@ void __init opal_fadump_dt_scan(struct fw_dump
> *fadump_conf, u64 node)
>         if (!prop)
>                 return;
>  
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL, &be_addr);
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_KERNEL,
> stack_pa(&be_addr));
>         if ((ret != OPAL_SUCCESS) || !be_addr) {
>                 pr_err("Failed to get Kernel metadata (%lld)\n",
> ret);
>                 return;
> @@ -712,8 +712,8 @@ void __init opal_fadump_dt_scan(struct fw_dump
> *fadump_conf, u64 node)
>                 return;
>         }
>  
> -       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU, &be_addr);
> -       if (be_addr) {
> +       ret = opal_mpipl_query_tag(OPAL_MPIPL_TAG_CPU,
> stack_pa(&be_addr));
> +       if (addr) {
>                 addr = be64_to_cpu(be_addr);
>                 pr_debug("CPU metadata addr: %llx\n", addr);
>                 opal_cpu_metadata = __va(addr);
> diff --git a/arch/powerpc/platforms/powernv/opal-flash.c
> b/arch/powerpc/platforms/powernv/opal-flash.c
> index d5ea04e8e4c5..fb989707ce94 100644
> --- a/arch/powerpc/platforms/powernv/opal-flash.c
> +++ b/arch/powerpc/platforms/powernv/opal-flash.c
> @@ -134,7 +134,8 @@ static inline void opal_flash_validate(void)
>         __be32 size = cpu_to_be32(validate_flash_data.buf_size);
>         __be32 result;
>  
> -       ret = opal_validate_flash(__pa(buf), &size, &result);
> +       ret = opal_validate_flash((uint64_t)stack_pa(buf),
> stack_pa(&size),
> +                                 stack_pa(&result));
>  
>         validate_flash_data.status = ret;
>         validate_flash_data.buf_size = be32_to_cpu(size);
> @@ -290,7 +291,7 @@ static int opal_flash_update(int op)
>                 goto invalid_img;
>  
>         /* First entry address */
> -       addr = __pa(list);
> +       addr = (unsigned long)stack_pa(list);
>  
>  flash:
>         rc = opal_update_flash(addr);
> diff --git a/arch/powerpc/platforms/powernv/opal-hmi.c
> b/arch/powerpc/platforms/powernv/opal-hmi.c
> index f0c1830deb51..a7df32dfd090 100644
> --- a/arch/powerpc/platforms/powernv/opal-hmi.c
> +++ b/arch/powerpc/platforms/powernv/opal-hmi.c
> @@ -303,7 +303,8 @@ static void hmi_event_handler(struct work_struct
> *work)
>  
>         if (unrecoverable) {
>                 /* Pull all HMI events from OPAL before we panic. */
> -               while (opal_get_msg(__pa(&msg), sizeof(msg)) ==
> OPAL_SUCCESS) {
> +               while (opal_get_msg((uint64_t)stack_pa(&msg),
> +                                   sizeof(msg)) == OPAL_SUCCESS) {
>                         u32 type;
>  
>                         type = be32_to_cpu(msg.msg_type);
> diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c
> b/arch/powerpc/platforms/powernv/opal-irqchip.c
> index d55652b5f6fa..0af8e517884c 100644
> --- a/arch/powerpc/platforms/powernv/opal-irqchip.c
> +++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
> @@ -60,7 +60,7 @@ void opal_handle_events(void)
>                 cond_resched();
>         }
>         last_outstanding_events = 0;
> -       if (opal_poll_events(&events) != OPAL_SUCCESS)
> +       if (opal_poll_events(stack_pa(&events)) != OPAL_SUCCESS)
>                 return;
>         e = be64_to_cpu(events) & opal_event_irqchip.mask;
>         if (e)
> @@ -123,7 +123,7 @@ static irqreturn_t opal_interrupt(int irq, void
> *data)
>  {
>         __be64 events;
>  
> -       opal_handle_interrupt(virq_to_hw(irq), &events);
> +       opal_handle_interrupt(virq_to_hw(irq), stack_pa(&events));
>         last_outstanding_events = be64_to_cpu(events);
>         if (opal_have_pending_events())
>                 opal_wake_poller();
> diff --git a/arch/powerpc/platforms/powernv/opal-lpc.c
> b/arch/powerpc/platforms/powernv/opal-lpc.c
> index d129d6d45a50..01114ab629dc 100644
> --- a/arch/powerpc/platforms/powernv/opal-lpc.c
> +++ b/arch/powerpc/platforms/powernv/opal-lpc.c
> @@ -28,7 +28,7 @@ static u8 opal_lpc_inb(unsigned long port)
>  
>         if (opal_lpc_chip_id < 0 || port > 0xffff)
>                 return 0xff;
> -       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> &data, 1);
> +       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> stack_pa(&data), 1);
>         return rc ? 0xff : be32_to_cpu(data);
>  }
>  
> @@ -41,7 +41,7 @@ static __le16 __opal_lpc_inw(unsigned long port)
>                 return 0xffff;
>         if (port & 1)
>                 return (__le16)opal_lpc_inb(port) << 8 |
> opal_lpc_inb(port + 1);
> -       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> &data, 2);
> +       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> stack_pa(&data), 2);
>         return rc ? 0xffff : be32_to_cpu(data);
>  }
>  static u16 opal_lpc_inw(unsigned long port)
> @@ -61,7 +61,7 @@ static __le32 __opal_lpc_inl(unsigned long port)
>                        (__le32)opal_lpc_inb(port + 1) << 16 |
>                        (__le32)opal_lpc_inb(port + 2) <<  8 |
>                                opal_lpc_inb(port + 3);
> -       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> &data, 4);
> +       rc = opal_lpc_read(opal_lpc_chip_id, OPAL_LPC_IO, port,
> stack_pa(&data), 4);
>         return rc ? 0xffffffff : be32_to_cpu(data);
>  }
>  
> @@ -208,7 +208,7 @@ static ssize_t lpc_debug_read(struct file *filp,
> char __user *ubuf,
>                                 len = 2;
>                 }
>                 rc = opal_lpc_read(opal_lpc_chip_id, lpc->lpc_type,
> pos,
> -                                  &data, len);
> +                                  stack_pa(&data), len);
>                 if (rc)
>                         return -ENXIO;
>  
> diff --git a/arch/powerpc/platforms/powernv/opal-nvram.c
> b/arch/powerpc/platforms/powernv/opal-nvram.c
> index 380bc2d7ebbf..d92e5070baf2 100644
> --- a/arch/powerpc/platforms/powernv/opal-nvram.c
> +++ b/arch/powerpc/platforms/powernv/opal-nvram.c
> @@ -33,7 +33,7 @@ static ssize_t opal_nvram_read(char *buf, size_t
> count, loff_t *index)
>         off = *index;
>         if ((off + count) > nvram_size)
>                 count = nvram_size - off;
> -       rc = opal_read_nvram(__pa(buf), count, off);
> +       rc = opal_read_nvram((uint64_t)stack_pa(buf), count, off);
>         if (rc != OPAL_SUCCESS)
>                 return -EIO;
>         *index += count;
> @@ -56,7 +56,7 @@ static ssize_t opal_nvram_write(char *buf, size_t
> count, loff_t *index)
>                 count = nvram_size - off;
>  
>         while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> -               rc = opal_write_nvram(__pa(buf), count, off);
> +               rc = opal_write_nvram((uint64_t)stack_pa(buf), count,
> off);
>                 if (rc == OPAL_BUSY_EVENT) {
>                         if (in_interrupt() || irqs_disabled())
>                                 mdelay(OPAL_BUSY_DELAY_MS);
> diff --git a/arch/powerpc/platforms/powernv/opal-power.c
> b/arch/powerpc/platforms/powernv/opal-power.c
> index db99ffcb7b82..6927bcd3630e 100644
> --- a/arch/powerpc/platforms/powernv/opal-power.c
> +++ b/arch/powerpc/platforms/powernv/opal-power.c
> @@ -31,7 +31,7 @@ static bool detect_epow(void)
>         * to OPAL. OPAL returns EPOW info along with classes present.
>         */
>         epow_classes = cpu_to_be16(OPAL_SYSEPOW_MAX);
> -       rc = opal_get_epow_status(opal_epow_status, &epow_classes);
> +       rc = opal_get_epow_status(stack_pa(opal_epow_status),
> stack_pa(&epow_classes));
>         if (rc != OPAL_SUCCESS) {
>                 pr_err("Failed to get EPOW event information\n");
>                 return false;
> @@ -59,7 +59,7 @@ static bool __init poweroff_pending(void)
>         __be64 opal_dpo_timeout;
>  
>         /* Check for DPO event */
> -       rc = opal_get_dpo_status(&opal_dpo_timeout);
> +       rc = opal_get_dpo_status(stack_pa(&opal_dpo_timeout));
>         if (rc == OPAL_SUCCESS) {
>                 pr_info("Existing DPO event detected.\n");
>                 return true;
> diff --git a/arch/powerpc/platforms/powernv/opal-powercap.c
> b/arch/powerpc/platforms/powernv/opal-powercap.c
> index 7bfe4cbeb35a..63e0e4427aea 100644
> --- a/arch/powerpc/platforms/powernv/opal-powercap.c
> +++ b/arch/powerpc/platforms/powernv/opal-powercap.c
> @@ -46,7 +46,7 @@ static ssize_t powercap_show(struct kobject *kobj,
> struct kobj_attribute *attr,
>         if (ret)
>                 goto out_token;
>  
> -       ret = opal_get_powercap(pcap_attr->handle, token, (u32
> *)__pa(&pcap));
> +       ret = opal_get_powercap(pcap_attr->handle, token, (u32
> *)stack_pa(&pcap));
>         switch (ret) {
>         case OPAL_ASYNC_COMPLETION:
>                 ret = opal_async_wait_response(token, &msg);
> diff --git a/arch/powerpc/platforms/powernv/opal-prd.c
> b/arch/powerpc/platforms/powernv/opal-prd.c
> index 113bdb151f68..649e8510ec00 100644
> --- a/arch/powerpc/platforms/powernv/opal-prd.c
> +++ b/arch/powerpc/platforms/powernv/opal-prd.c
> @@ -234,7 +234,7 @@ static ssize_t opal_prd_write(struct file *file,
> const char __user *buf,
>         if (IS_ERR(msg))
>                 return PTR_ERR(msg);
>  
> -       rc = opal_prd_msg(msg);
> +       rc = opal_prd_msg(stack_pa(msg));
>         if (rc) {
>                 pr_warn("write: opal_prd_msg returned %d\n", rc);
>                 size = -EIO;
> @@ -252,7 +252,7 @@ static int opal_prd_release(struct inode *inode,
> struct file *file)
>         msg.size = cpu_to_be16(sizeof(msg));
>         msg.type = OPAL_PRD_MSG_TYPE_FINI;
>  
> -       opal_prd_msg((struct opal_prd_msg *)&msg);
> +       opal_prd_msg((struct opal_prd_msg *)stack_pa(&msg));
>  
>         atomic_xchg(&prd_usage, 0);
>  
> @@ -281,7 +281,7 @@ static long opal_prd_ioctl(struct file *file,
> unsigned int cmd,
>                         return -EFAULT;
>  
>                 scom.rc = opal_xscom_read(scom.chip, scom.addr,
> -                               (__be64 *)&scom.data);
> +                                         (__be64
> *)stack_pa(&scom.data));
>                 scom.data = be64_to_cpu(scom.data);
>                 pr_devel("ioctl SCOM_READ: chip %llx addr %016llx
> data %016llx rc %lld\n",
>                                 scom.chip, scom.addr, scom.data,
> scom.rc);
> diff --git a/arch/powerpc/platforms/powernv/opal-psr.c
> b/arch/powerpc/platforms/powernv/opal-psr.c
> index 6441e17b6996..c37257b1ffe4 100644
> --- a/arch/powerpc/platforms/powernv/opal-psr.c
> +++ b/arch/powerpc/platforms/powernv/opal-psr.c
> @@ -40,7 +40,7 @@ static ssize_t psr_show(struct kobject *kobj,
> struct kobj_attribute *attr,
>                 goto out_token;
>  
>         ret = opal_get_power_shift_ratio(psr_attr->handle, token,
> -                                           (u32 *)__pa(&psr));
> +                                           (u32 *)stack_pa(&psr));
>         switch (ret) {
>         case OPAL_ASYNC_COMPLETION:
>                 ret = opal_async_wait_response(token, &msg);
> diff --git a/arch/powerpc/platforms/powernv/opal-rtc.c
> b/arch/powerpc/platforms/powernv/opal-rtc.c
> index a9bcf9217e64..891651455066 100644
> --- a/arch/powerpc/platforms/powernv/opal-rtc.c
> +++ b/arch/powerpc/platforms/powernv/opal-rtc.c
> @@ -43,7 +43,7 @@ time64_t __init opal_get_boot_time(void)
>                 return 0;
>  
>         while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> -               rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> +               rc = opal_rtc_read(stack_pa(&__y_m_d),
> stack_pa(&__h_m_s_ms));
>                 if (rc == OPAL_BUSY_EVENT) {
>                         mdelay(OPAL_BUSY_DELAY_MS);
>                         opal_poll_events(NULL);
> diff --git a/arch/powerpc/platforms/powernv/opal-secvar.c
> b/arch/powerpc/platforms/powernv/opal-secvar.c
> index 14133e120bdd..a44df58d565d 100644
> --- a/arch/powerpc/platforms/powernv/opal-secvar.c
> +++ b/arch/powerpc/platforms/powernv/opal-secvar.c
> @@ -64,7 +64,8 @@ static int opal_get_variable(const char *key,
> uint64_t ksize,
>  
>         *dsize = cpu_to_be64(*dsize);
>  
> -       rc = opal_secvar_get(key, ksize, data, dsize);
> +       rc = opal_secvar_get(stack_pa(key), ksize, stack_pa(data),
> +                            stack_pa(dsize));
>  
>         *dsize = be64_to_cpu(*dsize);
>  
> @@ -81,7 +82,8 @@ static int opal_get_next_variable(const char *key,
> uint64_t *keylen,
>  
>         *keylen = cpu_to_be64(*keylen);
>  
> -       rc = opal_secvar_get_next(key, keylen, keybufsize);
> +       rc = opal_secvar_get_next(stack_pa(key), stack_pa(keylen),
> +                                 keybufsize);
>  
>         *keylen = be64_to_cpu(*keylen);
>  
> @@ -96,7 +98,8 @@ static int opal_set_variable(const char *key,
> uint64_t ksize, u8 *data,
>         if (!key || !data)
>                 return -EINVAL;
>  
> -       rc = opal_secvar_enqueue_update(key, ksize, data, dsize);
> +       rc = opal_secvar_enqueue_update(stack_pa(key), ksize,
> stack_pa(data),
> +                                       dsize);
>  
>         return opal_status_to_err(rc);
>  }
> diff --git a/arch/powerpc/platforms/powernv/opal-sensor.c
> b/arch/powerpc/platforms/powernv/opal-sensor.c
> index 3192c614a1e1..ff5f78bb419b 100644
> --- a/arch/powerpc/platforms/powernv/opal-sensor.c
> +++ b/arch/powerpc/platforms/powernv/opal-sensor.c
> @@ -25,7 +25,7 @@ int opal_get_sensor_data(u32 sensor_hndl, u32
> *sensor_data)
>         if (token < 0)
>                 return token;
>  
> -       ret = opal_sensor_read(sensor_hndl, token, &data);
> +       ret = opal_sensor_read(sensor_hndl, token, stack_pa(&data));
>         switch (ret) {
>         case OPAL_ASYNC_COMPLETION:
>                 ret = opal_async_wait_response(token, &msg);
> @@ -78,7 +78,7 @@ int opal_get_sensor_data_u64(u32 sensor_hndl, u64
> *sensor_data)
>         if (token < 0)
>                 return token;
>  
> -       ret = opal_sensor_read_u64(sensor_hndl, token, &data);
> +       ret = opal_sensor_read_u64(sensor_hndl, token,
> stack_pa(&data));
>         switch (ret) {
>         case OPAL_ASYNC_COMPLETION:
>                 ret = opal_async_wait_response(token, &msg);
> diff --git a/arch/powerpc/platforms/powernv/opal-sysparam.c
> b/arch/powerpc/platforms/powernv/opal-sysparam.c
> index a12312afe4ef..3882b31a9e61 100644
> --- a/arch/powerpc/platforms/powernv/opal-sysparam.c
> +++ b/arch/powerpc/platforms/powernv/opal-sysparam.c
> @@ -41,7 +41,7 @@ static ssize_t opal_get_sys_param(u32 param_id, u32
> length, void *buffer)
>                 goto out;
>         }
>  
> -       ret = opal_get_param(token, param_id, (u64)buffer, length);
> +       ret = opal_get_param(token, param_id, (u64)stack_pa(buffer),
> length);
>         if (ret != OPAL_ASYNC_COMPLETION) {
>                 ret = opal_error_code(ret);
>                 goto out_token;
> @@ -76,7 +76,7 @@ static int opal_set_sys_param(u32 param_id, u32
> length, void *buffer)
>                 goto out;
>         }
>  
> -       ret = opal_set_param(token, param_id, (u64)buffer, length);
> +       ret = opal_set_param(token, param_id, (u64)stack_pa(buffer),
> length);
>  
>         if (ret != OPAL_ASYNC_COMPLETION) {
>                 ret = opal_error_code(ret);
> diff --git a/arch/powerpc/platforms/powernv/opal-xscom.c
> b/arch/powerpc/platforms/powernv/opal-xscom.c
> index 6b4eed2ef4fa..b318e2ef4ba2 100644
> --- a/arch/powerpc/platforms/powernv/opal-xscom.c
> +++ b/arch/powerpc/platforms/powernv/opal-xscom.c
> @@ -58,7 +58,7 @@ static int opal_scom_read(uint32_t chip, uint64_t
> addr, u64 reg, u64 *value)
>         __be64 v;
>  
>         reg = opal_scom_unmangle(addr + reg);
> -       rc = opal_xscom_read(chip, reg, (__be64 *)__pa(&v));
> +       rc = opal_xscom_read(chip, reg, (__be64 *)stack_pa(&v));
>         if (rc) {
>                 *value = 0xfffffffffffffffful;
>                 return -EIO;
> diff --git a/arch/powerpc/platforms/powernv/opal.c
> b/arch/powerpc/platforms/powernv/opal.c
> index cdf3838f08d3..ada336d77e64 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -357,7 +357,7 @@ static void opal_handle_message(void)
>         s64 ret;
>         u32 type;
>  
> -       ret = opal_get_msg(__pa(opal_msg), opal_msg_size);
> +       ret = opal_get_msg((uint64_t)stack_pa(opal_msg),
> opal_msg_size);
>         /* No opal message pending. */
>         if (ret == OPAL_RESOURCE)
>                 return;
> @@ -431,11 +431,11 @@ int opal_get_chars(uint32_t vtermno, char *buf,
> int count)
>  
>         if (!opal.entry)
>                 return -ENODEV;
> -       opal_poll_events(&evt);
> +       opal_poll_events(stack_pa(&evt));
>         if ((be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_INPUT) == 0)
>                 return 0;
>         len = cpu_to_be64(count);
> -       rc = opal_console_read(vtermno, &len, buf);
> +       rc = opal_console_read(vtermno, stack_pa(&len),
> stack_pa(buf));
>         if (rc == OPAL_SUCCESS)
>                 return be64_to_cpu(len);
>         return 0;
> @@ -453,7 +453,7 @@ static int __opal_put_chars(uint32_t vtermno,
> const char *data, int total_len, b
>  
>         if (atomic)
>                 spin_lock_irqsave(&opal_write_lock, flags);
> -       rc = opal_console_write_buffer_space(vtermno, &olen);
> +       rc = opal_console_write_buffer_space(vtermno,
> stack_pa(&olen));
>         if (rc || be64_to_cpu(olen) < total_len) {
>                 /* Closed -> drop characters */
>                 if (rc)
> @@ -465,7 +465,7 @@ static int __opal_put_chars(uint32_t vtermno,
> const char *data, int total_len, b
>  
>         /* Should not get a partial write here because space is
> available. */
>         olen = cpu_to_be64(total_len);
> -       rc = opal_console_write(vtermno, &olen, data);
> +       rc = opal_console_write(vtermno, stack_pa(&olen),
> stack_pa((void *)data));
>         if (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>                 if (rc == OPAL_BUSY_EVENT)
>                         opal_poll_events(NULL);
> @@ -527,7 +527,7 @@ static s64 __opal_flush_console(uint32_t vtermno)
>                  */
>                 WARN_ONCE(1, "opal: OPAL_CONSOLE_FLUSH missing.\n");
>  
> -               opal_poll_events(&evt);
> +               opal_poll_events(stack_pa(&evt));
>                 if (!(be64_to_cpu(evt) & OPAL_EVENT_CONSOLE_OUTPUT))
>                         return OPAL_SUCCESS;
>                 return OPAL_BUSY;
> @@ -647,7 +647,7 @@ void __noreturn pnv_platform_error_reboot(struct
> pt_regs *regs, const char *msg)
>          * Don't bother to shut things down because this will
>          * xstop the system.
>          */
> -       if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, msg)
> +       if (opal_cec_reboot2(OPAL_REBOOT_PLATFORM_ERROR, stack_pa((void *)msg))
>                                                 == OPAL_UNSUPPORTED) {
>                 pr_emerg("Reboot type %d not supported for %s\n",
>                                 OPAL_REBOOT_PLATFORM_ERROR, msg);
> @@ -720,7 +720,7 @@ int opal_hmi_exception_early2(struct pt_regs
> *regs)
>          * Check 64-bit flag mask to find out if an event was
> generated,
>          * and whether TB is still valid or not etc.
>          */
> -       rc = opal_handle_hmi2(&out_flags);
> +       rc = opal_handle_hmi2(stack_pa(&out_flags));
>         if (rc != OPAL_SUCCESS)
>                 return 0;
>  
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 5c144c05cbfd..4d85e8253f94 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -628,7 +628,8 @@ static int pnv_ioda_get_pe_state(struct pnv_phb
> *phb, int pe_no)
>  
>         /* Check the master PE */
>         rc = opal_pci_eeh_freeze_status(phb->opal_id, pe_no,
> -                                       &state, &pcierr, NULL);
> +                                       stack_pa(&state),
> +                                       stack_pa(&pcierr), NULL);
>         if (rc != OPAL_SUCCESS) {
>                 pr_warn("%s: Failure %lld getting "
>                         "PHB#%x-PE#%x state\n",
> @@ -644,8 +645,8 @@ static int pnv_ioda_get_pe_state(struct pnv_phb
> *phb, int pe_no)
>         list_for_each_entry(slave, &pe->slaves, list) {
>                 rc = opal_pci_eeh_freeze_status(phb->opal_id,
>                                                 slave->pe_number,
> -                                               &fstate,
> -                                               &pcierr,
> +                                               stack_pa(&fstate),
> +                                               stack_pa(&pcierr),
>                                                 NULL);
>                 if (rc != OPAL_SUCCESS) {
>                         pr_warn("%s: Failure %lld getting "
> @@ -2061,7 +2062,7 @@ static int __pnv_pci_ioda_msi_setup(struct
> pnv_phb *phb, struct pci_dev *dev,
>                 __be64 addr64;
>  
>                 rc = opal_get_msi_64(phb->opal_id, pe->mve_number,
> xive_num, 1,
> -                                    &addr64, &data);
> +                                    stack_pa(&addr64),
> stack_pa(&data));
>                 if (rc) {
>                         pr_warn("%s: OPAL error %d getting 64-bit MSI
> data\n",
>                                 pci_name(dev), rc);
> @@ -2073,7 +2074,7 @@ static int __pnv_pci_ioda_msi_setup(struct
> pnv_phb *phb, struct pci_dev *dev,
>                 __be32 addr32;
>  
>                 rc = opal_get_msi_32(phb->opal_id, pe->mve_number,
> xive_num, 1,
> -                                    &addr32, &data);
> +                                    stack_pa(&addr32),
> stack_pa(&data));
>                 if (rc) {
>                         pr_warn("%s: OPAL error %d getting 32-bit MSI
> data\n",
>                                 pci_name(dev), rc);
> @@ -2415,7 +2416,8 @@ static int pnv_pci_diag_data_set(void *data,
> u64 val)
>         s64 ret;
>  
>         /* Retrieve the diag data from firmware */
> -       ret = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag_data,
> +       ret = opal_pci_get_phb_diag_data2(phb->opal_id,
> +                                         stack_pa(phb->diag_data),
>                                           phb->diag_data_size);
>         if (ret != OPAL_SUCCESS)
>                 return -EIO;
> diff --git a/arch/powerpc/platforms/powernv/pci.c
> b/arch/powerpc/platforms/powernv/pci.c
> index 233a50e65fce..0c21b5aa24f5 100644
> --- a/arch/powerpc/platforms/powernv/pci.c
> +++ b/arch/powerpc/platforms/powernv/pci.c
> @@ -84,7 +84,7 @@ int pnv_pci_get_device_tree(uint32_t phandle, void
> *buf, uint64_t len)
>         if (!opal_check_token(OPAL_GET_DEVICE_TREE))
>                 return -ENXIO;
>  
> -       rc = opal_get_device_tree(phandle, (uint64_t)buf, len);
> +       rc = opal_get_device_tree(phandle, (uint64_t)stack_pa(buf),
> len);
>         if (rc < OPAL_SUCCESS)
>                 return -EIO;
>  
> @@ -99,7 +99,7 @@ int pnv_pci_get_presence_state(uint64_t id, uint8_t
> *state)
>         if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATE))
>                 return -ENXIO;
>  
> -       rc = opal_pci_get_presence_state(id, (uint64_t)state);
> +       rc = opal_pci_get_presence_state(id,
> (uint64_t)stack_pa(state));
>         if (rc != OPAL_SUCCESS)
>                 return -EIO;
>  
> @@ -114,7 +114,7 @@ int pnv_pci_get_power_state(uint64_t id, uint8_t
> *state)
>         if (!opal_check_token(OPAL_PCI_GET_POWER_STATE))
>                 return -ENXIO;
>  
> -       rc = opal_pci_get_power_state(id, (uint64_t)state);
> +       rc = opal_pci_get_power_state(id, (uint64_t)stack_pa(state));
>         if (rc != OPAL_SUCCESS)
>                 return -EIO;
>  
> @@ -135,7 +135,7 @@ int pnv_pci_set_power_state(uint64_t id, uint8_t
> state, struct opal_msg *msg)
>         if (unlikely(token < 0))
>                 return token;
>  
> -       rc = opal_pci_set_power_state(token, id, (uint64_t)&state);
> +       rc = opal_pci_set_power_state(token, id,
> (uint64_t)stack_pa(&state));
>         if (rc == OPAL_SUCCESS) {
>                 ret = 0;
>                 goto exit;
> @@ -493,7 +493,8 @@ static void pnv_pci_handle_eeh_config(struct
> pnv_phb *phb, u32 pe_no)
>         spin_lock_irqsave(&phb->lock, flags);
>  
>         /* Fetch PHB diag-data */
> -       rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag_data,
> +       rc = opal_pci_get_phb_diag_data2(phb->opal_id,
> +                                        stack_pa(phb->diag_data),
>                                          phb->diag_data_size);
>         has_diag = (rc == OPAL_SUCCESS);
>  
> @@ -554,8 +555,8 @@ static void pnv_pci_config_check_eeh(struct
> pci_dn *pdn)
>         } else {
>                 rc = opal_pci_eeh_freeze_status(phb->opal_id,
>                                                 pe_no,
> -                                               &fstate,
> -                                               &pcierr,
> +                                               stack_pa(&fstate),
> +                                               stack_pa(&pcierr),
>                                                 NULL);
>                 if (rc) {
>                         pr_warn("%s: Failure %lld getting PHB#%x-PE#%x state\n",
> @@ -592,20 +593,22 @@ int pnv_pci_cfg_read(struct pci_dn *pdn,
>         switch (size) {
>         case 1: {
>                 u8 v8;
> -               rc = opal_pci_config_read_byte(phb->opal_id, bdfn,
> where, &v8);
> +               rc = opal_pci_config_read_byte(phb->opal_id, bdfn,
> where,
> +                                              stack_pa(&v8));
>                 *val = (rc == OPAL_SUCCESS) ? v8 : 0xff;
>                 break;
>         }
>         case 2: {
>                 __be16 v16;
>                 rc = opal_pci_config_read_half_word(phb->opal_id,
> bdfn, where,
> -                                                  &v16);
> +                                                   stack_pa(&v16));
>                 *val = (rc == OPAL_SUCCESS) ? be16_to_cpu(v16) :
> 0xffff;
>                 break;
>         }
>         case 4: {
>                 __be32 v32;
> -               rc = opal_pci_config_read_word(phb->opal_id, bdfn,
> where, &v32);
> +               rc = opal_pci_config_read_word(phb->opal_id, bdfn,
> where,
> +                                              stack_pa(&v32));
>                 *val = (rc == OPAL_SUCCESS) ? be32_to_cpu(v32) :
> 0xffffffff;
>                 break;
>         }
> @@ -765,7 +768,7 @@ int pnv_pci_set_tunnel_bar(struct pci_dev *dev,
> u64 addr, int enable)
>                 return -ENXIO;
>  
>         mutex_lock(&tunnel_mutex);
> -       rc = opal_pci_get_pbcq_tunnel_bar(phb->opal_id, &val);
> +       rc = opal_pci_get_pbcq_tunnel_bar(phb->opal_id,
> stack_pa(&val));
>         if (rc != OPAL_SUCCESS) {
>                 rc = -EIO;
>                 goto out;
> diff --git a/arch/powerpc/platforms/powernv/setup.c
> b/arch/powerpc/platforms/powernv/setup.c
> index 61ab2d38ff4b..aae6ad04c65f 100644
> --- a/arch/powerpc/platforms/powernv/setup.c
> +++ b/arch/powerpc/platforms/powernv/setup.c
> @@ -407,7 +407,7 @@ static void pnv_kexec_wait_secondaries_down(void)
>  
>                 for (;;) {
>                         rc = opal_query_cpu_status(get_hard_smp_processor_id(i),
> -                                                  &status);
> +                                                  stack_pa(&status));
>                         if (rc != OPAL_SUCCESS || status !=
> OPAL_THREAD_STARTED)
>                                 break;
>                         barrier();
> diff --git a/arch/powerpc/platforms/powernv/smp.c
> b/arch/powerpc/platforms/powernv/smp.c
> index 9e1a25398f98..2f70e0bf9873 100644
> --- a/arch/powerpc/platforms/powernv/smp.c
> +++ b/arch/powerpc/platforms/powernv/smp.c
> @@ -86,7 +86,7 @@ static int pnv_smp_kick_cpu(int nr)
>          * first time. OPAL v3 allows us to query OPAL to know if it
>          * has the CPUs, so we do that
>          */
> -       rc = opal_query_cpu_status(pcpu, &status);
> +       rc = opal_query_cpu_status(pcpu, stack_pa(&status));
>         if (rc != OPAL_SUCCESS) {
>                 pr_warn("OPAL Error %ld querying CPU %d state\n", rc,
> nr);
>                 return -ENODEV;
> diff --git a/arch/powerpc/sysdev/xics/icp-opal.c
> b/arch/powerpc/sysdev/xics/icp-opal.c
> index 4dae624b9f2f..a79b98349a1e 100644
> --- a/arch/powerpc/sysdev/xics/icp-opal.c
> +++ b/arch/powerpc/sysdev/xics/icp-opal.c
> @@ -53,7 +53,7 @@ static unsigned int icp_opal_get_xirr(void)
>                 return kvm_xirr;
>  
>         /* Then ask OPAL */
> -       rc = opal_int_get_xirr(&hw_xirr, false);
> +       rc = opal_int_get_xirr(stack_pa(&hw_xirr), false);
>         if (rc < 0)
>                 return 0;
>         return be32_to_cpu(hw_xirr);
> diff --git a/arch/powerpc/sysdev/xics/ics-opal.c
> b/arch/powerpc/sysdev/xics/ics-opal.c
> index 6cfbb4fac7fb..5bf54470b35d 100644
> --- a/arch/powerpc/sysdev/xics/ics-opal.c
> +++ b/arch/powerpc/sysdev/xics/ics-opal.c
> @@ -105,7 +105,7 @@ static int ics_opal_set_affinity(struct irq_data
> *d,
>         if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
>                 return -1;
>  
> -       rc = opal_get_xive(hw_irq, &oserver, &priority);
> +       rc = opal_get_xive(hw_irq, stack_pa(&oserver),
> stack_pa(&priority));
>         if (rc != OPAL_SUCCESS) {
>                 pr_err("%s: opal_get_xive(irq=%d [hw 0x%x]) error
> %lld\n",
>                        __func__, d->irq, hw_irq, rc);
> @@ -160,7 +160,7 @@ static int ics_opal_check(struct ics *ics,
> unsigned int hw_irq)
>                 return -EINVAL;
>  
>         /* Check if HAL knows about this interrupt */
> -       rc = opal_get_xive(hw_irq, &server, &priority);
> +       rc = opal_get_xive(hw_irq, stack_pa(&server),
> stack_pa(&priority));
>         if (rc != OPAL_SUCCESS)
>                 return -ENXIO;
>  
> @@ -174,7 +174,7 @@ static void ics_opal_mask_unknown(struct ics
> *ics, unsigned long vec)
>         int8_t priority;
>  
>         /* Check if HAL knows about this interrupt */
> -       rc = opal_get_xive(vec, &server, &priority);
> +       rc = opal_get_xive(vec, stack_pa(&server),
> stack_pa(&priority));
>         if (rc != OPAL_SUCCESS)
>                 return;
>  
> @@ -188,7 +188,7 @@ static long ics_opal_get_server(struct ics *ics,
> unsigned long vec)
>         int8_t priority;
>  
>         /* Check if HAL knows about this interrupt */
> -       rc = opal_get_xive(vec, &server, &priority);
> +       rc = opal_get_xive(vec, stack_pa(&server),
> stack_pa(&priority));
>         if (rc != OPAL_SUCCESS)
>                 return -1;
>         return ics_opal_unmangle_server(be16_to_cpu(server));
> diff --git a/arch/powerpc/sysdev/xive/native.c
> b/arch/powerpc/sysdev/xive/native.c
> index 3925825954bc..a2082ee866ca 100644
> --- a/arch/powerpc/sysdev/xive/native.c
> +++ b/arch/powerpc/sysdev/xive/native.c
> @@ -52,8 +52,11 @@ int xive_native_populate_irq_data(u32 hw_irq,
> struct xive_irq_data *data)
>  
>         memset(data, 0, sizeof(*data));
>  
> -       rc = opal_xive_get_irq_info(hw_irq, &flags, &eoi_page,
> &trig_page,
> -                                   &esb_shift, &src_chip);
> +       rc = opal_xive_get_irq_info(hw_irq, stack_pa(&flags),
> +                                   stack_pa(&eoi_page),
> +                                   stack_pa(&trig_page),
> +                                   stack_pa(&esb_shift),
> +                                   stack_pa(&src_chip));
>         if (rc) {
>                 pr_err("opal_xive_get_irq_info(0x%x) returned
> %lld\n",
>                        hw_irq, rc);
> @@ -117,7 +120,8 @@ static int xive_native_get_irq_config(u32 hw_irq,
> u32 *target, u8 *prio,
>         __be64 vp;
>         __be32 lirq;
>  
> -       rc = opal_xive_get_irq_config(hw_irq, &vp, prio, &lirq);
> +       rc = opal_xive_get_irq_config(hw_irq, stack_pa(&vp),
> stack_pa(prio),
> +                                     stack_pa(&lirq));
>  
>         *target = be64_to_cpu(vp);
>         *sw_irq = be32_to_cpu(lirq);
> @@ -150,8 +154,8 @@ int xive_native_configure_queue(u32 vp_id, struct
> xive_q *q, u8 prio,
>         q->toggle = 0;
>  
>         rc = opal_xive_get_queue_info(vp_id, prio, NULL, NULL,
> -                                     &qeoi_page_be,
> -                                     &esc_irq_be,
> +                                     stack_pa(&qeoi_page_be),
> +                                     stack_pa(&esc_irq_be),
>                                       NULL);
>         if (rc) {
>                 vp_err(vp_id, "Failed to get queue %d info : %lld\n",
> prio, rc);
> @@ -416,7 +420,8 @@ static void xive_native_setup_cpu(unsigned int
> cpu, struct xive_cpu *xc)
>         }
>  
>         /* Grab it's CAM value */
> -       rc = opal_xive_get_vp_info(vp, NULL, &vp_cam_be, NULL, NULL);
> +       rc = opal_xive_get_vp_info(vp, NULL, stack_pa(&vp_cam_be),
> NULL,
> +                                  NULL);
>         if (rc) {
>                 pr_err("Failed to get pool VP info CPU %d\n", cpu);
>                 return;
> @@ -756,7 +761,8 @@ int xive_native_get_vp_info(u32 vp_id, u32
> *out_cam_id, u32 *out_chip_id)
>         __be32 vp_chip_id_be;
>         s64 rc;
>  
> -       rc = opal_xive_get_vp_info(vp_id, NULL, &vp_cam_be, NULL,
> &vp_chip_id_be);
> +       rc = opal_xive_get_vp_info(vp_id, NULL, stack_pa(&vp_cam_be),
> NULL,
> +                                  stack_pa(&vp_chip_id_be));
>         if (rc) {
>                 vp_err(vp_id, "Failed to get VP info : %lld\n", rc);
>                 return -EIO;
> @@ -794,8 +800,11 @@ int xive_native_get_queue_info(u32 vp_id, u32
> prio,
>         __be64 qflags;
>         s64 rc;
>  
> -       rc = opal_xive_get_queue_info(vp_id, prio, &qpage, &qsize,
> -                                     &qeoi_page, &escalate_irq,
> &qflags);
> +       rc = opal_xive_get_queue_info(vp_id, prio, stack_pa(&qpage),
> +                                     stack_pa(&qsize),
> +                                     stack_pa(&qeoi_page),
> +                                     stack_pa(&escalate_irq),
> +                                     stack_pa(&qflags));
>         if (rc) {
>                 vp_err(vp_id, "failed to get queue %d info : %lld\n",
> prio, rc);
>                 return -EIO;
> @@ -822,8 +831,8 @@ int xive_native_get_queue_state(u32 vp_id, u32
> prio, u32 *qtoggle, u32 *qindex)
>         __be32 opal_qindex;
>         s64 rc;
>  
> -       rc = opal_xive_get_queue_state(vp_id, prio, &opal_qtoggle,
> -                                      &opal_qindex);
> +       rc = opal_xive_get_queue_state(vp_id, prio,
> stack_pa(&opal_qtoggle),
> +                                      stack_pa(&opal_qindex));
>         if (rc) {
>                 vp_err(vp_id, "failed to get queue %d state :
> %lld\n", prio, rc);
>                 return -EIO;
> @@ -864,7 +873,7 @@ int xive_native_get_vp_state(u32 vp_id, u64
> *out_state)
>         __be64 state;
>         s64 rc;
>  
> -       rc = opal_xive_get_vp_state(vp_id, &state);
> +       rc = opal_xive_get_vp_state(vp_id, stack_pa(&state));
>         if (rc) {
>                 vp_err(vp_id, "failed to get vp state : %lld\n", rc);
>                 return -EIO;
> diff --git a/drivers/char/ipmi/ipmi_powernv.c
> b/drivers/char/ipmi/ipmi_powernv.c
> index da22a8cbe68e..55032e205e8e 100644
> --- a/drivers/char/ipmi/ipmi_powernv.c
> +++ b/drivers/char/ipmi/ipmi_powernv.c
> @@ -91,7 +91,7 @@ static void ipmi_powernv_send(void *send_info,
> struct ipmi_smi_msg *msg)
>  
>         pr_devel("%s: opal_ipmi_send(0x%llx, %p, %ld)\n", __func__,
>                         smi->interface_id, opal_msg, size);
> -       rc = opal_ipmi_send(smi->interface_id, opal_msg, size);
> +       rc = opal_ipmi_send(smi->interface_id, stack_pa(opal_msg),
> size);
>         pr_devel("%s:  -> %d\n", __func__, rc);
>  
>         if (!rc) {
> @@ -132,8 +132,8 @@ static int ipmi_powernv_recv(struct
> ipmi_smi_powernv *smi)
>         size = cpu_to_be64(sizeof(*opal_msg) + IPMI_MAX_MSG_LENGTH);
>  
>         rc = opal_ipmi_recv(smi->interface_id,
> -                       opal_msg,
> -                       &size);
> +                           stack_pa(opal_msg),
> +                           stack_pa(&size));
>         size = be64_to_cpu(size);
>         pr_devel("%s:   -> %d (size %lld)\n", __func__,
>                         rc, rc == 0 ? size : 0);
> diff --git a/drivers/char/powernv-op-panel.c b/drivers/char/powernv-op-panel.c
> index 3c99696b145e..10588093e2e2 100644
> --- a/drivers/char/powernv-op-panel.c
> +++ b/drivers/char/powernv-op-panel.c
> @@ -60,7 +60,7 @@ static int __op_panel_update_display(void)
>                 return token;
>         }
>  
> -       rc = opal_write_oppanel_async(token, oppanel_lines,
> num_lines);
> +       rc = opal_write_oppanel_async(token, stack_pa(oppanel_lines),
> num_lines);
>         switch (rc) {
>         case OPAL_ASYNC_COMPLETION:
>                 rc = opal_async_wait_response(token, &msg);
> diff --git a/drivers/i2c/busses/i2c-opal.c b/drivers/i2c/busses/i2c-opal.c
> index 9f773b4f5ed8..d1d1fb3a55ba 100644
> --- a/drivers/i2c/busses/i2c-opal.c
> +++ b/drivers/i2c/busses/i2c-opal.c
> @@ -49,7 +49,7 @@ static int i2c_opal_send_request(u32 bus_id, struct
> opal_i2c_request *req)
>                 return token;
>         }
>  
> -       rc = opal_i2c_request(token, bus_id, req);
> +       rc = opal_i2c_request(token, bus_id, stack_pa(req));
>         if (rc != OPAL_ASYNC_COMPLETION) {
>                 rc = i2c_opal_translate_error(rc);
>                 goto exit;
> diff --git a/drivers/leds/leds-powernv.c b/drivers/leds/leds-powernv.c
> index 743e2cdd0891..b65bfdf6fa18 100644
> --- a/drivers/leds/leds-powernv.c
> +++ b/drivers/leds/leds-powernv.c
> @@ -99,7 +99,7 @@ static int powernv_led_set(struct powernv_led_data
> *powernv_led,
>         }
>  
>         rc = opal_leds_set_ind(token, powernv_led->loc_code,
> -                              led_mask, led_value, &max_type);
> +                              led_mask, led_value,
> stack_pa(&max_type));
>         if (rc != OPAL_ASYNC_COMPLETION) {
>                 dev_err(dev, "%s: OPAL set LED call failed for %s
> [rc=%d]\n",
>                         __func__, powernv_led->loc_code, rc);
> @@ -142,7 +142,9 @@ static enum led_brightness powernv_led_get(struct
> powernv_led_data *powernv_led)
>         max_type = powernv_led_common->max_led_type;
>  
>         rc = opal_leds_get_ind(powernv_led->loc_code,
> -                              &mask, &value, &max_type);
> +                              stack_pa(&mask),
> +                              stack_pa(&value),
> +                              stack_pa(&max_type));
>         if (rc != OPAL_SUCCESS && rc != OPAL_PARTIAL) {
>                 dev_err(dev, "%s: OPAL get led call failed
> [rc=%d]\n",
>                         __func__, rc);
> diff --git a/drivers/mtd/devices/powernv_flash.c
> b/drivers/mtd/devices/powernv_flash.c
> index 36e060386e59..a2d0e61d0afe 100644
> --- a/drivers/mtd/devices/powernv_flash.c
> +++ b/drivers/mtd/devices/powernv_flash.c
> @@ -66,10 +66,10 @@ static int powernv_flash_async_op(struct mtd_info
> *mtd, enum flash_op op,
>  
>         switch (op) {
>         case FLASH_OP_READ:
> -               rc = opal_flash_read(info->id, offset, __pa(buf),
> len, token);
> +               rc = opal_flash_read(info->id, offset,
> (uint64_t)stack_pa(buf), len, token);
>                 break;
>         case FLASH_OP_WRITE:
> -               rc = opal_flash_write(info->id, offset, __pa(buf),
> len, token);
> +               rc = opal_flash_write(info->id, offset,
> (uint64_t)stack_pa(buf), len, token);
>                 break;
>         case FLASH_OP_ERASE:
>                 rc = opal_flash_erase(info->id, offset, len, token);
> diff --git a/drivers/rtc/rtc-opal.c b/drivers/rtc/rtc-opal.c
> index ad41aaf8a17f..9e627fb7115a 100644
> --- a/drivers/rtc/rtc-opal.c
> +++ b/drivers/rtc/rtc-opal.c
> @@ -53,7 +53,7 @@ static int opal_get_rtc_time(struct device *dev,
> struct rtc_time *tm)
>         __be64 __h_m_s_ms;
>  
>         while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> -               rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> +               rc = opal_rtc_read(stack_pa(&__y_m_d),
> stack_pa(&__h_m_s_ms));
>                 if (rc == OPAL_BUSY_EVENT) {
>                         msleep(OPAL_BUSY_DELAY_MS);
>                         opal_poll_events(NULL);
> @@ -127,7 +127,7 @@ static int opal_get_tpo_time(struct device *dev,
> struct rtc_wkalrm *alarm)
>                 return token;
>         }
>  
> -       rc = opal_tpo_read(token, &__y_m_d, &__h_m);
> +       rc = opal_tpo_read(token, stack_pa(&__y_m_d),
> stack_pa(&__h_m));
>         if (rc != OPAL_ASYNC_COMPLETION) {
>                 rc = -EIO;
>                 goto exit;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers
  2022-11-05  8:00   ` Christophe Leroy
  2022-11-05 19:28     ` Christophe Leroy
@ 2022-11-07 12:38     ` Nicholas Piggin
  1 sibling, 0 replies; 17+ messages in thread
From: Nicholas Piggin @ 2022-11-07 12:38 UTC (permalink / raw)
  To: Christophe Leroy, Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr

On Sat Nov 5, 2022 at 6:00 PM AEST, Christophe Leroy wrote:
>
>
> On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> > powerpc unfortunately has too many places where we run stuff in real mode.
> > 
> > With CONFIG_VMAP_STACK enabled, this means we need to be able to swap the
> > stack pointer to use the linear mapping when we enter a real mode section,
> > and back afterwards.
> > 
> > Store the top bits of the stack pointer in both the linear map and the
> > vmalloc space in the PACA, and add some helper macros/functions to swap
> > between them.
>
> That may work when pagesize is 64k because the stack is on a single page, 
> but I doubt it works with 4k pages, because vmalloc may allocate 
> non-contiguous pages.

Yeah. This could be a first stage though, and depend on 64k page size
and stack size, or !KVM or whatever. Once the real-mode code is solved,
that could be relaxed.

Thanks,
Nick
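
To illustrate the 4k-page concern raised above, here is a rough user-space
model, not kernel code. The frame numbers, the stack sizes and the helper
names (toy_virt_to_phys, toy_range_is_contiguous) are all made up for the
example. It shows that a stack backed by a single 64k page is always
physically contiguous, whereas a stack spread over several 4k pages is only
contiguous if vmalloc happened to hand out adjacent frames, so swapping just
the top bits of the stack pointer cannot be relied on in that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of a vmapped stack: virtual page i of the stack is backed by
 * physical frame frames[i]. All names and numbers here are hypothetical.
 */
static uint64_t toy_virt_to_phys(const uint64_t *frames,
				 unsigned int page_shift, uint64_t offset)
{
	uint64_t page = offset >> page_shift;
	uint64_t in_page = offset & ((UINT64_C(1) << page_shift) - 1);

	return (frames[page] << page_shift) | in_page;
}

/* Return 1 if bytes [0, len) of the stack form one contiguous physical range. */
static int toy_range_is_contiguous(const uint64_t *frames,
				   unsigned int page_shift, uint64_t len)
{
	uint64_t base = toy_virt_to_phys(frames, page_shift, 0);
	uint64_t off;

	/* It is enough to check the first byte of every page. */
	for (off = 0; off < len; off += UINT64_C(1) << page_shift) {
		if (toy_virt_to_phys(frames, page_shift, off) != base + off)
			return 0;
	}
	return 1;
}
```

With one 64k frame the check trivially passes; with four 4k frames such as
{ 7, 3, 9, 12 } it fails, and it only passes for something like
{ 7, 8, 9, 10 }.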


* Re: [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address
  2022-11-04 17:27 ` [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address Andrew Donnellan
@ 2022-11-08 16:17   ` Christophe Leroy
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-08 16:17 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> When we go into idle, we must disable the MMU. Currently, we can still
> access the stack once the MMU is disabled, because the stack is in the
> linear map.
> 
> Once we enable CONFIG_VMAP_STACK, the normal stack pointer will be in the
> vmalloc area. To cope with this, manually convert the stack pointer to a
> physical address using stack_pa() before going into idle, and restore the
> original pointer on the way back out.
> 
> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> 
> ---
> 
> This currently doesn't boot on my POWER9. I'm also going to clean this up
> to use the helpers from earlier in this series.
> ---
>   arch/powerpc/platforms/powernv/idle.c | 47 +++++++++++++++++++++++++--
>   1 file changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 841cb7f31f4f..6430fb488981 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -22,6 +22,7 @@
>   #include <asm/smp.h>
>   #include <asm/runlatch.h>
>   #include <asm/dbell.h>
> +#include <asm/reg.h>
>   
>   #include "powernv.h"
>   #include "subcore.h"
> @@ -509,6 +510,11 @@ static unsigned long power7_offline(void)
>   {
>   	unsigned long srr1;
>   
> +#ifdef CONFIG_VMAP_STACK
> +	unsigned long ksp_ea = current_stack_pointer;
> +	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);

As with the other patch, I don't think you can just change the stack
pointer on the fly; you have to change it carefully in assembly and
perform a function call, just as is done for irqs.

> +#endif
> +
>   	mtmsr(MSR_IDLE);
>   
>   #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> @@ -543,6 +549,9 @@ static unsigned long power7_offline(void)
>   		srr1 = idle_kvm_start_guest(srr1);
>   #endif
>   
> +#ifdef CONFIG_VMAP_STACK

You could avoid many of the #ifdefs by replacing them with IS_ENABLED().
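
For instance, something along these lines. This is only a user-space sketch:
the IS_ENABLED() below is a cut-down imitation of the macro in
include/linux/kconfig.h, CONFIG_VMAP_STACK is defined by hand, and
toy_stack_pa() / toy_real_mode_sp() are hypothetical stand-ins rather than
helpers from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down imitation of IS_ENABLED(): 1 if opt is defined to 1, else 0. */
#define TOY_PLACEHOLDER_1 0,
#define TOY_SECOND(ignored, val, ...) val
#define TOY_IS_DEFINED2(junk) TOY_SECOND(junk 1, 0, 0)
#define TOY_IS_DEFINED1(val) TOY_IS_DEFINED2(TOY_PLACEHOLDER_##val)
#define IS_ENABLED(opt) TOY_IS_DEFINED1(opt)

#define CONFIG_VMAP_STACK 1	/* normally provided by Kconfig */

/* Hypothetical conversion; in the patch this role is played by stack_pa(). */
static uint64_t toy_stack_pa(uint64_t ea)
{
	return ea & UINT64_C(0x0000ffffffffffff);
}

static uint64_t toy_real_mode_sp(uint64_t sp)
{
	/* Both branches are compiled and type-checked; the dead one is dropped. */
	if (IS_ENABLED(CONFIG_VMAP_STACK))
		return toy_stack_pa(sp);
	return sp;
}
```

One consequence is that variables such as ksp_ea would have to be declared
unconditionally, since the compiler sees both branches.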

> +	current_stack_pointer = ksp_ea;
> +#endif
>   	mtmsr(MSR_KERNEL);
>   
>   	return srr1;
> @@ -552,14 +561,24 @@ static unsigned long power7_offline(void)
>   void power7_idle_type(unsigned long type)
>   {
>   	unsigned long srr1;
> +#ifdef CONFIG_VMAP_STACK
> +	unsigned long ksp_ea;
> +#endif
>   
>   	if (!prep_irq_for_idle_irqsoff())
>   		return;
>   
> +#ifdef CONFIG_VMAP_STACK
> +	ksp_ea = current_stack_pointer;
> +	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
> +#endif
>   	mtmsr(MSR_IDLE);
>   	__ppc64_runlatch_off();
>   	srr1 = power7_idle_insn(type);
>   	__ppc64_runlatch_on();
> +#ifdef CONFIG_VMAP_STACK
> +	current_stack_pointer = ksp_ea;
> +#endif
>   	mtmsr(MSR_KERNEL);
>   
>   	fini_irq_for_idle_irqsoff();
> @@ -615,6 +634,9 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   	unsigned long mmcra = 0;
>   	struct p9_sprs sprs = {}; /* avoid false used-uninitialised */
>   	bool sprs_saved = false;
> +#ifdef CONFIG_VMAP_STACK
> +	unsigned long ksp_ea;
> +#endif
>   
>   	if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
>   		/* EC=ESL=0 case */
> @@ -633,7 +655,7 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   		 */
>   		BUG_ON((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS);
>   
> -		goto out;
> +		goto out_noloss;
>   	}
>   
>   	/* EC=ESL=1 case */
> @@ -688,6 +710,10 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   	sprs.iamr	= mfspr(SPRN_IAMR);
>   	sprs.uamor	= mfspr(SPRN_UAMOR);
>   
> +#ifdef CONFIG_VMAP_STACK
> +	ksp_ea = current_stack_pointer;
> +	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
> +#endif
>   	srr1 = isa300_idle_stop_mayloss(psscr);		/* go idle */
>   
>   #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> @@ -797,6 +823,10 @@ static unsigned long power9_idle_stop(unsigned long psscr)
>   		__slb_restore_bolted_realmode();
>   
>   out:
> +#ifdef CONFIG_VMAP_STACK
> +	current_stack_pointer = ksp_ea;
> +#endif
> +out_noloss:
>   	mtmsr(MSR_KERNEL);
>   
>   	return srr1;
> @@ -898,6 +928,9 @@ static unsigned long power10_idle_stop(unsigned long psscr)
>   	unsigned long pls;
>   //	struct p10_sprs sprs = {}; /* avoid false used-uninitialised */
>   	bool sprs_saved = false;
> +#ifdef CONFIG_VMAP_STACK
> +	unsigned long ksp_ea;
> +#endif
>   
>   	if (!(psscr & (PSSCR_EC|PSSCR_ESL))) {
>   		/* EC=ESL=0 case */
> @@ -916,7 +949,7 @@ static unsigned long power10_idle_stop(unsigned long psscr)
>   		 */
>   		BUG_ON((srr1 & SRR1_WAKESTATE) != SRR1_WS_NOLOSS);
>   
> -		goto out;
> +		goto out_noloss;
>   	}
>   
>   	/* EC=ESL=1 case */
> @@ -927,7 +960,11 @@ static unsigned long power10_idle_stop(unsigned long psscr)
>   
>   		atomic_start_thread_idle();
>   	}
> -
> +#ifdef CONFIG_VMAP_STACK
> +	ksp_ea = current_stack_pointer;
> +	current_stack_pointer = (unsigned long)stack_pa((void *)ksp_ea);
> +#endif /* CONFIG_VMAP_STACK */
> +	mtmsr(MSR_IDLE);
>   	srr1 = isa300_idle_stop_mayloss(psscr);		/* go idle */
>   
>   	psscr = mfspr(SPRN_PSSCR);
> @@ -982,6 +1019,10 @@ static unsigned long power10_idle_stop(unsigned long psscr)
>   		__slb_restore_bolted_realmode();
>   
>   out:
> +#ifdef CONFIG_VMAP_STACK
> +	current_stack_pointer = ksp_ea;
> +#endif /* CONFIG_VMAP_STACK */
> +out_noloss:
>   	mtmsr(MSR_KERNEL);
>   
>   	return srr1;


* Re: [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args
  2022-11-04 17:27 ` [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args Andrew Donnellan
  2022-11-07  0:00   ` Russell Currey
@ 2022-11-08 16:21   ` Christophe Leroy
  1 sibling, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2022-11-08 16:21 UTC (permalink / raw)
  To: Andrew Donnellan, linuxppc-dev; +Cc: linux-hardening, cmr



On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> A number of OPAL calls take addresses as arguments (e.g. buffers with
> strings to print, etc). These addresses need to be physical addresses, as
> OPAL runs in real mode.
> 
> Since the hardware ignores the top two bits of the address in real mode,
> passing addresses in the kernel's linear map works fine even if we don't
> wrap them in __pa().
> 
> With VMAP_STACK, however, we're going to have to use vmalloc_to_phys() to
> convert addresses from the stack into an address that OPAL can use.


I think you should then avoid using the stack for all those parameters.
It should be handled just like DMA in drivers: don't use the stack, use
kmalloc memory instead.

> 
> Introduce a new macro, stack_pa(), that uses vmalloc_to_phys() for
> addresses in the vmalloc area, and __pa() for linear map addresses. Add it
> to all the existing callsites where we pass pointers to OPAL.

You should avoid that. Instead, just use kmalloc for local data. If 
that's data you get from outside the function, use kmemdup().

Christophe
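
As a rough illustration of why the quoted patch description distinguishes the
two kinds of address, here is a user-space model. It is not the stack_pa()
from the series: the TOY_* constants, the one-entry "page table" and the
frame number are invented for the example. Linear-map addresses survive real
mode because dropping the top two bits already yields the physical address,
while a vmalloc address needs a page table lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; not taken from the kernel headers. */
#define TOY_PAGE_OFFSET   UINT64_C(0xc000000000000000) /* linear map base */
#define TOY_VMALLOC_START UINT64_C(0xc008000000000000) /* made-up vmalloc base */
#define TOY_PAGE_SHIFT    16

/* What real mode does with an effective address: ignore the top two bits. */
static uint64_t toy_real_mode_addr(uint64_t ea)
{
	return ea & ~(UINT64_C(3) << 62);
}

/* Made-up translation: the single vmalloc page is backed by this frame. */
static const uint64_t toy_vmalloc_frame = 0x1234;

/* Model of the dispatch described for stack_pa(). */
static uint64_t toy_stack_pa(uint64_t ea)
{
	if (ea >= TOY_VMALLOC_START) {
		/* plays the role of vmalloc_to_phys() */
		uint64_t in_page = ea & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1);

		return (toy_vmalloc_frame << TOY_PAGE_SHIFT) | in_page;
	}
	/* plays the role of __pa() */
	return ea - TOY_PAGE_OFFSET;
}
```

For a linear-map address the two functions agree; for an address above
TOY_VMALLOC_START they do not, which is the case a kmalloc'ed buffer avoids
entirely.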


* Re: [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE
  2022-11-04 17:51   ` Christophe Leroy
@ 2023-04-26  7:03     ` Andrew Donnellan
  0 siblings, 0 replies; 17+ messages in thread
From: Andrew Donnellan @ 2023-04-26  7:03 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: linux-hardening, cmr, ruscur

On Fri, 2022-11-04 at 17:51 +0000, Christophe Leroy wrote:
> 
> 
> On 04/11/2022 at 18:27, Andrew Donnellan wrote:
> > When CONFIG_VMAP_STACK is enabled, we set THREAD_SIZE to be at least the
> > size of a page.
> > 
> > There's a few bits of assembly in the book3s64 code that use THREAD_SIZE in
> > immediate mode instructions, which can only take an operand of up to 16
> > bits signed, which isn't quite large enough.
> > 
> > Fix these spots to use a scratch register or use two immediate mode
> > instructions instead, so we can later enable VMAP_STACK.
> > 
> > Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
> > ---
> >   arch/powerpc/include/asm/asm-compat.h   | 2 ++
> >   arch/powerpc/kernel/entry_64.S          | 4 +++-
> >   arch/powerpc/kernel/irq.c               | 8 ++++++--
> >   arch/powerpc/kernel/misc_64.S           | 4 +++-
> >   arch/powerpc/kvm/book3s_hv_rmhandlers.S | 3 ++-
> >   5 files changed, 16 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/asm-compat.h b/arch/powerpc/include/asm/asm-compat.h
> > index 2bc53c646ccd..30dd7813bf3b 100644
> > --- a/arch/powerpc/include/asm/asm-compat.h
> > +++ b/arch/powerpc/include/asm/asm-compat.h
> > @@ -11,6 +11,7 @@
> >   #define PPC_LL                stringify_in_c(ld)
> >   #define PPC_STL               stringify_in_c(std)
> >   #define PPC_STLU      stringify_in_c(stdu)
> > +#define PPC_STLUX      stringify_in_c(stdux)
> >   #define PPC_LCMPI     stringify_in_c(cmpdi)
> >   #define PPC_LCMPLI    stringify_in_c(cmpldi)
> >   #define PPC_LCMP      stringify_in_c(cmpd)
> > @@ -45,6 +46,7 @@
> >   #define PPC_LL                stringify_in_c(lwz)
> >   #define PPC_STL               stringify_in_c(stw)
> >   #define PPC_STLU      stringify_in_c(stwu)
> > +#define PPC_STLUX      stringify_in_c(stwux)
> >   #define PPC_LCMPI     stringify_in_c(cmpwi)
> >   #define PPC_LCMPLI    stringify_in_c(cmplwi)
> >   #define PPC_LCMP      stringify_in_c(cmpw)
> > diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> > index 3e2e37e6ecab..af25db6e0205 100644
> > --- a/arch/powerpc/kernel/entry_64.S
> > +++ b/arch/powerpc/kernel/entry_64.S
> > @@ -238,7 +238,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> >         /* Note: this uses SWITCH_FRAME_SIZE rather than INT_FRAME_SIZE
> >            because we don't need to leave the 288-byte ABI gap at the
> >            top of the kernel stack. */
> > -       addi    r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE
> > +       li      r9,0
> > +       ori     r9,r9,THREAD_SIZE-SWITCH_FRAME_SIZE
> > +       add     r7,r7,r9
> 
> So you assume THREAD_SIZE is never more than 64k? Is that a valid
> assumption?

It looks like PPC_PAGE_SHIFT can be up to 18, which would make
THREAD_SIZE 256K, but that's only if you have 256K pages, which is a
44x-specific feature. Otherwise, AFAICT, you can't get THREAD_SHIFT
larger than 16, and therefore THREAD_SIZE <= 64K.


> 
> What about the below instead:
> 
>         addis   r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE@ha
>         addi    r7,r7,THREAD_SIZE-SWITCH_FRAME_SIZE@l

That looks better anyway, thanks.
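
To make the immediate-width arithmetic concrete, here is a small user-space
model of the two sequences. The offsets used with it are illustrative, not
the real value of THREAD_SIZE-SWITCH_FRAME_SIZE, and the helper names are
made up. addi and addis sign-extend their 16-bit immediate, which is why @ha
adds one to the high half whenever the low half will be read as negative.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits, as addi/addis do with their immediate. */
static int64_t sext16(uint32_t v)
{
	return (int64_t)(v & 0x7fff) - (int64_t)(v & 0x8000);
}

/* value@l as consumed by addi: low half, sign-extended. */
static int64_t imm_l(uint32_t v)
{
	return sext16(v);
}

/* value@ha: high half, plus one if the low half will be read as negative. */
static int64_t imm_ha(uint32_t v)
{
	return sext16((v + 0x8000u) >> 16);
}

/* Models: addis rD,rA,v@ha ; addi rD,rD,v@l */
static uint64_t addis_addi(uint64_t ra, uint32_t v)
{
	ra += (uint64_t)(imm_ha(v) * 65536);
	ra += (uint64_t)imm_l(v);
	return ra;
}

/*
 * Models: li rD,0 ; ori rD,rD,v
 * ori has a 16-bit unsigned immediate field, so only the low 16 bits fit.
 * (This models the field width; an assembler would be expected to reject
 * a larger operand rather than truncate it.)
 */
static uint64_t li_ori(uint32_t v)
{
	return v & 0xffffu;
}
```

For an offset such as 0xff90 both sequences give the same result, with
imm_ha() returning 1 and imm_l() returning -112. For anything above 0xffff,
say 0x3ff90, only the addis/addi pair reproduces the full value.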

> 
> >   
> >         /*
> >          * PMU interrupts in radix may come in here. They will use r1, not
> > diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> > index 9ede61a5a469..098cf6adceec 100644
> > --- a/arch/powerpc/kernel/irq.c
> > +++ b/arch/powerpc/kernel/irq.c
> > @@ -204,7 +204,9 @@ static __always_inline void call_do_softirq(const void *sp)
> >   {
> >         /* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
> >         asm volatile (
> > -                PPC_STLU "     %%r1, %[offset](%[sp])  ;"
> > +               "li             %%r0, 0                 ;"
> > +               "ori            %%r0, %%r0, %[offset]   ;"
> 
> Same: you assume offset is at most 64k. Is that correct?
> 
> What about
>                 lis             r0, offset@h
>                 ori             r0, r0, offset@l
> 
> > +                PPC_STLUX "    %%r1, %[sp], %%r0       ;"
> >                 "mr             %%r1, %[sp]             ;"
> >                 "bl             %[callee]               ;"
> >                  PPC_LL "       %%r1, 0(%%r1)           ;"
> > @@ -256,7 +258,9 @@ static __always_inline void call_do_irq(struct pt_regs *regs, void *sp)
> >   
> >         /* Temporarily switch r1 to sp, call __do_irq() then restore r1. */
> >         asm volatile (
> > -                PPC_STLU "     %%r1, %[offset](%[sp])  ;"
> > +               "li             %%r0, 0                 ;"
> > +               "ori            %%r0, %%r0, %[offset]   ;"
> > +                PPC_STLUX "    %%r1, %[sp], %%r0       ;"
> 
> Same
> 
> >                 "mr             %%r4, %%r1              ;"
> >                 "mr             %%r1, %[sp]             ;"
> >                 "bl             %[callee]               ;"
> > diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> > index 36184cada00b..ff71b98500a3 100644
> > --- a/arch/powerpc/kernel/misc_64.S
> > +++ b/arch/powerpc/kernel/misc_64.S
> > @@ -384,7 +384,9 @@ _GLOBAL(kexec_sequence)
> >         std     r0,16(r1)
> >   
> >         /* switch stacks to newstack -- &kexec_stack.stack */
> > -       stdu    r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r3)
> > +       li      r0,0
> > +       ori     r0,r0,THREAD_SIZE-STACK_FRAME_OVERHEAD
> > +       stdux   r1,r3,r0
> 
> Same
> 
> >         mr      r1,r3
> >   
> >         li      r0,0
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index 37f50861dd98..d05e3d324f4d 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -2686,7 +2686,8 @@ kvmppc_bad_host_intr:
> >         mr      r9, r1
> >         std     r1, PACAR1(r13)
> >         ld      r1, PACAEMERGSP(r13)
> > -       subi    r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE
> > +       subi    r1, r1, THREAD_SIZE/2
> > +       subi    r1, r1, INT_FRAME_SIZE
> 
> Same, what about
> 
>         subis   r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE@ha
>         subi    r1, r1, THREAD_SIZE/2 + INT_FRAME_SIZE@l
> 
> >         std     r9, 0(r1)
> >         std     r0, GPR0(r1)
> >         std     r9, GPR1(r1)

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited


end of thread, other threads:[~2023-04-26  7:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
2022-11-04 17:27 [RFC PATCH 0/6] VMAP_STACK support for book3s64 Andrew Donnellan
2022-11-04 17:27 ` [RFC PATCH 1/6] powerpc/64s: Fix assembly to support larger values of THREAD_SIZE Andrew Donnellan
2022-11-04 17:51   ` Christophe Leroy
2023-04-26  7:03     ` Andrew Donnellan
2022-11-04 17:27 ` [RFC PATCH 2/6] powerpc/64s: Helpers to switch between linear and vmapped stack pointers Andrew Donnellan
2022-11-05  8:00   ` Christophe Leroy
2022-11-05 19:28     ` Christophe Leroy
2022-11-07 12:38     ` Nicholas Piggin
2022-11-04 17:27 ` [RFC PATCH 3/6] powerpc/powernv: Keep MSR in register across OPAL entry/return path Andrew Donnellan
2022-11-04 18:00   ` Christophe Leroy
2022-11-04 17:27 ` [RFC PATCH 4/6] powerpc/powernv: Convert pointers to physical addresses in OPAL call args Andrew Donnellan
2022-11-07  0:00   ` Russell Currey
2022-11-08 16:21   ` Christophe Leroy
2022-11-04 17:27 ` [RFC PATCH 5/6] powerpc/powernv/idle: Convert stack pointer to physical address Andrew Donnellan
2022-11-08 16:17   ` Christophe Leroy
2022-11-04 17:27 ` [RFC PATCH 6/6] powerpc/64s: Enable CONFIG_VMAP_STACK Andrew Donnellan
2022-11-05 17:07   ` Christophe Leroy
