From: Nicholas Piggin <npiggin@gmail.com>
To: linuxppc-dev@lists.ozlabs.org, Michael Ellerman <mpe@ellerman.id.au>
Cc: aneesh.kumar@linux.ibm.com
Subject: Re: [PATCH 5/6] powerpc/mm/64s/hash: Add real-mode change_memory_range() for hash LPAR
Date: Fri, 12 Feb 2021 10:36:43 +1000
Message-ID: <1613086376.ygjdbhz8p5.astroid@bobo.none>
In-Reply-To: <20210211135130.3474832-5-mpe@ellerman.id.au>
Excerpts from Michael Ellerman's message of February 11, 2021 11:51 pm:
> When we enabled STRICT_KERNEL_RWX we received some reports of boot
> failures when using the Hash MMU and running under phyp. The crashes
> are intermittent, and often exhibit as a completely unresponsive
> system, or possibly an oops.
>
> One example, which was caught in xmon:
>
> [ 14.068327][ T1] devtmpfs: mounted
> [ 14.069302][ T1] Freeing unused kernel memory: 5568K
> [ 14.142060][ T347] BUG: Unable to handle kernel instruction fetch
> [ 14.142063][ T1] Run /sbin/init as init process
> [ 14.142074][ T347] Faulting instruction address: 0xc000000000004400
> cpu 0x2: Vector: 400 (Instruction Access) at [c00000000c7475e0]
> pc: c000000000004400: exc_virt_0x4400_instruction_access+0x0/0x80
> lr: c0000000001862d4: update_rq_clock+0x44/0x110
> sp: c00000000c747880
> msr: 8000000040001031
> current = 0xc00000000c60d380
> paca = 0xc00000001ec9de80 irqmask: 0x03 irq_happened: 0x01
> pid = 347, comm = kworker/2:1
> ...
> enter ? for help
> [c00000000c747880] c0000000001862d4 update_rq_clock+0x44/0x110 (unreliable)
> [c00000000c7478f0] c000000000198794 update_blocked_averages+0xb4/0x6d0
> [c00000000c7479f0] c000000000198e40 update_nohz_stats+0x90/0xd0
> [c00000000c747a20] c0000000001a13b4 _nohz_idle_balance+0x164/0x390
> [c00000000c747b10] c0000000001a1af8 newidle_balance+0x478/0x610
> [c00000000c747be0] c0000000001a1d48 pick_next_task_fair+0x58/0x480
> [c00000000c747c40] c000000000eaab5c __schedule+0x12c/0x950
> [c00000000c747cd0] c000000000eab3e8 schedule+0x68/0x120
> [c00000000c747d00] c00000000016b730 worker_thread+0x130/0x640
> [c00000000c747da0] c000000000174d50 kthread+0x1a0/0x1b0
> [c00000000c747e10] c00000000000e0f0 ret_from_kernel_thread+0x5c/0x6c
>
> This shows that CPU 2, which was idle, woke up and then appears to
> randomly take an instruction fault on a completely valid area of
> kernel text.
>
> The cause turns out to be the call to hash__mark_rodata_ro(), late in
> boot. Due to the way we lay out text and rodata, that function actually
> changes the permissions for all of text and rodata to read-only plus
> execute.
>
> To do the permission change we use a hypervisor call, H_PROTECT. On
> phyp that appears to be implemented by briefly removing the mapping of
> the kernel text, before putting it back with the updated permissions.
> If any other CPU is executing during that window, it will see spurious
> faults on the kernel text and/or data, leading to crashes.
>
> To fix it we use stop machine to collect all other CPUs, and then have
> them drop into real mode (MMU off), while we change the mapping. That
> way they are unaffected by the mapping temporarily disappearing.
>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> arch/powerpc/mm/book3s64/hash_pgtable.c | 105 +++++++++++++++++++++++-
> 1 file changed, 104 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_pgtable.c b/arch/powerpc/mm/book3s64/hash_pgtable.c
> index 3663d3cdffac..01de985df2c4 100644
> --- a/arch/powerpc/mm/book3s64/hash_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/hash_pgtable.c
> @@ -8,6 +8,7 @@
> #include <linux/sched.h>
> #include <linux/mm_types.h>
> #include <linux/mm.h>
> +#include <linux/stop_machine.h>
>
> #include <asm/sections.h>
> #include <asm/mmu.h>
> @@ -400,6 +401,19 @@ EXPORT_SYMBOL_GPL(hash__has_transparent_hugepage);
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> #ifdef CONFIG_STRICT_KERNEL_RWX
> +
> +struct change_memory_parms {
> + unsigned long start, end, newpp;
> + unsigned int step, nr_cpus, master_cpu;
> + atomic_t cpu_counter;
> +};
> +
> +// We'd rather this was on the stack but it has to be in the RMO
> +static struct change_memory_parms chmem_parms;
> +
> +// And therefore we need a lock to protect it from concurrent use
> +static DEFINE_MUTEX(chmem_lock);
> +
> static void change_memory_range(unsigned long start, unsigned long end,
> unsigned int step, unsigned long newpp)
> {
> @@ -414,6 +428,73 @@ static void change_memory_range(unsigned long start, unsigned long end,
> mmu_kernel_ssize);
> }
>
> +static int notrace chmem_secondary_loop(struct change_memory_parms *parms)
> +{
> + unsigned long msr, tmp, flags;
> + int *p;
> +
> + p = &parms->cpu_counter.counter;
> +
> + local_irq_save(flags);
> + __hard_EE_RI_disable();
> +
> + asm volatile (
> + // Switch to real mode and leave interrupts off
> + "mfmsr %[msr] ;"
> + "li %[tmp], %[MSR_IR_DR] ;"
> + "andc %[tmp], %[msr], %[tmp] ;"
> + "mtmsrd %[tmp] ;"
> +
> + // Tell the master we are in real mode
> + "1: "
> + "lwarx %[tmp], 0, %[p] ;"
> + "addic %[tmp], %[tmp], -1 ;"
> + "stwcx. %[tmp], 0, %[p] ;"
> + "bne- 1b ;"
> +
> + // Spin until the counter goes to zero
> + "2: ;"
> + "lwz %[tmp], 0(%[p]) ;"
> + "cmpwi %[tmp], 0 ;"
> + "bne- 2b ;"
> +
> + // Switch back to virtual mode
> + "mtmsrd %[msr] ;"
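The lwarx/stwcx. dance above is just an atomic decrement followed by a
spin: each secondary checks in and then waits, MMU off, for the master
to finish. For anyone following along, here's a rough userspace
analogue with C11 atomics and pthreads -- the master side isn't in the
quoted hunk, so its arrangement here is a guess, and every name below
is made up:

```c
#include <pthread.h>
#include <stdatomic.h>

#define NR_CPUS 4

/* Plays the role of parms->cpu_counter: initialized to the number of
 * participating CPUs; each secondary "checks in" by decrementing, so
 * it reads 1 when only the master is still running in virtual mode. */
static atomic_int cpu_counter = NR_CPUS;

static int new_permissions;     /* stands in for the updated mapping */

static void *secondary(void *arg)
{
	/* "Tell the master we are in real mode": the lwarx/addic/stwcx.
	 * sequence in the patch is exactly an atomic decrement. */
	atomic_fetch_sub(&cpu_counter, 1);

	/* "Spin until the counter goes to zero": the master drops it
	 * to 0 once the mapping change is complete. */
	while (atomic_load(&cpu_counter) != 0)
		;
	return NULL;
}

int run_rendezvous(void)
{
	pthread_t tids[NR_CPUS - 1];
	int i;

	for (i = 0; i < NR_CPUS - 1; i++)
		pthread_create(&tids[i], NULL, secondary, NULL);

	/* Master: wait until every secondary has checked in. */
	while (atomic_load(&cpu_counter) != 1)
		;

	/* Window where the real H_PROTECT briefly invalidates the
	 * mapping: safe here because everyone else is spinning. */
	new_permissions = 42;

	/* Release the secondaries. */
	atomic_store(&cpu_counter, 0);

	for (i = 0; i < NR_CPUS - 1; i++)
		pthread_join(tids[i], NULL);

	return new_permissions;
}
```

(Compile with -pthread; obviously none of this captures the MSR
manipulation, only the counter protocol.)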
Pity we don't have something that can switch to the emergency stack so
we can write this stuff in C.
How's something like this suit you?
---
arch/powerpc/kernel/misc_64.S | 22 +++++++++++++++++++++
arch/powerpc/kernel/process.c | 37 +++++++++++++++++++++++++++++++++++
2 files changed, 59 insertions(+)
diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 070465825c21..5e911d0b0b16 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -27,6 +27,28 @@
.text
+#ifdef CONFIG_PPC_BOOK3S_64
+_GLOBAL(__call_realmode)
+ mflr r0
+ std r0,16(r1)
+ stdu r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r5)
+ mr r1,r5
+ mtctr r3
+ mr r3,r4
+ mfmsr r4
+ xori r4,r4,(MSR_IR|MSR_DR)
+ mtmsrd r4
+ bctrl
+ mfmsr r4
+ xori r4,r4,(MSR_IR|MSR_DR)
+ mtmsrd r4
+ ld r1,0(r1)
+ ld r0,16(r1)
+ mtlr r0
+ blr
+
+#endif
+
_GLOBAL(call_do_softirq)
mflr r0
std r0,16(r1)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index a66f435dabbf..260d60f665a3 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2197,6 +2197,43 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
put_task_stack(tsk);
}
+#ifdef CONFIG_PPC_BOOK3S_64
+int __call_realmode(int (*fn)(void *arg), void *arg, void *sp);
+
+/* XXX: find a better place for this
+ * Executing C code in real-mode in general Book3S-64 code can only be done
+ * via this function that switches the stack to one inside the real-mode-area,
+ * which may cover only a small first part of real memory on hash guest LPARs.
+ * fn must be NOKPROBES, must not access vmalloc or anything outside the RMA,
+ * probably shouldn't enable the MMU or interrupts, etc, and be very careful
+ * about calling other generic kernel or powerpc functions.
+ */
+int call_realmode(int (*fn)(void *arg), void *arg)
+{
+ unsigned long flags;
+ void *cursp, *emsp;
+ int ret;
+
+ /* Stack switch is only really required for HPT LPAR, but do it for all to help test coverage of tricky code */
+ cursp = (void *)(current_stack_pointer & ~(THREAD_SIZE - 1));
+ emsp = (void *)(local_paca->emergency_sp - THREAD_SIZE);
+
+ /* XXX check_stack_overflow(); */
+
+ if (WARN_ON_ONCE(cursp == emsp))
+ return -EBUSY;
+
+ local_irq_save(flags);
+ hard_irq_disable();
+
+ ret = __call_realmode(fn, arg, emsp);
+
+ local_irq_restore(flags);
+
+ return ret;
+}
+#endif
+
#ifdef CONFIG_PPC64
/* Called with hard IRQs off */
void notrace __ppc64_runlatch_on(void)
--
2.23.0