All of lore.kernel.org
 help / color / mirror / Atom feed
From: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH] powerpc/64s/hash: Fix hash_preload running with interrupts enabled
Date: Mon, 27 Jul 2020 15:02:20 +0530	[thread overview]
Message-ID: <4925309C-A338-4C0F-90E3-4522643021CB@linux.vnet.ibm.com> (raw)
In-Reply-To: <20200727060947.10060-1-npiggin@gmail.com>



> On 27-Jul-2020, at 11:39 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Commit 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the
> caller") removed the local_irq_disable from hash_preload, but it was
> required for more than just the page table walk: the hash pte busy bit is
> effectively a lock which may be taken in interrupt context, and the local
> update flag test must not be preempted before it's used.
> 
> This solves apparent lockups with perf interrupting __hash_page_64K. If
> get_perf_callchain then also takes a hash fault on the same page while it
> is already locked, it will loop forever taking hash faults, which looks like
> this:
> 
> cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70]
>    pc: c000000000072dc8: hash_page_mm+0x8/0x800
>    lr: c00000000000c5a4: do_hash_page+0x24/0x38
>    sp: c0002ac1cc69ac70
>   msr: 8000000000081033
>  current = 0xc0002ac1cc602e00
>  paca    = 0xc00000001de1f280   irqmask: 0x03   irq_happened: 0x01
>    pid   = 20118, comm = pread2_processe
> Linux version 5.8.0-rc6-00345-g1fad14f18bc6
> 49e:mon> t
> [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable)
> --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac
> [link register   ] c000000000335d10 copy_from_user_nofault+0xf0/0x150
> [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable)
> [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0
> [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410
> [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40
> [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360
> [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0
> [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790
> [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0
> [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0
> [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750
> [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510
> [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60
> [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0
> --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160
> [link register   ] c0000000000846f0 __hash_page_64K+0x210/0x540
> [c0002ac1cc69ba50] 0000000000000000 (unreliable)
> [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0
> [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0
> [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60
> [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60
> [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90
> [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c
> --- Exception: 300 (Data Access) at 00007fff8c861fe8
> SP (7ffff6b19660) is in userspace
> 
> Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> Reported-by: Anton Blanchard <anton@ozlabs.org>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Fixes: 2f92447f9f96 ("powerpc/book3s64/hash: Use the pte_t address from the
> caller")
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>


Hi,

Tested with the patch and it fixes the lockups I was seeing with my test run.
Thanks for the fix.

Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>

> ---
> arch/powerpc/kernel/exceptions-64s.S  | 14 +++++++++++---
> arch/powerpc/mm/book3s64/hash_utils.c | 25 +++++++++++++++++++++++++
> arch/powerpc/perf/core-book3s.c       |  6 ++++++
> 3 files changed, 42 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 0fc8bad878b2..446e54c3f71e 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -3072,10 +3072,18 @@ do_hash_page:
> 	ori	r0,r0,DSISR_BAD_FAULT_64S@l
> 	and.	r0,r5,r0		/* weird error? */
> 	bne-	handle_page_fault	/* if not, try to insert a HPTE */
> +
> +	/*
> +	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> +	 * don't call hash_page, just fail the fault. This is required to
> +	 * prevent re-entrancy problems in the hash code, namely perf
> +	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> +	 * hash fault. See the comment in hash_preload().
> +	 */
> 	ld	r11, PACA_THREAD_INFO(r13)
> -	lwz	r0,TI_PREEMPT(r11)	/* If we're in an "NMI" */
> -	andis.	r0,r0,NMI_MASK@h	/* (i.e. an irq when soft-disabled) */
> -	bne	77f			/* then don't call hash_page now */
> +	lwz	r0,TI_PREEMPT(r11)
> +	andis.	r0,r0,NMI_MASK@h
> +	bne	77f
> 
> 	/*
> 	 * r3 contains the trap number
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 468169e33c86..9b9f92ad0e7a 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1559,6 +1559,7 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea,
> 	pgd_t *pgdir;
> 	int rc, ssize, update_flags = 0;
> 	unsigned long access = _PAGE_PRESENT | _PAGE_READ | (is_exec ? _PAGE_EXEC : 0);
> +	unsigned long flags;
> 
> 	BUG_ON(get_region_id(ea) != USER_REGION_ID);
> 
> @@ -1592,6 +1593,28 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea,
> 		return;
> #endif /* CONFIG_PPC_64K_PAGES */
> 
> +	/*
> +	 * __hash_page_* must run with interrupts off, as it sets the
> +	 * H_PAGE_BUSY bit. It's possible for perf interrupts to hit at any
> +	 * time and may take a hash fault reading the user stack, see
> +	 * read_user_stack_slow() in the powerpc/perf code.
> +	 *
> +	 * If that takes a hash fault on the same page as we lock here, it
> +	 * will bail out when seeing H_PAGE_BUSY set, and retry the access
> +	 * leading to an infinite loop.
> +	 *
> +	 * Disabling interrupts here does not prevent perf interrupts, but it
> +	 * will prevent them taking hash faults (see the NMI test in
> +	 * do_hash_page), then read_user_stack's copy_from_user_nofault will
> +	 * fail and perf will fall back to read_user_stack_slow(), which
> +	 * walks the Linux page tables.
> +	 *
> +	 * Interrupts must also be off for the duration of the
> +	 * mm_is_thread_local test and update, to prevent preempt running the
> +	 * mm on another CPU (XXX: this may be racy vs kthread_use_mm).
> +	 */
> +	local_irq_save(flags);
> +
> 	/* Is that local to this CPU ? */
> 	if (mm_is_thread_local(mm))
> 		update_flags |= HPTE_LOCAL_UPDATE;
> @@ -1614,6 +1637,8 @@ static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea,
> 				   mm_ctx_user_psize(&mm->context),
> 				   mm_ctx_user_psize(&mm->context),
> 				   pte_val(*ptep));
> +
> +	local_irq_restore(flags);
> }
> 
> /*
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index cd6a742ac6ef..01d70280d287 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2179,6 +2179,12 @@ static void __perf_event_interrupt(struct pt_regs *regs)
> 
> 	perf_read_regs(regs);
> 
> +	/*
> +	 * If perf interrupts hit in a local_irq_disable (soft-masked) region,
> +	 * we consider them as NMIs. This is required to prevent hash faults on
> +	 * user addresses when reading callchains. See the NMI test in
> +	 * do_hash_page.
> +	 */
> 	nmi = perf_intr_is_nmi(regs);
> 	if (nmi)
> 		nmi_enter();
> -- 
> 2.23.0
> 


  reply	other threads:[~2020-07-27  9:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-27  6:09 [PATCH] powerpc/64s/hash: Fix hash_preload running with interrupts enabled Nicholas Piggin
2020-07-27  9:32 ` Athira Rajeev [this message]
2020-07-27 12:35   ` Michael Ellerman
2020-07-27 17:21     ` Athira Rajeev
2020-07-28  0:44       ` Michael Ellerman
2020-07-29  4:18         ` Athira Rajeev
2020-08-02 13:24 ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4925309C-A338-4C0F-90E3-4522643021CB@linux.vnet.ibm.com \
    --to=atrajeev@linux.vnet.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=npiggin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.