linux-arch.vger.kernel.org archive mirror
From: Leonardo Bras <leonardo@linux.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Arnd Bergmann <arnd@arndb.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@c-s.fr>,
	Steven Price <steven.price@arm.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Leonardo Bras <leonardo@linux.ibm.com>,
	Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Reza Arbab <arbab@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Allison Randal <allison@lohutok.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mike Rapoport <rppt@linux.ibm.com>,
	Michal Suchanek <msuchanek@suse.de>
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	kvm-ppc@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-mm@kvack.org
Subject: [PATCH v6 01/11] asm-generic/pgtable: Adds generic functions to track lockless pgtable walks
Date: Thu,  6 Feb 2020 00:08:50 -0300
Message-ID: <20200206030900.147032-2-leonardo@linux.ibm.com>
Message-ID: <20200206030850.yoLPZckX7_P-afzoJ_OILCkzpUG7sHR1Y66ebVLdGo4@z>
In-Reply-To: <20200206030900.147032-1-leonardo@linux.ibm.com>

Lockless pagetable walks need to be tracked, so that THP
splitting/collapsing does not happen while one is in progress.

The default solution is to disable irqs before a lockless pagetable walk
and re-enable them once it has finished.

In code, this shows up as local_irq_disable() and local_irq_enable()
around some pieces of code, usually without a comment on why they are
needed.

This patch proposes a set of generic functions to be called before starting
and after finishing a lockless pagetable walk. They are meant to make it
clear that a lockless pagetable walk happens there, and to carry the
information on why the irq disable/enable is needed.

begin_lockless_pgtbl_walk()
        Insert before starting any lockless pgtable walk
end_lockless_pgtbl_walk()
        Insert after the end of any lockless pgtable walk
        (Mostly after the ptep is last used)
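
The calling pattern described above can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
mock_* helpers, the one-level mock_table and lookup_entry() are
hypothetical stand-ins that do not exist in the kernel, and a C11 fence
takes the place of smp_mb(). Only the begin/end function names and their
signatures come from this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock interrupt state: stands in for the real per-CPU irq flag. */
static bool mock_irqs_enabled = true;

/* Mock of local_irq_save(): remember the old state, then disable. */
static unsigned long mock_irq_save(void)
{
	unsigned long flags = mock_irqs_enabled ? 1UL : 0UL;

	mock_irqs_enabled = false;
	return flags;
}

/* Mock of local_irq_restore(): put back whatever state was saved. */
static void mock_irq_restore(unsigned long flags)
{
	mock_irqs_enabled = (flags != 0);
}

static unsigned long begin_lockless_pgtbl_walk(void)
{
	unsigned long irq_mask = mock_irq_save();

	/* Stand-in for smp_mb(): full fence after disabling irqs. */
	atomic_thread_fence(memory_order_seq_cst);
	return irq_mask;
}

static void end_lockless_pgtbl_walk(unsigned long irq_mask)
{
	/* Stand-in for smp_mb(): full fence before restoring irqs. */
	atomic_thread_fence(memory_order_seq_cst);
	mock_irq_restore(irq_mask);
}

/* Hypothetical one-level "page table" used only by this sketch. */
static unsigned long mock_table[4] = { 0x1000, 0x2000, 0, 0x4000 };

/* Hypothetical lockless lookup; must run inside a begin/end pair. */
static unsigned long *mock_find_pte(size_t idx)
{
	assert(!mock_irqs_enabled);
	return idx < 4 ? &mock_table[idx] : NULL;
}

/* Caller side: begin before the walk, end after the last use of ptep. */
static unsigned long lookup_entry(size_t idx)
{
	unsigned long irq_mask, val = 0;
	unsigned long *ptep;

	irq_mask = begin_lockless_pgtbl_walk();
	ptep = mock_find_pte(idx);
	if (ptep)
		val = *ptep;	/* last use of ptep */
	end_lockless_pgtbl_walk(irq_mask);

	return val;
}
```

Because begin_lockless_pgtbl_walk() returns the saved state and
end_lockless_pgtbl_walk() restores it, a caller that already runs with
interrupts disabled keeps them disabled after the walk.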

A memory barrier is also added, to make sure there is no speculative read
outside the interrupt-disabled area. Other than that, there should be no
change in behavior from the current code.

Arch-specific versions are meant to be possible (the generic ones are
guarded by __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL), so that additional
steps can be added while keeping the code clean.
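
As a rough sketch of how such an override could be wired up, the
user-space mock below defines the guard macro and its own begin/end
pair, so the generic fallback is compiled out. The lockless_walkers
counter and walkers_in_flight() are hypothetical and only illustrate
"an additional step". The interrupt handling is left out of this mock,
and this is not meant as the actual arch implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* "Arch header" (hypothetical): announce that arch versions exist. */
#define __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL

/* Hypothetical extra step: count the walks currently in flight. */
static atomic_int lockless_walkers;

static inline unsigned long begin_lockless_pgtbl_walk(void)
{
	atomic_fetch_add(&lockless_walkers, 1);
	/* Stand-in for smp_mb(); irq save is omitted in this mock. */
	atomic_thread_fence(memory_order_seq_cst);
	return 0;
}

static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)
{
	(void)irq_mask;	/* nothing to restore in this mock */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_sub(&lockless_walkers, 1);
}

/* "Generic header": fallback, skipped because the guard is defined. */
#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
static inline unsigned long begin_lockless_pgtbl_walk(void)
{
	return 0;
}

static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)
{
	(void)irq_mask;
}
#endif

/* Helper for observing the hypothetical counter. */
static int walkers_in_flight(void)
{
	return atomic_load(&lockless_walkers);
}
```

Callers stay unchanged in either case, since both variants expose the
same two function names and signatures.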

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 include/asm-generic/pgtable.h | 51 +++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index e2e2bef07dd2..8d368d3c0974 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1222,6 +1222,57 @@ static inline bool arch_has_pfn_modify_check(void)
 #endif
 #endif
 
+#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
+/*
+ * begin_lockless_pgtbl_walk: Must be inserted before a function call that does
+ *   lockless pagetable walks, such as __find_linux_pte()
+ */
+static inline
+unsigned long begin_lockless_pgtbl_walk(void)
+{
+	unsigned long irq_mask;
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * Deleting page tables or splitting huge pages involves flushing
+	 * TLBs, which issues interrupts that block while they are disabled.
+	 */
+	local_irq_save(irq_mask);
+
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	return irq_mask;
+}
+
+/*
+ * end_lockless_pgtbl_walk: Must be inserted after the last use of a pointer
+ *   returned by a lockless pagetable walk, such as __find_linux_pte()
+ */
+static inline void end_lockless_pgtbl_walk(unsigned long irq_mask)
+{
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * Deleting page tables or splitting huge pages involves flushing
+	 * TLBs, which issues interrupts that block while they are disabled.
+	 */
+	local_irq_restore(irq_mask);
+}
+#endif
+
 /*
  * On some architectures it depends on the mm if the p4d/pud or pmd
  * layer of the page table hierarchy is folded or not.
-- 
2.24.1

