All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Hildenbrand <david@redhat.com>,
	Peter Xu <peterx@redhat.com>, Yang Shi <shy828301@gmail.com>,
	John Hubbard <jhubbard@nvidia.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	Hugh Dickins <hughd@google.com>, Jason Gunthorpe <jgg@nvidia.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: [PATCH 5.10 11/54] mm: gup: fix the fast GUP race against THP collapse
Date: Thu, 13 Oct 2022 19:52:05 +0200	[thread overview]
Message-ID: <20221013175147.639795471@linuxfoundation.org> (raw)
In-Reply-To: <20221013175147.337501757@linuxfoundation.org>

From: Yang Shi <shy828301@gmail.com>

commit 70cbc3cc78a997d8247b50389d37c4e1736019da upstream.

Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases, it only handles
traditional IPI-based GUP-fast correctly.  On architectures that send an
IPI broadcast on TLB flush, it works as expected.  But on the
architectures that do not use IPI to broadcast TLB flush, it may have the
below race:

   CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                                  gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                                      pin the page
                                                      check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page

The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE.  If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.

Also update the stale comment about serializing against fast GUP in
khugepaged.

Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/gup.c        |   34 ++++++++++++++++++++++++++++------
 mm/khugepaged.c |   10 ++++++----
 2 files changed, 34 insertions(+), 10 deletions(-)

--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2128,8 +2128,28 @@ static void __maybe_unused undo_dev_page
 }
 
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-			 unsigned int flags, struct page **pages, int *nr)
+/*
+ * Fast-gup relies on pte change detection to avoid concurrent pgtable
+ * operations.
+ *
+ * To pin the page, fast-gup needs to do below in order:
+ * (1) pin the page (by prefetching pte), then (2) check pte not changed.
+ *
+ * For the rest of pgtable operations where pgtable updates can be racy
+ * with fast-gup, we need to do (1) clear pte, then (2) check whether page
+ * is pinned.
+ *
+ * Above will work for all pte-level operations, including THP split.
+ *
+ * For THP collapse, it's a bit more complicated because fast-gup may be
+ * walking a pgtable page that is being freed (pte is still valid but pmd
+ * can be cleared already).  To avoid race in such condition, we need to
+ * also check pmd here to make sure pmd doesn't change (corresponds to
+ * pmdp_collapse_flush() in the THP collapse code path).
+ */
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+			 unsigned long end, unsigned int flags,
+			 struct page **pages, int *nr)
 {
 	struct dev_pagemap *pgmap = NULL;
 	int nr_start = *nr, ret = 0;
@@ -2169,7 +2189,8 @@ static int gup_pte_range(pmd_t pmd, unsi
 		if (!head)
 			goto pte_unmap;
 
-		if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+		if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
+		    unlikely(pte_val(pte) != pte_val(*ptep))) {
 			put_compound_head(head, 1, flags);
 			goto pte_unmap;
 		}
@@ -2214,8 +2235,9 @@ pte_unmap:
  * get_user_pages_fast_only implementation that can pin pages. Thus it's still
  * useful to have gup_huge_pmd even if we can't operate on ptes.
  */
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
-			 unsigned int flags, struct page **pages, int *nr)
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+			 unsigned long end, unsigned int flags,
+			 struct page **pages, int *nr)
 {
 	return 0;
 }
@@ -2522,7 +2544,7 @@ static int gup_pmd_range(pud_t *pudp, pu
 			if (!gup_huge_pd(__hugepd(pmd_val(pmd)), addr,
 					 PMD_SHIFT, next, flags, pages, nr))
 				return 0;
-		} else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
+		} else if (!gup_pte_range(pmd, pmdp, addr, next, flags, pages, nr))
 			return 0;
 	} while (pmdp++, addr = next, addr != end);
 
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1144,10 +1144,12 @@ static void collapse_huge_page(struct mm
 
 	pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
 	/*
-	 * After this gup_fast can't run anymore. This also removes
-	 * any huge TLB entry from the CPU so we won't allow
-	 * huge and small TLB entries for the same virtual address
-	 * to avoid the risk of CPU bugs in that area.
+	 * This removes any huge TLB entry from the CPU so we won't allow
+	 * huge and small TLB entries for the same virtual address to
+	 * avoid the risk of CPU bugs in that area.
+	 *
+	 * Parallel fast GUP is fine since fast GUP will back off when
+	 * it detects PMD is changed.
 	 */
 	_pmd = pmdp_collapse_flush(vma, address, pmd);
 	spin_unlock(pmd_ptl);



  parent reply	other threads:[~2022-10-13 17:57 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-13 17:51 [PATCH 5.10 00/54] 5.10.148-rc1 review Greg Kroah-Hartman
2022-10-13 17:51 ` [PATCH 5.10 01/54] nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level() Greg Kroah-Hartman
2022-10-13 17:51 ` [PATCH 5.10 02/54] nilfs2: fix use-after-free bug of struct nilfs_root Greg Kroah-Hartman
2022-10-13 17:51 ` [PATCH 5.10 03/54] nilfs2: fix leak of nilfs_root in case of writer thread creation failure Greg Kroah-Hartman
2022-10-13 17:51 ` [PATCH 5.10 04/54] nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure Greg Kroah-Hartman
2022-10-13 17:51 ` [PATCH 5.10 05/54] ceph: dont truncate file in atomic_open Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 06/54] Makefile.extrawarn: Move -Wcast-function-type-strict to W=1 Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 07/54] docs: update mediator information in CoC docs Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 08/54] perf tools: Fixup get_current_dir_name() compilation Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 09/54] xsk: Inherit need_wakeup flag for shared sockets Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 10/54] ALSA: pcm: oss: Fix race at SNDCTL_DSP_SYNC Greg Kroah-Hartman
2022-10-13 17:52 ` Greg Kroah-Hartman [this message]
2022-10-13 17:52 ` [PATCH 5.10 12/54] powerpc/64s/radix: dont need to broadcast IPI for radix pmd collapse flush Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 13/54] fs: fix UAF/GPF bug in nilfs_mdt_destroy Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 14/54] compiler_attributes.h: move __compiletime_{error|warning} Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 15/54] firmware: arm_scmi: Add SCMI PM driver remove routine Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 16/54] dmaengine: xilinx_dma: Fix devm_platform_ioremap_resource error handling Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 17/54] dmaengine: xilinx_dma: cleanup for fetching xlnx,num-fstores property Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 18/54] dmaengine: xilinx_dma: Report error in case of dma_set_mask_and_coherent API failure Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 19/54] ARM: dts: fix Moxa SDIO compatible, remove sdhci misnomer Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 20/54] scsi: qedf: Fix a UAF bug in __qedf_probe() Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 21/54] net/ieee802154: fix uninit value bug in dgram_sendmsg Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 22/54] ALSA: hda/hdmi: Fix the converter reuse for the silent stream Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 23/54] um: Cleanup syscall_handler_t cast in syscalls_32.h Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 24/54] um: Cleanup compiler warning in arch/x86/um/tls_32.c Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 25/54] arch: um: Mark the stack non-executable to fix a binutils warning Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 26/54] net: atlantic: fix potential memory leak in aq_ndev_close() Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 27/54] drm/amd/display: update gamut remap if plane has changed Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 28/54] drm/amd/display: skip audio setup when audio stream is enabled Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 29/54] mmc: core: Replace with already defined values for readability Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 30/54] mmc: core: Terminate infinite loop in SD-UHS voltage switch Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 31/54] usb: mon: make mmapped memory read only Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 32/54] USB: serial: ftdi_sio: fix 300 bps rate for SIO Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 33/54] rpmsg: qcom: glink: replace strncpy() with strscpy_pad() Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 34/54] Revert "clk: ti: Stop using legacy clkctrl names for omap4 and 5" Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 35/54] random: restore O_NONBLOCK support Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 36/54] random: clamp credited irq bits to maximum mixed Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 37/54] ALSA: hda: Fix position reporting on Poulsbo Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 38/54] efi: Correct Macmini DMI match in uefi cert quirk Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 39/54] scsi: stex: Properly zero out the passthrough command structure Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 40/54] USB: serial: qcserial: add new usb-id for Dell branded EM7455 Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 41/54] random: avoid reading two cache lines on irq randomness Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 42/54] random: use expired timer rather than wq for mixing fast pool Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 43/54] wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans() Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 44/54] wifi: cfg80211/mac80211: reject bad MBSSID elements Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 45/54] wifi: cfg80211: ensure length byte is present before access Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 46/54] wifi: cfg80211: fix BSS refcounting bugs Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 47/54] wifi: cfg80211: avoid nontransmitted BSS list corruption Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 48/54] wifi: mac80211_hwsim: avoid mac80211 warning on bad rate Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 49/54] wifi: mac80211: fix crash in beacon protection for P2P-device Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 50/54] wifi: cfg80211: update hidden BSSes to avoid WARN_ON Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 51/54] Input: xpad - add supported devices as contributed on github Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 52/54] Input: xpad - fix wireless 360 controller breaking after suspend Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 53/54] misc: pci_endpoint_test: Aggregate params checking for xfer Greg Kroah-Hartman
2022-10-13 17:52 ` [PATCH 5.10 54/54] misc: pci_endpoint_test: Fix pci_endpoint_test_{copy,write,read}() panic Greg Kroah-Hartman
2022-10-13 20:41 ` [PATCH 5.10 00/54] 5.10.148-rc1 review Pavel Machek
2022-10-13 20:41 ` Florian Fainelli
2022-10-14 11:10 ` Naresh Kamboju
2022-10-14 11:49 ` Sudip Mukherjee (Codethink)
2022-10-14 15:54 ` Jon Hunter
2022-10-14 16:37 ` Shuah Khan
2022-10-14 17:29 ` Slade Watkins
2022-10-14 23:07 ` Guenter Roeck
2022-10-15  3:30 ` Rudi Heitbaum
2022-10-17  1:33 ` zhouzhixiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221013175147.639795471@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=shy828301@gmail.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.