All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Denis Lisov <dennis.lissov@gmail.com>,
	Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH 4.14 46/70] mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Date: Mon, 12 Oct 2020 15:27:02 +0200	[thread overview]
Message-ID: <20201012132632.398663262@linuxfoundation.org> (raw)
In-Reply-To: <20201012132630.201442517@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)

An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.

Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).

Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/khugepaged.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -801,6 +801,18 @@ static struct page *khugepaged_alloc_hug
 
 static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 {
+	/*
+	 * If the hpage allocated earlier was briefly exposed in page cache
+	 * before collapse_file() failed, it is possible that racing lookups
+	 * have not yet completed, and would then be unpleasantly surprised by
+	 * finding the hpage reused for the same mapping at a different offset.
+	 * Just release the previous allocation if there is any danger of that.
+	 */
+	if (*hpage && page_count(*hpage) > 1) {
+		put_page(*hpage);
+		*hpage = NULL;
+	}
+
 	if (!*hpage)
 		*hpage = khugepaged_alloc_hugepage(wait);
 



  parent reply	other threads:[~2020-10-12 13:38 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12 13:26 [PATCH 4.14 00/70] 4.14.201-rc1 review Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 01/70] vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 02/70] vsock/virtio: stop workers during the .remove() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 03/70] vsock/virtio: add transport parameter to the virtio_transport_reset_no_sock() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 04/70] net: virtio_vsock: Enhance connection semantics Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 05/70] USB: gadget: f_ncm: Fix NDP16 datagram validation Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 06/70] gpio: tc35894: fix up tc35894 interrupt configuration Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 07/70] Input: i8042 - add nopnp quirk for Acer Aspire 5 A515 Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 08/70] drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 09/70] drivers/net/wan/hdlc_fr: Add needed_headroom for PVC devices Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 10/70] drm/sun4i: mixer: Extend regmap max_register Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 11/70] net: dec: de2104x: Increase receive ring size for Tulip Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 12/70] rndis_host: increase sleep time in the query-response loop Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 13/70] drivers/net/wan/lapbether: Make skb->protocol consistent with the header Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 14/70] drivers/net/wan/hdlc: Set skb->protocol before transmitting Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 15/70] mac80211: do not allow bigger VHT MPDUs than the hardware supports Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 16/70] spi: fsl-espi: Only process interrupts for expected events Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 17/70] nvme-fc: fail new connections to a deleted host or remote port Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 18/70] pinctrl: mvebu: Fix i2c sda definition for 98DX3236 Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 19/70] nfs: Fix security label length not being reset Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 20/70] clk: samsung: exynos4: mark chipid clock as CLK_IGNORE_UNUSED Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 21/70] iommu/exynos: add missing put_device() call in exynos_iommu_of_xlate() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 22/70] i2c: cpm: Fix i2c_ram structure Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 23/70] Input: trackpoint - enable Synaptics trackpoints Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 24/70] random32: Restore __latent_entropy attribute on net_rand_state Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 25/70] net/packet: fix overflow in tpacket_rcv Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 26/70] epoll: do not insert into poll queues until all sanity checks are done Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 27/70] epoll: replace ->visited/visited_list with generation count Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 28/70] epoll: EPOLL_CTL_ADD: close the race in decision to take fast path Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 29/70] ep_create_wakeup_source(): dentry name can change under you Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 30/70] netfilter: ctnetlink: add a range check for l3/l4 protonum Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 31/70] drm/syncobj: Fix drm_syncobj_handle_to_fd refcount leak Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 32/70] fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 33/70] Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 34/70] Revert "ravb: Fixed to be able to unload modules" Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 35/70] fbcon: Fix global-out-of-bounds read in fbcon_get_font() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 36/70] net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 37/70] usermodehelper: reset umask to default before executing user process Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 38/70] platform/x86: thinkpad_acpi: initialize tp_nvram_state variable Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 39/70] platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 40/70] driver core: Fix probe_count imbalance in really_probe() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 41/70] perf top: Fix stdio interface input handling with glibc 2.28+ Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 42/70] mtd: rawnand: sunxi: Fix the probe error path Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.14 43/70] Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 44/70] ftrace: Move RCU is watching check after recursion check Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 45/70] macsec: avoid use-after-free in macsec_handle_frame() Greg Kroah-Hartman
2020-10-12 13:27 ` Greg Kroah-Hartman [this message]
2020-10-12 13:27 ` [PATCH 4.14 47/70] cifs: Fix incomplete memory allocation on setxattr path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 48/70] i2c: meson: fix clock setting overwrite Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 49/70] sctp: fix sctp_auth_init_hmacs() error path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 50/70] team: set dev->needed_headroom in team_setup_by_port() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 51/70] net: team: fix memory leak in __team_options_register Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 52/70] openvswitch: handle DNAT tuple collision Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 53/70] drm/amdgpu: prevent double kfree ttm->sg Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 54/70] xfrm: clone XFRMA_REPLAY_ESN_VAL in xfrm_do_migrate Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 55/70] xfrm: clone XFRMA_SEC_CTX " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 56/70] xfrm: clone whole liftime_cur structure " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 57/70] net: stmmac: removed enabling eee in EEE set callback Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 58/70] platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 59/70] xfrm: Use correct address family in xfrm_state_find Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 60/70] bonding: set dev->needed_headroom in bond_setup_by_slave() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 61/70] mdio: fix mdio-thunder.c dependency & build error Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 62/70] net: usb: ax88179_178a: fix missing stop entry in driver_info Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 63/70] rxrpc: Fix rxkad token xdr encoding Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 64/70] rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 65/70] rxrpc: Fix some missing _bh annotations on locking conn->state_lock Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 66/70] rxrpc: Fix server keyring leak Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 67/70] perf: Fix task_function_call() error handling Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 68/70] mmc: core: dont set limits.discard_granularity as 0 Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 69/70] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.14 70/70] net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails Greg Kroah-Hartman
2020-10-12 18:27 ` [PATCH 4.14 00/70] 4.14.201-rc1 review Jon Hunter
2020-10-13  6:29 ` Naresh Kamboju
2020-10-13 16:39 ` Guenter Roeck
2020-10-14  1:28 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201012132632.398663262@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=cai@lca.pw \
    --cc=dennis.lissov@gmail.com \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.