stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Jan Kara <jack@suse.cz>, Pawel Lebioda <pawel.lebioda@intel.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@lst.de>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Matthew Wilcox <mawilcox@microsoft.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Dave Jiang <dave.jiang@intel.com>, Xiong Zhou <xzhou@redhat.com>,
	Eryu Guan <eguan@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 29/39] mm: avoid spurious bad pmd warning messages
Date: Mon, 26 Feb 2018 21:20:50 +0100	[thread overview]
Message-ID: <20180226201644.948091150@linuxfoundation.org> (raw)
In-Reply-To: <20180226201643.660109883@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ross Zwisler <ross.zwisler@linux.intel.com>

commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.

When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks.  So, things like:

  -       if (pmd_trans_huge(*pmd))
  +       if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))

When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:

  +       if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))

This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable().  This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1.  So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:

  mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)

Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.

Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c |   40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
 	return ret;
 }
 
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 static int pte_alloc_one_map(struct fault_env *fe)
 {
 	struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
 map_pte:
 	/*
 	 * If a huge pmd materialized under us just retry later.  Use
-	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
-	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
-	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
-	 * in a different thread of this mm, in turn leading to a misleading
-	 * pmd_trans_huge() retval.  All we have to ensure is that it is a
-	 * regular pmd that we can walk with pte_offset_map() and we can do that
-	 * through an atomic read in C, which is what pmd_trans_unstable()
-	 * provides.
+	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault in a different thread of
+	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+	 * All we have to ensure is that it is a regular pmd that we can walk
+	 * with pte_offset_map() and we can do that through an atomic read in
+	 * C, which is what pmd_trans_unstable() provides.
 	 */
-	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+	if (pmd_devmap_trans_unstable(fe->pmd))
 		return VM_FAULT_NOPAGE;
 
+	/*
+	 * At this point we know that our vmf->pmd points to a page of ptes
+	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+	 * for the duration of the fault.  If a racing MADV_DONTNEED runs and
+	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+	 * be valid and we will re-check to make sure the vmf->pte isn't
+	 * pte_none() under vmf->ptl protection when we return to
+	 * alloc_set_pte().
+	 */
 	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
 			&fe->ptl);
 	return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
 		fe->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+		if (pmd_devmap_trans_unstable(fe->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge

  parent reply	other threads:[~2018-02-26 20:20 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-26 20:20 [PATCH 4.9 00/39] 4.9.85-stable review Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 01/39] netfilter: drop outermost socket lock in getsockopt() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 02/39] xtensa: fix high memory/reserved memory collision Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 03/39] scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 04/39] cfg80211: fix cfg80211_beacon_dup Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 05/39] X.509: fix BUG_ON() when hash algorithm is unsupported Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 06/39] PKCS#7: fix certificate chain verification Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 07/39] RDMA/uverbs: Protect from command mask overflow Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 08/39] iio: buffer: check if a buffer has been set up when poll is called Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 09/39] iio: adis_lib: Initialize trigger before requesting interrupt Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 10/39] x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 11/39] irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 12/39] PCI/cxgb4: Extend T3 PCI quirk to T4+ devices Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 13/39] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 14/39] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 15/39] arm64: Disable unhandled signal log messages by default Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 16/39] Add delay-init quirk for Corsair K70 RGB keyboards Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 17/39] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 18/39] usb: dwc3: gadget: Set maxpacket size for ep0 IN Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 19/39] usb: ldusb: add PIDs for new CASSY devices supported by this driver Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 20/39] Revert "usb: musb: host: dont start next rx urb if current one failed" Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 21/39] usb: gadget: f_fs: Process all descriptors during bind Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 22/39] usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 25/39] drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 26/39] drm/amdgpu: add new device to use atpx quirk Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 27/39] binder: add missing binder_unlock() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 28/39] X.509: fix NULL dereference when restricting key with unsupported_sig Greg Kroah-Hartman
2018-02-26 20:20 ` Greg Kroah-Hartman [this message]
2018-02-26 20:20 ` [PATCH 4.9 30/39] fs/dax.c: fix inefficiency in dax_writeback_mapping_range() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 31/39] libnvdimm: fix integer overflow static analysis warning Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 32/39] device-dax: implement ->split() to catch invalid munmap attempts Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 33/39] mm: introduce get_user_pages_longterm Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 34/39] v4l2: disable filesystem-dax mapping support Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 35/39] IB/core: disable memory registration of filesystem-dax vmas Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 36/39] libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 38/39] mm: fail get_vaddr_frames() for filesystem-dax mappings Greg Kroah-Hartman
2018-02-26 20:21 ` [PATCH 4.9 39/39] x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface Greg Kroah-Hartman
2018-02-27  0:57 ` [PATCH 4.9 00/39] 4.9.85-stable review Shuah Khan
2018-02-27  7:12 ` Naresh Kamboju
2018-02-27 14:56 ` Guenter Roeck
2018-02-27 18:34   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180226201644.948091150@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=eguan@redhat.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mawilcox@microsoft.com \
    --cc=pawel.lebioda@intel.com \
    --cc=ross.zwisler@linux.intel.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xzhou@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).