From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Jan Kara <jack@suse.cz>, Pawel Lebioda <pawel.lebioda@intel.com>,
"Darrick J. Wong" <darrick.wong@oracle.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christoph Hellwig <hch@lst.de>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Matthew Wilcox <mawilcox@microsoft.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Dave Jiang <dave.jiang@intel.com>, Xiong Zhou <xzhou@redhat.com>,
Eryu Guan <eguan@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 29/39] mm: avoid spurious bad pmd warning messages
Date: Mon, 26 Feb 2018 21:20:50 +0100 [thread overview]
Message-ID: <20180226201644.948091150@linuxfoundation.org> (raw)
In-Reply-To: <20180226201643.660109883@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ross Zwisler <ross.zwisler@linux.intel.com>
commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.
When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax:
dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge
pages, they were all added to the end of if() statements after existing
pmd_trans_huge() checks. So, things like:
- if (pmd_trans_huge(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
When further checks were added after pmd_trans_unstable() checks by
commit 7267ec008b5c ("mm: postpone page table allocation until we have
page to map") they were also added at the end of the conditional:
+ if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
This ordering is fine for pmd_trans_huge(), but doesn't work for
pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd()
check inside of pmd_none_or_trans_huge_or_clear_bad() (called by
pmd_trans_unstable()), which prints out a warning and returns 1. So, we
do end up doing the right thing, but only after spamming dmesg with
suspicious looking messages:
mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
Reorder these checks in a helper so that pmd_devmap() is checked first,
avoiding the error messages, and add a comment explaining why the
ordering is important.
Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/memory.c | 40 ++++++++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 10 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
return ret;
}
+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+ return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
static int pte_alloc_one_map(struct fault_env *fe)
{
struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
map_pte:
/*
* If a huge pmd materialized under us just retry later. Use
- * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
- * didn't become pmd_trans_huge under us and then back to pmd_none, as
- * a result of MADV_DONTNEED running immediately after a huge pmd fault
- * in a different thread of this mm, in turn leading to a misleading
- * pmd_trans_huge() retval. All we have to ensure is that it is a
- * regular pmd that we can walk with pte_offset_map() and we can do that
- * through an atomic read in C, which is what pmd_trans_unstable()
- * provides.
+ * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+ * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+ * under us and then back to pmd_none, as a result of MADV_DONTNEED
+ * running immediately after a huge pmd fault in a different thread of
+ * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+ * All we have to ensure is that it is a regular pmd that we can walk
+ * with pte_offset_map() and we can do that through an atomic read in
+ * C, which is what pmd_trans_unstable() provides.
*/
- if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+ if (pmd_devmap_trans_unstable(fe->pmd))
return VM_FAULT_NOPAGE;
+ /*
+ * At this point we know that our vmf->pmd points to a page of ptes
+ * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+ * for the duration of the fault. If a racing MADV_DONTNEED runs and
+ * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+ * be valid and we will re-check to make sure the vmf->pte isn't
+ * pte_none() under vmf->ptl protection when we return to
+ * alloc_set_pte().
+ */
fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
&fe->ptl);
return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
fe->pte = NULL;
} else {
/* See comment in pte_alloc_one_map() */
- if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+ if (pmd_devmap_trans_unstable(fe->pmd))
return 0;
/*
* A regular pmd is established and it can't morph into a huge
next prev parent reply other threads:[~2018-02-26 20:20 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-26 20:20 [PATCH 4.9 00/39] 4.9.85-stable review Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 01/39] netfilter: drop outermost socket lock in getsockopt() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 02/39] xtensa: fix high memory/reserved memory collision Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 03/39] scsi: ibmvfc: fix misdefined reserved field in ibmvfc_fcp_rsp_info Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 04/39] cfg80211: fix cfg80211_beacon_dup Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 05/39] X.509: fix BUG_ON() when hash algorithm is unsupported Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 06/39] PKCS#7: fix certificate chain verification Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 07/39] RDMA/uverbs: Protect from command mask overflow Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 08/39] iio: buffer: check if a buffer has been set up when poll is called Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 09/39] iio: adis_lib: Initialize trigger before requesting interrupt Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 10/39] x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 11/39] irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 12/39] PCI/cxgb4: Extend T3 PCI quirk to T4+ devices Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 13/39] ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 14/39] usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 15/39] arm64: Disable unhandled signal log messages by default Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 16/39] Add delay-init quirk for Corsair K70 RGB keyboards Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 17/39] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 18/39] usb: dwc3: gadget: Set maxpacket size for ep0 IN Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 19/39] usb: ldusb: add PIDs for new CASSY devices supported by this driver Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 20/39] Revert "usb: musb: host: dont start next rx urb if current one failed" Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 21/39] usb: gadget: f_fs: Process all descriptors during bind Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 22/39] usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 25/39] drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 26/39] drm/amdgpu: add new device to use atpx quirk Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 27/39] binder: add missing binder_unlock() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 28/39] X.509: fix NULL dereference when restricting key with unsupported_sig Greg Kroah-Hartman
2018-02-26 20:20 ` Greg Kroah-Hartman [this message]
2018-02-26 20:20 ` [PATCH 4.9 30/39] fs/dax.c: fix inefficiency in dax_writeback_mapping_range() Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 31/39] libnvdimm: fix integer overflow static analysis warning Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 32/39] device-dax: implement ->split() to catch invalid munmap attempts Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 33/39] mm: introduce get_user_pages_longterm Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 34/39] v4l2: disable filesystem-dax mapping support Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 35/39] IB/core: disable memory registration of filesystem-dax vmas Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 36/39] libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment Greg Kroah-Hartman
2018-02-26 20:20 ` [PATCH 4.9 38/39] mm: fail get_vaddr_frames() for filesystem-dax mappings Greg Kroah-Hartman
2018-02-26 20:21 ` [PATCH 4.9 39/39] x86/entry/64: Clear extra registers beyond syscall arguments, to reduce speculation attack surface Greg Kroah-Hartman
2018-02-27 0:57 ` [PATCH 4.9 00/39] 4.9.85-stable review Shuah Khan
2018-02-27 7:12 ` Naresh Kamboju
2018-02-27 14:56 ` Guenter Roeck
2018-02-27 18:34 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180226201644.948091150@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=darrick.wong@oracle.com \
--cc=dave.hansen@intel.com \
--cc=dave.jiang@intel.com \
--cc=eguan@redhat.com \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mawilcox@microsoft.com \
--cc=pawel.lebioda@intel.com \
--cc=ross.zwisler@linux.intel.com \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
--cc=xzhou@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).