All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jack Wang <jinpu.wang@profitbricks.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	stable@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jack Wang <jinpu.wang@profitbricks.com>
Subject: [PATCH 2/3] thp: fix MADV_DONTNEED vs. numa balancing race
Date: Thu,  7 Dec 2017 11:22:51 +0100	[thread overview]
Message-ID: <1512642172-7981-3-git-send-email-jinpu.wangl@profitbricks.com> (raw)
In-Reply-To: <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>

From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>

commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream

In case prot_numa, we are under down_read(mmap_sem).  It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):

	CPU0:				CPU1:
				change_huge_pmd(prot_numa=1)
				 pmdp_huge_get_and_clear_notify()
madvise_dontneed()
 zap_pmd_range()
  pmd_trans_huge(*pmd) == 0 (without ptl)
  // skip the pmd
				 set_pmd_at();
				 // pmd is re-established

The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.

Found by code analysis, never saw triggered.

Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 mm/huge_memory.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ea013cb..0127b78 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1588,7 +1588,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (prot_numa && pmd_protnone(*pmd))
 		goto unlock;
 
-	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	/*
+	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
+	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+	 * which is also under down_read(mmap_sem):
+	 *
+	 *	CPU0:				CPU1:
+	 *				change_huge_pmd(prot_numa=1)
+	 *				 pmdp_huge_get_and_clear_notify()
+	 * madvise_dontneed()
+	 *  zap_pmd_range()
+	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
+	 *   // skip the pmd
+	 *				 set_pmd_at();
+	 *				 // pmd is re-established
+	 *
+	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+	 * which may break userspace.
+	 *
+	 * pmdp_invalidate() is required to make sure we don't miss
+	 * dirty/young flags set by hardware.
+	 */
+	entry = *pmd;
+	pmdp_invalidate(vma, addr, pmd);
+
+	/*
+	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
+	 * corrupt them.
+	 */
+	if (pmd_dirty(*pmd))
+		entry = pmd_mkdirty(entry);
+	if (pmd_young(*pmd))
+		entry = pmd_mkyoung(entry);
+
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
 		entry = pmd_mkwrite(entry);
-- 
2.7.4

  parent reply	other threads:[~2017-12-07 10:22 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1512642172-7981-1-git-send-email-jinpu.wangl@profitbricks.com>
2017-12-07 10:22 ` [PATCH 1/3] thp: reduce indentation level in change_huge_pmd() Jack Wang
2017-12-07 12:05   ` Greg Kroah-Hartman
2017-12-07 12:25     ` Jinpu Wang
2017-12-12  8:32       ` Greg Kroah-Hartman
2017-12-07 10:22 ` Jack Wang [this message]
2017-12-07 10:22 ` [PATCH 3/3] mm: drop unused pmdp_huge_get_and_clear_notify() Jack Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1512642172-7981-3-git-send-email-jinpu.wangl@profitbricks.com \
    --to=jinpu.wang@profitbricks.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.