linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Khalid Aziz <khalid.aziz@oracle.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"David S. Miller" <davem@davemloft.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Henry Willard <henry.willard@oracle.com>,
	Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Jerome Glisse <jglisse@redhat.com>,
	Zi Yan <zi.yan@cs.rutgers.edu>,
	linux-mm@kvack.org
Subject: [PATCH v2] mm: mprotect: check page dirty when change ptes
Date: Wed, 12 Sep 2018 14:49:21 +0800	[thread overview]
Message-ID: <20180912064921.31015-1-peterx@redhat.com> (raw)

Add an extra check on page dirty bit in change_pte_range() since there
might be case where PTE dirty bit is unset but it's actually dirtied.
One example is when a huge PMD is splitted after written: the dirty bit
will be set on the compound page however we won't have the dirty bit set
on each of the small page PTEs.

I noticed this when debugging with a customized kernel that implemented
userfaultfd write-protect.  In that case, the dirty bit will be critical
since that's required for userspace to handle the write protect page
fault (otherwise it'll get a SIGBUS with a loop of page faults).
However it should still be good even for upstream Linux to cover more
scenarios where we shouldn't need to do extra page faults on the small
pages if the previous huge page is already written, so the dirty bit
optimization path underneath can cover more.

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Mel Gorman <mgorman@techsingularity.net>
CC: Khalid Aziz <khalid.aziz@oracle.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: Andi Kleen <ak@linux.intel.com>
CC: Henry Willard <henry.willard@oracle.com>
CC: Anshuman Khandual <khandual@linux.vnet.ibm.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: Jerome Glisse <jglisse@redhat.com>
CC: Zi Yan <zi.yan@cs.rutgers.edu>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v2:
- checking the dirty bit when changing PTE entries rather than fixing up
  the dirty bit when splitting the huge page PMD.
- rebase to 4.19-rc3

Instead of keeping this in my local tree, I'm giving it another shot to
see whether this could be acceptable for upstream since IMHO it should
still benefit the upstream.  Thanks,
---
 mm/mprotect.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6d331620b9e5..5fe752515161 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -115,6 +115,17 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			if (preserve_write)
 				ptent = pte_mk_savedwrite(ptent);
 
+                       /*
+                        * The extra PageDirty() check will make sure
+                        * we'll capture the dirty page even if the PTE
+                        * dirty bit is unset.  One case is when the
+                        * PTE is splitted from a huge PMD, in that
+                        * case the dirty flag might only be set on the
+                        * compound page instead of this PTE.
+                        */
+			if (PageDirty(pte_page(ptent)))
+				ptent = pte_mkdirty(ptent);
+
 			/* Avoid taking write faults for known dirty pages */
 			if (dirty_accountable && pte_dirty(ptent) &&
 					(pte_soft_dirty(ptent) ||
-- 
2.17.1


             reply	other threads:[~2018-09-12  6:49 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-12  6:49 Peter Xu [this message]
2018-09-12 10:33 ` [PATCH v2] mm: mprotect: check page dirty when change ptes Kirill A. Shutemov
2018-09-12 13:03 ` Jerome Glisse
2018-09-12 13:24   ` Jerome Glisse
2018-09-13  7:37     ` Peter Xu
2018-09-13 14:23       ` Jerome Glisse
2018-09-14  0:42         ` Jerome Glisse
2018-09-14  7:16           ` Peter Xu
2018-09-15  0:41             ` Jerome Glisse
2018-09-27  7:43               ` Peter Xu
2018-09-27  7:56                 ` Jerome Glisse
2018-09-27  8:21                   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912064921.31015-1-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=henry.willard@oracle.com \
    --cc=jglisse@redhat.com \
    --cc=khalid.aziz@oracle.com \
    --cc=khandual@linux.vnet.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=tglx@linutronix.de \
    --cc=zi.yan@cs.rutgers.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).