All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Charles (Chas) Williams" <ciwillia@brocade.com>
To: <stable@vger.kernel.org>
Cc: Hugh Dickins <hughd@google.com>, Christoph Lameter <cl@linux.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>, Vlastimil Babka <vbabka@suse.cz>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Oleg Nesterov <oleg@redhat.com>,
	Sasha Levin <sasha.levin@oracle.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Charles (Chas) Williams" <ciwillia@brocade.com>
Subject: [PATCH 3.14.y 6/9] mm: migrate dirty page without clear_page_dirty_for_io etc
Date: Mon, 18 Jul 2016 16:58:59 -0400	[thread overview]
Message-ID: <1468875542-10978-6-git-send-email-ciwillia@brocade.com> (raw)
In-Reply-To: <1468875203-10816-1-git-send-email-ciwillia@brocade.com>

From: Hugh Dickins <hughd@google.com>

commit 42cb14b110a5698ccf26ce59c4441722605a3743 upstream.

clear_page_dirty_for_io() has accumulated writeback and memcg subtleties
since v2.6.16 first introduced page migration; and the set_page_dirty()
which completed its migration of PageDirty, later had to be moderated to
__set_page_dirty_nobuffers(); then PageSwapBacked had to skip that too.

No actual problems seen with this procedure recently, but if you look into
what the clear_page_dirty_for_io(page)+set_page_dirty(newpage) is actually
achieving, it turns out to be nothing more than moving the PageDirty flag,
and its NR_FILE_DIRTY stat from one zone to another.

It would be good to avoid a pile of irrelevant decrementations and
incrementations, and improper event counting, and unnecessary descent of
the radix_tree under tree_lock (to set the PAGECACHE_TAG_DIRTY which
radix_tree_replace_slot() left in place anyway).

Do the NR_FILE_DIRTY movement, like the other stats movements, while
interrupts still disabled in migrate_page_move_mapping(); and don't even
bother if the zone is the same.  Do the PageDirty movement there under
tree_lock too, where old page is frozen and newpage not yet visible:
bearing in mind that as soon as newpage becomes visible in radix_tree, an
un-page-locked set_page_dirty() might interfere (or perhaps that's just
not possible: anything doing so should already hold an additional
reference to the old page, preventing its migration; but play safe).

But we do still need to transfer PageDirty in migrate_page_copy(), for
those who don't go the mapping route through migrate_page_move_mapping().

CVE-2016-3070

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[ciwillia@brocade.com: backported to 3.14: adjusted context]
Signed-off-by: Charles (Chas) Williams <ciwillia@brocade.com>
---
 mm/migrate.c | 51 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 20 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 3acac4a..98c5464 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -30,6 +30,7 @@
 #include <linux/mempolicy.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/backing-dev.h>
 #include <linux/memcontrol.h>
 #include <linux/syscalls.h>
 #include <linux/hugetlb.h>
@@ -344,6 +345,8 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		struct buffer_head *head, enum migrate_mode mode,
 		int extra_count)
 {
+	struct zone *oldzone, *newzone;
+	int dirty;
 	int expected_count = 1 + extra_count;
 	void **pslot;
 
@@ -354,6 +357,9 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		return MIGRATEPAGE_SUCCESS;
 	}
 
+	oldzone = page_zone(page);
+	newzone = page_zone(newpage);
+
 	spin_lock_irq(&mapping->tree_lock);
 
 	pslot = radix_tree_lookup_slot(&mapping->page_tree,
@@ -394,6 +400,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		set_page_private(newpage, page_private(page));
 	}
 
+	/* Move dirty while page refs frozen and newpage not yet exposed */
+	dirty = PageDirty(page);
+	if (dirty) {
+		ClearPageDirty(page);
+		SetPageDirty(newpage);
+	}
+
 	radix_tree_replace_slot(pslot, newpage);
 
 	/*
@@ -403,6 +416,9 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 */
 	page_unfreeze_refs(page, expected_count - 1);
 
+	spin_unlock(&mapping->tree_lock);
+	/* Leave irq disabled to prevent preemption while updating stats */
+
 	/*
 	 * If moved to a different zone then also account
 	 * the page for that zone. Other VM counters will be
@@ -413,13 +429,19 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	 * via NR_FILE_PAGES and NR_ANON_PAGES if they
 	 * are mapped to swap space.
 	 */
-	__dec_zone_page_state(page, NR_FILE_PAGES);
-	__inc_zone_page_state(newpage, NR_FILE_PAGES);
-	if (!PageSwapCache(page) && PageSwapBacked(page)) {
-		__dec_zone_page_state(page, NR_SHMEM);
-		__inc_zone_page_state(newpage, NR_SHMEM);
+	if (newzone != oldzone) {
+		__dec_zone_state(oldzone, NR_FILE_PAGES);
+		__inc_zone_state(newzone, NR_FILE_PAGES);
+		if (PageSwapBacked(page) && !PageSwapCache(page)) {
+			__dec_zone_state(oldzone, NR_SHMEM);
+			__inc_zone_state(newzone, NR_SHMEM);
+		}
+		if (dirty && mapping_cap_account_dirty(mapping)) {
+			__dec_zone_state(oldzone, NR_FILE_DIRTY);
+			__inc_zone_state(newzone, NR_FILE_DIRTY);
+		}
 	}
-	spin_unlock_irq(&mapping->tree_lock);
+	local_irq_enable();
 
 	return MIGRATEPAGE_SUCCESS;
 }
@@ -543,20 +565,9 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
 
-	if (PageDirty(page)) {
-		clear_page_dirty_for_io(page);
-		/*
-		 * Want to mark the page and the radix tree as dirty, and
-		 * redo the accounting that clear_page_dirty_for_io undid,
-		 * but we can't use set_page_dirty because that function
-		 * is actually a signal that all of the page has become dirty.
-		 * Whereas only part of our page may be dirty.
-		 */
-		if (PageSwapBacked(page))
-			SetPageDirty(newpage);
-		else
-			__set_page_dirty_nobuffers(newpage);
- 	}
+	/* Move dirty on pages not done by migrate_page_move_mapping() */
+	if (PageDirty(page))
+		SetPageDirty(newpage);
 
 	/*
 	 * Copy NUMA information to the new page, to prevent over-eager
-- 
2.5.5


  parent reply	other threads:[~2016-07-18 21:00 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-18 20:53 [PATCH 3.14.y 1/9] udp: properly support MSG_PEEK with truncated buffers Charles (Chas) Williams
2016-07-18 20:53 ` [PATCH 3.14.y 2/9] USB: fix invalid memory access in hub_activate() Charles (Chas) Williams
2016-07-18 20:53 ` [PATCH 3.14.y 3/9] cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind Charles (Chas) Williams
2016-08-14 14:43   ` Greg KH
2016-08-14 14:52     ` Bjørn Mork
2016-08-14 15:05       ` Greg KH
2016-07-18 20:53 ` [PATCH 3.14.y 4/9] KEYS: potential uninitialized variable Charles (Chas) Williams
2016-07-18 20:53 ` [PATCH 3.14.y 5/9] USB: usbfs: fix potential infoleak in devio Charles (Chas) Williams
2016-08-14 14:44   ` Greg Kroah-Hartman
2016-08-15 14:41     ` Charles (Chas) Williams
2016-07-18 20:58 ` Charles (Chas) Williams [this message]
2016-07-18 20:59 ` [PATCH 3.14.y 7/9] printk: do cond_resched() between lines while outputting to consoles Charles (Chas) Williams
2016-07-18 20:59 ` [PATCH 3.14.y 8/9] HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands Charles (Chas) Williams
2016-07-18 20:59 ` [PATCH 3.14.y 9/9] x86/mm: Add barriers and document switch_mm()-vs-flush synchronization Charles (Chas) Williams
2016-07-18 20:59   ` Charles (Chas) Williams
2016-08-14 14:42 ` [PATCH 3.14.y 1/9] udp: properly support MSG_PEEK with truncated buffers Greg KH
2016-08-15  7:21   ` Michal Kubecek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1468875542-10978-6-git-send-email-ciwillia@brocade.com \
    --to=ciwillia@brocade.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=dave@stgolabs.net \
    --cc=dvyukov@google.com \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=oleg@redhat.com \
    --cc=riel@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.