All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
@ 2019-01-29 23:32 David Hildenbrand
  2019-01-31  2:04 ` Sasha Levin
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2019-01-29 23:32 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Morton, Mel Gorman,
	Kirill A. Shutemov, Michal Hocko, Naoya Horiguchi, Jan Kara,
	Andrea Arcangeli, Dominik Brodowski, Matthew Wilcox,
	Vratislav Bendel, Rafael Aquini, Konstantin Khlebnikov,
	Minchan Kim, stable

We had a race in the old balloon compaction code before commit b1123ea6d3b3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting commit 195a8c43e93d
("virtio-balloon: deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use
general non-lru movable page feature"). commit d6d86c0a7f8d
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].

There was a subtle race between dropping the page lock of the newpage
in __unmap_and_move() and checking for
__is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With commit 195a8c43e93d ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still hold
even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again. So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: stable@vger.kernel.org # 3.12 - 4.7
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/migrate.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 4512afab46ac..402198816d1a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1135,10 +1135,13 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 	 * If migration is successful, decrease refcount of the newpage
 	 * which will not free the page because new page owner increased
 	 * refcounter. As well, if it is LRU page, add the page to LRU
-	 * list in here.
+	 * list in here. Use the old state of the isolated source page to
+	 * determine if we migrated a LRU page. newpage was already unlocked
+	 * and possibly modified by its owner - don't rely on the page
+	 * state.
 	 */
 	if (rc == MIGRATEPAGE_SUCCESS) {
-		if (unlikely(__PageMovable(newpage)))
+		if (unlikely(!is_lru))
 			put_page(newpage);
 		else
 			putback_lru_page(newpage);
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  2019-01-29 23:32 [PATCH v2] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it David Hildenbrand
@ 2019-01-31  2:04 ` Sasha Levin
  2019-02-01 13:43   ` [PATCH v2 for-4.4-stable] " David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Sasha Levin @ 2019-01-31  2:04 UTC (permalink / raw)
  To: Sasha Levin, David Hildenbrand, linux-mm
  Cc: linux-kernel, David Hildenbrand, ,
	Andrew Morton, Mel Gorman, Kirill A. Shutemov, Michal Hocko,
	Naoya Horiguchi, Jan Kara, Andrea Arcangeli, Dominik Brodowski,
	Matthew Wilcox, Vratislav Bendel, Rafael Aquini,
	Konstantin Khlebnikov, Minchan Kim, stable

Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: d6d86c0a7f8d mm/balloon_compaction: redesign ballooned pages management.

The bot has tested the following trees: v4.20.5, v4.19.18, v4.14.96, v4.9.153, v4.4.172, v3.18.133.

v4.20.5: Build OK!
v4.19.18: Build OK!
v4.14.96: Build OK!
v4.9.153: Build OK!
v4.4.172: Failed to apply! Possible dependencies:
    1031bc589228 ("lib/vsprintf: add %*pg format specifier")
    14e0a214d62d ("tools, perf: make gfp_compact_table up to date")
    1f7866b4aebd ("mm, tracing: make show_gfp_flags() up to date")
    420adbe9fc1a ("mm, tracing: unify mm flags handling in tracepoints and printk")
    53f9263baba6 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
    7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
    7d2eba0557c1 ("mm: add tracepoint for scanning pages")
    c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
    d435edca9288 ("mm, page_owner: copy page owner info during migration")
    d8c1bdeb5d6b ("page-flags: trivial cleanup for PageTrans* helpers")
    eca56ff906bd ("mm, shmem: add internal shmem resident memory accounting")
    edf14cdbf9a0 ("mm, printk: introduce new format string for flags")

v3.18.133: Failed to apply! Possible dependencies:
    2847cf95c68f ("mm/debug-pagealloc: cleanup page guard code")
    48c96a368579 ("mm/page_owner: keep track of page owners")
    7cd12b4abfd2 ("mm, page_owner: track and print last migrate reason")
    94f759d62b2c ("mm/page_owner.c: remove unnecessary stack_trace field")
    c6c919eb90e0 ("mm: use put_page() to free page instead of putback_lru_page()")
    d435edca9288 ("mm, page_owner: copy page owner info during migration")
    e2cfc91120fa ("mm/page_owner: set correct gfp_mask on page_owner")
    e30825f1869a ("mm/debug-pagealloc: prepare boottime configurable on/off")
    eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")


How should we proceed with this patch?

--
Thanks,
Sasha


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 for-4.4-stable] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  2019-01-31  2:04 ` Sasha Levin
@ 2019-02-01 13:43   ` David Hildenbrand
  2019-02-01 14:09     ` Greg KH
  2019-02-01 14:27     ` David Hildenbrand
  0 siblings, 2 replies; 6+ messages in thread
From: David Hildenbrand @ 2019-02-01 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Hildenbrand, Andrew Morton, Mel Gorman,
	Kirill A. Shutemov, Michal Hocko, Naoya Horiguchi, Jan Kara,
	Andrea Arcangeli, Dominik Brodowski, Matthew Wilcox,
	Vratislav Bendel, Rafael Aquini, Konstantin Khlebnikov,
	Minchan Kim, Sasha Levin, stable

This is the backport for 4.4-stable.

We had a race in the old balloon compaction code before commit b1123ea6d3b3
("mm: balloon: use general non-lru movable page feature") refactored it
that became visible after backporting commit 195a8c43e93d
("virtio-balloon: deflate via a page list") without the refactoring.

The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use
general non-lru movable page feature"). commit d6d86c0a7f8d
("mm/balloon_compaction: redesign ballooned pages management") was
backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].

There was a subtle race between dropping the page lock of the newpage
in __unmap_and_move() and checking for
__is_movable_balloon_page(newpage).

Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.

This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With commit 195a8c43e93d ("virtio-balloon: deflate via a page list")
backported, one would suddenly get corrupted lists in
release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100

Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().

__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still hold
even after newpage was dequeued by virtio-balloon.

If anybody would ever change that special handling, the BUG would be
introduced again. So instead, make it explicit and use the information
of the original isolated page before migration.

This patch can be backported fairly easy to stable kernels (in contrast
to the refactoring).

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vratislav Bendel <vbendel@redhat.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org # 3.12 - 4.7
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Reported-by: Vratislav Bendel <vbendel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/migrate.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index afedcfab60e2..3304c98f9a78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -936,6 +936,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 	int rc = MIGRATEPAGE_SUCCESS;
 	int *result = NULL;
 	struct page *newpage;
+	bool is_lru = !isolated_balloon_page(page);
 
 	newpage = get_new_page(page, private, &result);
 	if (!newpage)
@@ -984,10 +985,14 @@ out:
 	 * If migration was not successful and there's a freeing callback, use
 	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
 	 * during isolation.
+	 *
+	 * Use the old state of the isolated source page to determine if we
+	 * migrated a LRU page. newpage was already unlocked and possibly
+	 * modified by its owner - don't rely on the page state.
 	 */
 	if (put_new_page)
 		put_new_page(newpage, private);
-	else if (unlikely(__is_movable_balloon_page(newpage))) {
+	else if (unlikely(!is_lru)) {
 		/* drop our reference, page already in the balloon */
 		put_page(newpage);
 	} else
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 for-4.4-stable] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  2019-02-01 13:43   ` [PATCH v2 for-4.4-stable] " David Hildenbrand
@ 2019-02-01 14:09     ` Greg KH
  2019-02-01 14:18       ` David Hildenbrand
  2019-02-01 14:27     ` David Hildenbrand
  1 sibling, 1 reply; 6+ messages in thread
From: Greg KH @ 2019-02-01 14:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Kirill A. Shutemov, Michal Hocko, Naoya Horiguchi, Jan Kara,
	Andrea Arcangeli, Dominik Brodowski, Matthew Wilcox,
	Vratislav Bendel, Rafael Aquini, Konstantin Khlebnikov,
	Minchan Kim, Sasha Levin, stable

On Fri, Feb 01, 2019 at 02:43:47PM +0100, David Hildenbrand wrote:
> This is the backport for 4.4-stable.
> 
> We had a race in the old balloon compaction code before commit b1123ea6d3b3
> ("mm: balloon: use general non-lru movable page feature") refactored it
> that became visible after backporting commit 195a8c43e93d
> ("virtio-balloon: deflate via a page list") without the refactoring.
> 
> The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
> ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use
> general non-lru movable page feature"). commit d6d86c0a7f8d
> ("mm/balloon_compaction: redesign ballooned pages management") was
> backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
> 
> There was a subtle race between dropping the page lock of the newpage
> in __unmap_and_move() and checking for
> __is_movable_balloon_page(newpage).
> 
> Just after dropping this page lock, virtio-balloon could go ahead and
> deflate the newpage, effectively dequeueing it and clearing PageBalloon,
> in turn making __is_movable_balloon_page(newpage) fail.
> 
> This resulted in dropping the reference of the newpage via
> putback_lru_page(newpage) instead of put_page(newpage), leading to
> page->lru getting modified and a !LRU page ending up in the LRU lists.
> With commit 195a8c43e93d ("virtio-balloon: deflate via a page list")
> backported, one would suddenly get corrupted lists in
> release_pages_balloon():
> - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
> - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
> 
> Nowadays this race is no longer possible, but it is hidden behind very
> ugly handling of __ClearPageMovable() and __PageMovable().
> 
> __ClearPageMovable() will not make __PageMovable() fail, only
> PageMovable(). So the new check (__PageMovable(newpage)) will still hold
> even after newpage was dequeued by virtio-balloon.
> 
> If anybody would ever change that special handling, the BUG would be
> introduced again. So instead, make it explicit and use the information
> of the original isolated page before migration.
> 
> This patch can be backported fairly easy to stable kernels (in contrast
> to the refactoring).
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Vratislav Bendel <vbendel@redhat.com>
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: stable@vger.kernel.org # 3.12 - 4.7
> Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Rafael Aquini <aquini@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/migrate.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

What is the git commit id of this patch in Linus's tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 for-4.4-stable] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  2019-02-01 14:09     ` Greg KH
@ 2019-02-01 14:18       ` David Hildenbrand
  0 siblings, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2019-02-01 14:18 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-mm, linux-kernel, Andrew Morton, Mel Gorman,
	Kirill A. Shutemov, Michal Hocko, Naoya Horiguchi, Jan Kara,
	Andrea Arcangeli, Dominik Brodowski, Matthew Wilcox,
	Vratislav Bendel, Rafael Aquini, Konstantin Khlebnikov,
	Minchan Kim, Sasha Levin, stable

On 01.02.19 15:09, Greg KH wrote:
> On Fri, Feb 01, 2019 at 02:43:47PM +0100, David Hildenbrand wrote:
>> This is the backport for 4.4-stable.
>>
>> We had a race in the old balloon compaction code before commit b1123ea6d3b3
>> ("mm: balloon: use general non-lru movable page feature") refactored it
>> that became visible after backporting commit 195a8c43e93d
>> ("virtio-balloon: deflate via a page list") without the refactoring.
>>
>> The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
>> ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use
>> general non-lru movable page feature"). commit d6d86c0a7f8d
>> ("mm/balloon_compaction: redesign ballooned pages management") was
>> backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
>>
>> There was a subtle race between dropping the page lock of the newpage
>> in __unmap_and_move() and checking for
>> __is_movable_balloon_page(newpage).
>>
>> Just after dropping this page lock, virtio-balloon could go ahead and
>> deflate the newpage, effectively dequeueing it and clearing PageBalloon,
>> in turn making __is_movable_balloon_page(newpage) fail.
>>
>> This resulted in dropping the reference of the newpage via
>> putback_lru_page(newpage) instead of put_page(newpage), leading to
>> page->lru getting modified and a !LRU page ending up in the LRU lists.
>> With commit 195a8c43e93d ("virtio-balloon: deflate via a page list")
>> backported, one would suddenly get corrupted lists in
>> release_pages_balloon():
>> - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
>> - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
>>
>> Nowadays this race is no longer possible, but it is hidden behind very
>> ugly handling of __ClearPageMovable() and __PageMovable().
>>
>> __ClearPageMovable() will not make __PageMovable() fail, only
>> PageMovable(). So the new check (__PageMovable(newpage)) will still hold
>> even after newpage was dequeued by virtio-balloon.
>>
>> If anybody would ever change that special handling, the BUG would be
>> introduced again. So instead, make it explicit and use the information
>> of the original isolated page before migration.
>>
>> This patch can be backported fairly easy to stable kernels (in contrast
>> to the refactoring).
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>> Cc: Jan Kara <jack@suse.cz>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Vratislav Bendel <vbendel@redhat.com>
>> Cc: Rafael Aquini <aquini@redhat.com>
>> Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: stable@vger.kernel.org # 3.12 - 4.7
>> Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
>> Reported-by: Vratislav Bendel <vbendel@redhat.com>
>> Acked-by: Michal Hocko <mhocko@suse.com>
>> Acked-by: Rafael Aquini <aquini@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/migrate.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> What is the git commit id of this patch in Linus's tree?

It's still in Andrew's tree as far as I know. I just wanted to share the
backport right away (to show that it is easy :) ). Will resend it again
(with proper commit idea) once upstream.

> 
> thanks,
> 
> greg k-h
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 for-4.4-stable] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
  2019-02-01 13:43   ` [PATCH v2 for-4.4-stable] " David Hildenbrand
  2019-02-01 14:09     ` Greg KH
@ 2019-02-01 14:27     ` David Hildenbrand
  1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2019-02-01 14:27 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Mel Gorman, Kirill A. Shutemov,
	Michal Hocko, Naoya Horiguchi, Jan Kara, Andrea Arcangeli,
	Dominik Brodowski, Matthew Wilcox, Vratislav Bendel,
	Rafael Aquini, Konstantin Khlebnikov, Minchan Kim, Sasha Levin,
	stable

On 01.02.19 14:43, David Hildenbrand wrote:
> This is the backport for 4.4-stable.
> 
> We had a race in the old balloon compaction code before commit b1123ea6d3b3
> ("mm: balloon: use general non-lru movable page feature") refactored it
> that became visible after backporting commit 195a8c43e93d
> ("virtio-balloon: deflate via a page list") without the refactoring.
> 
> The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
> ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use
> general non-lru movable page feature"). commit d6d86c0a7f8d
> ("mm/balloon_compaction: redesign ballooned pages management") was
> backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
> 
> There was a subtle race between dropping the page lock of the newpage
> in __unmap_and_move() and checking for
> __is_movable_balloon_page(newpage).
> 
> Just after dropping this page lock, virtio-balloon could go ahead and
> deflate the newpage, effectively dequeueing it and clearing PageBalloon,
> in turn making __is_movable_balloon_page(newpage) fail.
> 
> This resulted in dropping the reference of the newpage via
> putback_lru_page(newpage) instead of put_page(newpage), leading to
> page->lru getting modified and a !LRU page ending up in the LRU lists.
> With commit 195a8c43e93d ("virtio-balloon: deflate via a page list")
> backported, one would suddenly get corrupted lists in
> release_pages_balloon():
> - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
> - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
> 
> Nowadays this race is no longer possible, but it is hidden behind very
> ugly handling of __ClearPageMovable() and __PageMovable().
> 
> __ClearPageMovable() will not make __PageMovable() fail, only
> PageMovable(). So the new check (__PageMovable(newpage)) will still hold
> even after newpage was dequeued by virtio-balloon.
> 
> If anybody would ever change that special handling, the BUG would be
> introduced again. So instead, make it explicit and use the information
> of the original isolated page before migration.
> 
> This patch can be backported fairly easy to stable kernels (in contrast
> to the refactoring).
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Vratislav Bendel <vbendel@redhat.com>
> Cc: Rafael Aquini <aquini@redhat.com>
> Cc: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: stable@vger.kernel.org # 3.12 - 4.7
> Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
> Reported-by: Vratislav Bendel <vbendel@redhat.com>
> Acked-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Rafael Aquini <aquini@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/migrate.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index afedcfab60e2..3304c98f9a78 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -936,6 +936,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
>  	int rc = MIGRATEPAGE_SUCCESS;
>  	int *result = NULL;
>  	struct page *newpage;
> +	bool is_lru = !isolated_balloon_page(page);
>  
>  	newpage = get_new_page(page, private, &result);
>  	if (!newpage)
> @@ -984,10 +985,14 @@ out:
>  	 * If migration was not successful and there's a freeing callback, use
>  	 * it.  Otherwise, putback_lru_page() will drop the reference grabbed
>  	 * during isolation.
> +	 *
> +	 * Use the old state of the isolated source page to determine if we
> +	 * migrated a LRU page. newpage was already unlocked and possibly
> +	 * modified by its owner - don't rely on the page state.
>  	 */
>  	if (put_new_page)
>  		put_new_page(newpage, private);
> -	else if (unlikely(__is_movable_balloon_page(newpage))) {

And to be save, we should turn this into

 else if (rc == MIGRATEPAGE_SUCCESS && unlikely(!is_lru)) {

But will resend this either way as already mentioned to Greg.

> +	else if (unlikely(!is_lru)) {
>  		/* drop our reference, page already in the balloon */
>  		put_page(newpage);
>  	} else
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-02-01 14:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-29 23:32 [PATCH v2] mm: migrate: don't rely on __PageMovable() of newpage after unlocking it David Hildenbrand
2019-01-31  2:04 ` Sasha Levin
2019-02-01 13:43   ` [PATCH v2 for-4.4-stable] " David Hildenbrand
2019-02-01 14:09     ` Greg KH
2019-02-01 14:18       ` David Hildenbrand
2019-02-01 14:27     ` David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.