linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm/page_isolation: let caller take the zone lock for test_pages_isolated
@ 2016-03-16 16:49 Lucas Stach
  2016-03-17 17:18 ` Joonsoo Kim
  0 siblings, 1 reply; 3+ messages in thread
From: Lucas Stach @ 2016-03-16 16:49 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, David Rientjes, Vlastimil Babka, Joonsoo Kim,
	kernel, patchwork-lst

This fixes an annoying race in the CMA code leading to lots of "PFNs busy"
messages when CMA is used concurrently. Normally this is harmless, as CMA
will just retry the allocation at a different place, but it may lead to
increased fragmentation of the CMA area, as well as to failed allocations
when CMA is under memory pressure.

The issue is that test_pages_isolated checks whether the range is free by
checking that all pages in the range are buddy pages. For this to work,
the start pfn needs to be aligned down to the higher-order buddy page that
includes the start pfn, if there is one.

This is not a problem for the memory hotplug code, as it always offlines
whole pageblocks, but CMA may want to isolate a smaller range. So for the
check to work correctly, alloc_contig_range down-aligns the start pfn to
the higher-order buddy page. As the zone is not yet locked at that point,
a concurrent page free might coalesce the pages to be checked into an even
bigger buddy page, causing the check to fail even though all pages in the
range are in fact buddy pages.

By moving the zone locking into the callers of the test function, it is
possible to take the lock before CMA tries to find the proper start page,
stopping any concurrent page coalescing until the check is finished.
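
As an illustration, the racy sequence in alloc_contig_range() before this
patch looks roughly like the following (simplified and abridged sketch; the
numbered comments are annotations rather than actual kernel code, and error
handling is omitted):

	/* (1) walk down to the buddy page covering 'start', zone->lock NOT held */
	order = 0;
	outer_start = start;
	while (!PageBuddy(pfn_to_page(outer_start))) {
		if (++order >= MAX_ORDER) {
			outer_start = start;
			break;
		}
		outer_start &= ~0UL << order;
	}

	/*
	 * (2) a concurrent free on another CPU can now merge the buddy page
	 *     at outer_start into an even larger buddy page that starts
	 *     below outer_start ...
	 */

	/* (3) ... so the check sees !PageBuddy(outer_start) and bails out */
	if (test_pages_isolated(outer_start, end, false)) {
		pr_info("%s: [%lx, %lx) PFNs busy\n",
			__func__, outer_start, end);
		ret = -EBUSY;
		goto done;
	}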

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 mm/memory_hotplug.c |  2 ++
 mm/page_alloc.c     | 11 +++++++----
 mm/page_isolation.c | 13 ++++++++-----
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4af58a3a8ffa..05de25bd228e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1828,7 +1828,9 @@ repeat:
 	 */
 	dissolve_free_huge_pages(start_pfn, end_pfn);
 	/* check again */
+	spin_lock_irqsave(&zone->lock, flags);
 	offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+	spin_unlock_irqrestore(&zone->lock, flags);
 	if (offlined_pages < 0) {
 		ret = -EBUSY;
 		goto failed_removal;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 838ca8bb64f7..50b3b8f9594f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6702,14 +6702,15 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 int alloc_contig_range(unsigned long start, unsigned long end,
 		       unsigned migratetype)
 {
-	unsigned long outer_start, outer_end;
+	struct zone *zone = page_zone(pfn_to_page(start));
+	unsigned long outer_start, outer_end, flags;
 	unsigned int order;
 	int ret = 0;
 
 	struct compact_control cc = {
 		.nr_migratepages = 0,
 		.order = -1,
-		.zone = page_zone(pfn_to_page(start)),
+		.zone = zone,
 		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
 	};
@@ -6775,6 +6776,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 
 	order = 0;
 	outer_start = start;
+	spin_lock_irqsave(&zone->lock, flags);
 	while (!PageBuddy(pfn_to_page(outer_start))) {
 		if (++order >= MAX_ORDER) {
 			outer_start = start;
@@ -6797,10 +6799,11 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	}
 
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
+	ret = test_pages_isolated(outer_start, end, false);
+	spin_unlock_irqrestore(&zone->lock, flags);
+	if (ret) {
 		pr_info("%s: [%lx, %lx) PFNs busy\n",
 			__func__, outer_start, end);
-		ret = -EBUSY;
 		goto done;
 	}
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 92c4c36501e7..357c8e39a08e 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -246,12 +246,18 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 	return pfn;
 }
 
+/*
+ * Test whether all pages in the range are isolated.
+ * All pages in [start_pfn...end_pfn) must be in the same zone.
+ * zone->lock must be held before calling this.
+ *
+ * Returns 0 if all pages in the range are isolated.
+ */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 			bool skip_hwpoisoned_pages)
 {
-	unsigned long pfn, flags;
+	unsigned long pfn;
 	struct page *page;
-	struct zone *zone;
 
 	/*
 	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
@@ -267,11 +273,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	if ((pfn < end_pfn) || !page)
 		return -EBUSY;
 	/* Check all pages are free or marked as ISOLATED */
-	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
 	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
 						skip_hwpoisoned_pages);
-	spin_unlock_irqrestore(&zone->lock, flags);
 
 	trace_test_pages_isolated(start_pfn, end_pfn, pfn);
 
-- 
2.7.0


* Re: [PATCH] mm/page_isolation: let caller take the zone lock for test_pages_isolated
  2016-03-16 16:49 [PATCH] mm/page_isolation: let caller take the zone lock for test_pages_isolated Lucas Stach
@ 2016-03-17 17:18 ` Joonsoo Kim
  2016-03-17 17:33   ` Lucas Stach
  0 siblings, 1 reply; 3+ messages in thread
From: Joonsoo Kim @ 2016-03-17 17:18 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Linux Memory Management List, LKML, David Rientjes,
	Vlastimil Babka, kernel, patchwork-lst

2016-03-17 1:49 GMT+09:00 Lucas Stach <l.stach@pengutronix.de>:
> This fixes an annoying race in the CMA code leading to lots of "PFNs busy"
> messages when CMA is used concurrently. Normally this is harmless, as CMA
> will just retry the allocation at a different place, but it may lead to
> increased fragmentation of the CMA area, as well as to failed allocations
> when CMA is under memory pressure.
>
> The issue is that test_pages_isolated checks whether the range is free by
> checking that all pages in the range are buddy pages. For this to work,
> the start pfn needs to be aligned down to the higher-order buddy page that
> includes the start pfn, if there is one.
>
> This is not a problem for the memory hotplug code, as it always offlines
> whole pageblocks, but CMA may want to isolate a smaller range. So for the
> check to work correctly, alloc_contig_range down-aligns the start pfn to
> the higher-order buddy page. As the zone is not yet locked at that point,
> a concurrent page free might coalesce the pages to be checked into an even
> bigger buddy page, causing the check to fail even though all pages in the
> range are in fact buddy pages.
>
> By moving the zone locking into the callers of the test function, it is
> possible to take the lock before CMA tries to find the proper start page,
> stopping any concurrent page coalescing until the check is finished.

I think that this patch cannot prevent the same race in
isolate_freepages_range(). If buddy merging happens after we
have passed test_pages_isolated(), isolate_freepages_range() cannot see
the buddy page and will fail.

Thanks.


* Re: [PATCH] mm/page_isolation: let caller take the zone lock for test_pages_isolated
  2016-03-17 17:18 ` Joonsoo Kim
@ 2016-03-17 17:33   ` Lucas Stach
  0 siblings, 0 replies; 3+ messages in thread
From: Lucas Stach @ 2016-03-17 17:33 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Linux Memory Management List, LKML, David Rientjes,
	Vlastimil Babka, kernel, patchwork-lst

On Friday, 2016-03-18 at 02:18 +0900, Joonsoo Kim wrote:
> 2016-03-17 1:49 GMT+09:00 Lucas Stach <l.stach@pengutronix.de>:
> > This fixes an annoying race in the CMA code leading to lots of "PFNs busy"
> > messages when CMA is used concurrently. Normally this is harmless, as CMA
> > will just retry the allocation at a different place, but it may lead to
> > increased fragmentation of the CMA area, as well as to failed allocations
> > when CMA is under memory pressure.
> >
> > The issue is that test_pages_isolated checks whether the range is free by
> > checking that all pages in the range are buddy pages. For this to work,
> > the start pfn needs to be aligned down to the higher-order buddy page that
> > includes the start pfn, if there is one.
> >
> > This is not a problem for the memory hotplug code, as it always offlines
> > whole pageblocks, but CMA may want to isolate a smaller range. So for the
> > check to work correctly, alloc_contig_range down-aligns the start pfn to
> > the higher-order buddy page. As the zone is not yet locked at that point,
> > a concurrent page free might coalesce the pages to be checked into an even
> > bigger buddy page, causing the check to fail even though all pages in the
> > range are in fact buddy pages.
> >
> > By moving the zone locking into the callers of the test function, it is
> > possible to take the lock before CMA tries to find the proper start page,
> > stopping any concurrent page coalescing until the check is finished.
> 
> I think that this patch cannot prevent the same race in
> isolate_freepages_range(). If buddy merging happens after we
> have passed test_pages_isolated(), isolate_freepages_range() cannot see
> the buddy page and will fail.
> 
Your analysis seems correct. I'll fix this patch to hold the zone lock
across isolate_freepages_range.
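
Roughly along these lines (untested sketch of the idea only, not the final
patch; if I read isolate_freepages_block() right it takes zone->lock itself,
so that inner locking would need adjusting as well):

	spin_lock_irqsave(&zone->lock, flags);

	/* ... down-align outer_start to the covering buddy page, as before ... */

	ret = test_pages_isolated(outer_start, end, false);
	if (ret) {
		spin_unlock_irqrestore(&zone->lock, flags);
		goto done;
	}

	/* grab the whole range while buddy merging is still blocked */
	outer_end = isolate_freepages_range(&cc, outer_start, end);

	spin_unlock_irqrestore(&zone->lock, flags);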

Thanks,
Lucas


