linux-mm.kvack.org archive mirror
* [PATCH v3 0/2] drain pcp outside of page isolation
@ 2020-09-04 15:14 Pavel Tatashin
  2020-09-04 15:14 ` [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin
  2020-09-04 15:14 ` [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range Pavel Tatashin
  0 siblings, 2 replies; 7+ messages in thread
From: Pavel Tatashin @ 2020-09-04 15:14 UTC (permalink / raw)
  To: linux-kernel, akpm, mhocko, linux-mm, pasha.tatashin, osalvador,
	richard.weiyang, david, vbabka, rientjes

Move drain_all_pages() from start_isolate_page_range() to its callers. This
makes it more efficient and symmetric, and solves the race condition.
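
Schematically, the shape of the change is the following (hand-waving sketch
only; the real changes are in patch 2):

	/* before: inside start_isolate_page_range(), once per pageblock */
	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages)
		set_migratetype_isolate(pfn_to_page(pfn), migratetype, flags);
		/* ^ drained via drain_all_pages(zone) on every success */

	/* after: set_migratetype_isolate() no longer drains; the caller
	 * drains exactly once for the whole range:
	 */
	start_isolate_page_range(start_pfn, end_pfn, migratetype, flags);
	drain_all_pages(zone);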

This is the proposal I described in the Version 1 thread; otherwise there are
no changes to patch 1.

Version 1:
https://lore.kernel.org/lkml/20200901124615.137200-1-pasha.tatashin@soleen.com

Version 2:
https://lore.kernel.org/lkml/20200903140032.380431-1-pasha.tatashin@soleen.com

Pavel Tatashin (2):
  mm/memory_hotplug: drain per-cpu pages again during memory offline
  mm: drain per-cpu pages outside of isolate_migratepages_range

 mm/memory_hotplug.c |  1 +
 mm/page_alloc.c     |  2 ++
 mm/page_isolation.c | 32 ++++++++++++--------------------
 3 files changed, 15 insertions(+), 20 deletions(-)

-- 
2.25.1




* [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-04 15:14 [PATCH v3 0/2] drain pcp outside of page isolation Pavel Tatashin
@ 2020-09-04 15:14 ` Pavel Tatashin
  2020-09-07  7:27   ` Michal Hocko
  2020-09-08  9:57   ` David Hildenbrand
  2020-09-04 15:14 ` [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range Pavel Tatashin
  1 sibling, 2 replies; 7+ messages in thread
From: Pavel Tatashin @ 2020-09-04 15:14 UTC (permalink / raw)
  To: linux-kernel, akpm, mhocko, linux-mm, pasha.tatashin, osalvador,
	richard.weiyang, david, vbabka, rientjes

There is a race during page offline that can lead to an infinite loop:
a page never ends up on a buddy list and __offline_pages() keeps
retrying indefinitely or until a termination signal is received.

Thread#1 - a new process:

load_elf_binary
 begin_new_exec
  exec_mmap
   mmput
    exit_mmap
     tlb_finish_mmu
      tlb_flush_mmu
       release_pages
        free_unref_page_list
         free_unref_page_prepare
          set_pcppage_migratetype(page, migratetype);
             // Set page->index migration type below  MIGRATE_PCPTYPES

Thread#2 - hot-removes memory
__offline_pages
  start_isolate_page_range
    set_migratetype_isolate
      set_pageblock_migratetype(page, MIGRATE_ISOLATE);
        // Mark the pageblock MIGRATE_ISOLATE
        drain_all_pages(zone);
             // drain per-cpu page lists to buddy allocator.

Thread#1 - continue
         free_unref_page_commit
           migratetype = get_pcppage_migratetype(page);
              // get old migration type
           list_add(&page->lru, &pcp->lists[migratetype]);
              // add new page to already drained pcp list

Thread#2
Never drains pcp again, and therefore gets stuck in the loop.
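
To make the window concrete, below is a heavily condensed sketch of the two
functions named above (illustrative only, not the verbatim kernel code):

/* Thread#1 side, runs before the pageblock is isolated */
static bool free_unref_page_prepare(struct page *page, unsigned long pfn)
{
	int migratetype = get_pfnblock_migratetype(page, pfn);

	/* cache the current (pre-isolation) migratetype in page->index */
	set_pcppage_migratetype(page, migratetype);
	return true;
}

/* Thread#1 side, runs after Thread#2 has already called drain_all_pages() */
static void free_unref_page_commit(struct page *page, unsigned long pfn)
{
	struct per_cpu_pages *pcp;
	int migratetype = get_pcppage_migratetype(page);	/* stale value */

	/*
	 * The cached value is below MIGRATE_PCPTYPES, so the
	 * is_migrate_isolate() check that would send the page straight to
	 * the buddy allocator is skipped, and the page lands on a pcp list
	 * that nobody will drain again.
	 */
	pcp = &this_cpu_ptr(page_zone(page)->pageset)->pcp;
	list_add(&page->lru, &pcp->lists[migratetype]);
	pcp->count++;
}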

The fix is to try to drain per-cpu lists again after
check_pages_isolated_cb() fails.

Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: stable@vger.kernel.org
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/memory_hotplug.c | 14 ++++++++++++++
 mm/page_isolation.c |  8 ++++++++
 2 files changed, 22 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e9d5ab5d3ca0..b11a269e2356 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1575,6 +1575,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		/* check again */
 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
 					    NULL, check_pages_isolated_cb);
+		/*
+		 * per-cpu pages are drained in start_isolate_page_range, but if
+		 * there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might
+		 * have raced with another thread that was adding pages to pcp
+		 * list.
+		 *
+		 * Forward progress should be still guaranteed because
+		 * pages on the pcp list can only belong to MOVABLE_ZONE
+		 * because has_unmovable_pages explicitly checks for
+		 * PageBuddy on freed pages on other zones.
+		 */
+		if (ret)
+			drain_all_pages(zone);
 	} while (ret);
 
 	/* Ok, all of our target is isolated.
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 242c03121d73..63a3db10a8c0 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
+ * Please note that there is no strong synchronization with the page allocator
+ * either. Pages might be freed while their page blocks are marked ISOLATED.
+ * In some cases pages might still end up on pcp lists and that would allow
+ * for their allocation even when they are in fact isolated already. Depending
+ * on how strong of a guarantee the caller needs drain_all_pages might be needed
+ * (e.g. __offline_pages will need to call it after check for isolated range for
+ * a next retry).
+ *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.
  */
-- 
2.25.1




* [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range
  2020-09-04 15:14 [PATCH v3 0/2] drain pcp outside of page isolation Pavel Tatashin
  2020-09-04 15:14 ` [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin
@ 2020-09-04 15:14 ` Pavel Tatashin
  2020-09-07  7:32   ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Pavel Tatashin @ 2020-09-04 15:14 UTC (permalink / raw)
  To: linux-kernel, akpm, mhocko, linux-mm, pasha.tatashin, osalvador,
	richard.weiyang, david, vbabka, rientjes

It is expensive to drain per-cpu page lists: a drain worker is scheduled on
each CPU that holds pcp pages, and we wait for all of them to complete.
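
For reference, drain_all_pages() boils down to roughly the following
(condensed, not the verbatim mm/page_alloc.c code):

void drain_all_pages(struct zone *zone)
{
	static cpumask_t cpus_with_pcps;
	int cpu;

	mutex_lock(&pcpu_drain_mutex);

	/* figure out which CPUs actually hold pcp pages for this zone */
	for_each_online_cpu(cpu) {
		bool has_pcps = per_cpu_ptr(zone->pageset, cpu)->pcp.count;

		if (has_pcps)
			cpumask_set_cpu(cpu, &cpus_with_pcps);
		else
			cpumask_clear_cpu(cpu, &cpus_with_pcps);
	}

	/* queue a drain work item on each of those CPUs ... */
	for_each_cpu(cpu, &cpus_with_pcps) {
		struct pcpu_drain *drain = per_cpu_ptr(&pcpu_drain, cpu);

		drain->zone = zone;
		INIT_WORK(&drain->work, drain_local_pages_wq);
		queue_work_on(cpu, mm_percpu_wq, &drain->work);
	}

	/* ... and wait for every one of them to finish */
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(&per_cpu_ptr(&pcpu_drain, cpu)->work);

	mutex_unlock(&pcpu_drain_mutex);
}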

Currently, we drain once for every pageblock that is isolated. Instead, let's
drain once after the whole range has been isolated.

For example, when 2G of memory is hot-removed, drain is called 16 times; with
this change it is done only once on average.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 mm/memory_hotplug.c | 15 +--------------
 mm/page_alloc.c     |  2 ++
 mm/page_isolation.c | 40 ++++++++++++----------------------------
 3 files changed, 15 insertions(+), 42 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b11a269e2356..5a2ed1a94555 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	}
 
 	do {
+		drain_all_pages(zone);
 		pfn = start_pfn;
 		do {
 			if (signal_pending(current)) {
@@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		/* check again */
 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
 					    NULL, check_pages_isolated_cb);
-		/*
-		 * per-cpu pages are drained in start_isolate_page_range, but if
-		 * there are still pages that are not free, make sure that we
-		 * drain again, because when we isolated range we might
-		 * have raced with another thread that was adding pages to pcp
-		 * list.
-		 *
-		 * Forward progress should be still guaranteed because
-		 * pages on the pcp list can only belong to MOVABLE_ZONE
-		 * because has_unmovable_pages explicitly checks for
-		 * PageBuddy on freed pages on other zones.
-		 */
-		if (ret)
-			drain_all_pages(zone);
 	} while (ret);
 
 	/* Ok, all of our target is isolated.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fab5e97dc9ca..6d6a501a103e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	if (ret < 0)
 		return ret;
 
+	drain_all_pages(cc.zone);
+
 	/*
 	 * In case of -EBUSY, we'd like to know which page causes problem.
 	 * So, just fall through. test_pages_isolated() has a tracepoint
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 63a3db10a8c0..8dfa6c6c668d 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 {
 	struct page *unmovable = NULL;
 	struct zone *zone;
-	unsigned long flags;
-	int ret = -EBUSY;
+	unsigned long flags, nr_pages;
+	int ret = -EBUSY, mt;
 
 	zone = page_zone(page);
 
@@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	 * We just check MOVABLE pages.
 	 */
 	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
-	if (!unmovable) {
-		unsigned long nr_pages;
-		int mt = get_pageblock_migratetype(page);
-
-		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
-		zone->nr_isolate_pageblock++;
-		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
-									NULL);
-
-		__mod_zone_freepage_state(zone, -nr_pages, mt);
-		ret = 0;
-	}
+	if (unmovable)
+		goto out;
 
+	mt = get_pageblock_migratetype(page);
+	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
+	zone->nr_isolate_pageblock++;
+	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
+	__mod_zone_freepage_state(zone, -nr_pages, mt);
+	ret = 0;
 out:
 	spin_unlock_irqrestore(&zone->lock, flags);
-	if (!ret) {
-		drain_all_pages(zone);
-	} else {
+	if (ret) {
 		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
 
 		if ((isol_flags & REPORT_FAILURE) && unmovable)
@@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
- * Please note that there is no strong synchronization with the page allocator
- * either. Pages might be freed while their page blocks are marked ISOLATED.
- * In some cases pages might still end up on pcp lists and that would allow
- * for their allocation even when they are in fact isolated already. Depending
- * on how strong of a guarantee the caller needs drain_all_pages might be needed
- * (e.g. __offline_pages will need to call it after check for isolated range for
- * a next retry).
- *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.
  */
@@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
 	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
 
-	for (pfn = start_pfn;
-	     pfn < end_pfn;
-	     pfn += pageblock_nr_pages) {
+	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page) {
 			if (set_migratetype_isolate(page, migratetype, flags)) {
-- 
2.25.1




* Re: [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-04 15:14 ` [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin
@ 2020-09-07  7:27   ` Michal Hocko
  2020-09-08  9:57   ` David Hildenbrand
  1 sibling, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2020-09-07  7:27 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, akpm, linux-mm, osalvador, richard.weiyang, david,
	vbabka, rientjes

On Fri 04-09-20 11:14:47, Pavel Tatashin wrote:
> There is a race during page offline that can lead to infinite loop:
> a page never ends up on a buddy list and __offline_pages() keeps
> retrying infinitely or until a termination signal is received.
> 
> Thread#1 - a new process:
> 
> load_elf_binary
>  begin_new_exec
>   exec_mmap
>    mmput
>     exit_mmap
>      tlb_finish_mmu
>       tlb_flush_mmu
>        release_pages
>         free_unref_page_list
>          free_unref_page_prepare
>           set_pcppage_migratetype(page, migratetype);
>              // Set page->index migration type below  MIGRATE_PCPTYPES
> 
> Thread#2 - hot-removes memory
> __offline_pages
>   start_isolate_page_range
>     set_migratetype_isolate
>       set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>         Set migration type to MIGRATE_ISOLATE-> set
>         drain_all_pages(zone);
>              // drain per-cpu page lists to buddy allocator.
> 
> Thread#1 - continue
>          free_unref_page_commit
>            migratetype = get_pcppage_migratetype(page);
>               // get old migration type
>            list_add(&page->lru, &pcp->lists[migratetype]);
>               // add new page to already drained pcp list
> 
> Thread#2
> Never drains pcp again, and therefore gets stuck in the loop.
> 
> The fix is to try to drain per-cpu lists again after
> check_pages_isolated_cb() fails.
> 
> Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: stable@vger.kernel.org
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Already acked the mmotm version but let's add it here as well.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/memory_hotplug.c | 14 ++++++++++++++
>  mm/page_isolation.c |  8 ++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e9d5ab5d3ca0..b11a269e2356 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1575,6 +1575,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		/* check again */
>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>  					    NULL, check_pages_isolated_cb);
> +		/*
> +		 * per-cpu pages are drained in start_isolate_page_range, but if
> +		 * there are still pages that are not free, make sure that we
> +		 * drain again, because when we isolated range we might
> +		 * have raced with another thread that was adding pages to pcp
> +		 * list.
> +		 *
> +		 * Forward progress should be still guaranteed because
> +		 * pages on the pcp list can only belong to MOVABLE_ZONE
> +		 * because has_unmovable_pages explicitly checks for
> +		 * PageBuddy on freed pages on other zones.
> +		 */
> +		if (ret)
> +			drain_all_pages(zone);
>  	} while (ret);
>  
>  	/* Ok, all of our target is isolated.
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 242c03121d73..63a3db10a8c0 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * pageblocks we may have modified and return -EBUSY to caller. This
>   * prevents two threads from simultaneously working on overlapping ranges.
>   *
> + * Please note that there is no strong synchronization with the page allocator
> + * either. Pages might be freed while their page blocks are marked ISOLATED.
> + * In some cases pages might still end up on pcp lists and that would allow
> + * for their allocation even when they are in fact isolated already. Depending
> + * on how strong of a guarantee the caller needs drain_all_pages might be needed
> + * (e.g. __offline_pages will need to call it after check for isolated range for
> + * a next retry).
> + *
>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>   * of range cannot be isolated.
>   */
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range
  2020-09-04 15:14 ` [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range Pavel Tatashin
@ 2020-09-07  7:32   ` Michal Hocko
  2020-09-08  9:26     ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-09-07  7:32 UTC (permalink / raw)
  To: Pavel Tatashin
  Cc: linux-kernel, akpm, linux-mm, osalvador, richard.weiyang, david,
	vbabka, rientjes

On Fri 04-09-20 11:14:48, Pavel Tatashin wrote:
> It is expensive to drain per-cpu page lists as a thread is started on each
> CPU, and we are waiting for them to complete.
> 
> Currently, we drain on every block of pages that is isolated. Instead lets
> drain once after pages are isolated.
> 
> For example, when 2G of memory is hot-removed drain is called 16 times,
> with this change it will be done only one time on average.

I do agree that the current implementation is much less efficient than
it could be, but I disagree that we should be pushing the burden to all
callers, as already stated
(http://lkml.kernel.org/r/20200907072608.GE30144@dhcp22.suse.cz).

I believe a single drain for the whole range inside start_isolate_page_range
would be perfectly fine and would improve the situation considerably. There
are some minor details to sort out (a multi-zone pfn span, which we do not
allow but probably should be enforcing somehow).

> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>

I believe we should be going in the opposite direction and define more
understandable and usable semantics for start_isolate_page_range. We do
not want callers to have to scratch their heads over which caches they
might need to flush.
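
A hand-waving sketch of the semantics I have in mind (illustrative only,
not a real patch; it also assumes a single zone, which is the multi-zone
pfn span detail mentioned above):

int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
			     unsigned migratetype, int flags)
{
	unsigned long pfn;
	struct page *page;

	/* mark every pageblock in the range MIGRATE_ISOLATE first ... */
	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
		page = __first_valid_page(pfn, pageblock_nr_pages);
		if (page && set_migratetype_isolate(page, migratetype, flags)) {
			undo_isolate_page_range(start_pfn, pfn, migratetype);
			return -EBUSY;
		}
	}

	/* ... then drain exactly once, inside the API, for the whole range */
	drain_all_pages(page_zone(pfn_to_page(start_pfn)));
	return 0;
}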

> ---
>  mm/memory_hotplug.c | 15 +--------------
>  mm/page_alloc.c     |  2 ++
>  mm/page_isolation.c | 40 ++++++++++++----------------------------
>  3 files changed, 15 insertions(+), 42 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index b11a269e2356..5a2ed1a94555 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  	}
>  
>  	do {
> +		drain_all_pages(zone);
>  		pfn = start_pfn;
>  		do {
>  			if (signal_pending(current)) {
> @@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		/* check again */
>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>  					    NULL, check_pages_isolated_cb);
> -		/*
> -		 * per-cpu pages are drained in start_isolate_page_range, but if
> -		 * there are still pages that are not free, make sure that we
> -		 * drain again, because when we isolated range we might
> -		 * have raced with another thread that was adding pages to pcp
> -		 * list.
> -		 *
> -		 * Forward progress should be still guaranteed because
> -		 * pages on the pcp list can only belong to MOVABLE_ZONE
> -		 * because has_unmovable_pages explicitly checks for
> -		 * PageBuddy on freed pages on other zones.
> -		 */
> -		if (ret)
> -			drain_all_pages(zone);
>  	} while (ret);
>  
>  	/* Ok, all of our target is isolated.
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fab5e97dc9ca..6d6a501a103e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>  	if (ret < 0)
>  		return ret;
>  
> +	drain_all_pages(cc.zone);
> +
>  	/*
>  	 * In case of -EBUSY, we'd like to know which page causes problem.
>  	 * So, just fall through. test_pages_isolated() has a tracepoint
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 63a3db10a8c0..8dfa6c6c668d 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  {
>  	struct page *unmovable = NULL;
>  	struct zone *zone;
> -	unsigned long flags;
> -	int ret = -EBUSY;
> +	unsigned long flags, nr_pages;
> +	int ret = -EBUSY, mt;
>  
>  	zone = page_zone(page);
>  
> @@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>  	 * We just check MOVABLE pages.
>  	 */
>  	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
> -	if (!unmovable) {
> -		unsigned long nr_pages;
> -		int mt = get_pageblock_migratetype(page);
> -
> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> -		zone->nr_isolate_pageblock++;
> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
> -									NULL);
> -
> -		__mod_zone_freepage_state(zone, -nr_pages, mt);
> -		ret = 0;
> -	}
> +	if (unmovable)
> +		goto out;
>  
> +	mt = get_pageblock_migratetype(page);
> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
> +	zone->nr_isolate_pageblock++;
> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
> +	__mod_zone_freepage_state(zone, -nr_pages, mt);
> +	ret = 0;
>  out:
>  	spin_unlock_irqrestore(&zone->lock, flags);
> -	if (!ret) {
> -		drain_all_pages(zone);
> -	} else {
> +	if (ret) {
>  		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>  
>  		if ((isol_flags & REPORT_FAILURE) && unmovable)
> @@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * pageblocks we may have modified and return -EBUSY to caller. This
>   * prevents two threads from simultaneously working on overlapping ranges.
>   *
> - * Please note that there is no strong synchronization with the page allocator
> - * either. Pages might be freed while their page blocks are marked ISOLATED.
> - * In some cases pages might still end up on pcp lists and that would allow
> - * for their allocation even when they are in fact isolated already. Depending
> - * on how strong of a guarantee the caller needs drain_all_pages might be needed
> - * (e.g. __offline_pages will need to call it after check for isolated range for
> - * a next retry).
> - *
>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>   * of range cannot be isolated.
>   */
> @@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>  	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>  
> -	for (pfn = start_pfn;
> -	     pfn < end_pfn;
> -	     pfn += pageblock_nr_pages) {
> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>  		if (page) {
>  			if (set_migratetype_isolate(page, migratetype, flags)) {
> -- 
> 2.25.1
> 

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 2/2] mm: drain per-cpu pages outside of isolate_migratepages_range
  2020-09-07  7:32   ` Michal Hocko
@ 2020-09-08  9:26     ` David Hildenbrand
  0 siblings, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2020-09-08  9:26 UTC (permalink / raw)
  To: Michal Hocko, Pavel Tatashin
  Cc: linux-kernel, akpm, linux-mm, osalvador, richard.weiyang, vbabka,
	rientjes

>> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> 
> I believe we should be going an opposite direction and define a more
> understandable and usable semantic for start_isolate_page_range. We do
> not want callers to scratch their heads to call all caches they might
> need to flush.

I still prefer temporarily disabling PCP, or using a different mechanism
to teach PCP not to work on isolated pageblocks (while the latter seems
to be harder to achieve). I see that Vlastimil already tried to
implement it - I'll have a look shortly.
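
To illustrate what I mean by temporarily disabling PCP, something along
these lines (hypothetical helper names, hand-waving only):

void zone_pcp_disable_sketch(struct zone *zone)
{
	/* force pcp high/batch down so that new frees bypass the pcp lists */
	zone_set_pcp_high_and_batch(zone, 0, 1);	/* hypothetical helper */
	/* flush whatever is already sitting on the pcp lists */
	drain_all_pages(zone);
}

void zone_pcp_enable_sketch(struct zone *zone)
{
	/* restore the normal limits once isolation/offlining is done */
	zone_set_pcp_high_and_batch(zone, 6 * zone_batchsize(zone),
				    zone_batchsize(zone));
}

That way isolated pageblocks simply never see pcp activity while we are
offlining, and nobody has to remember where the "right" place for an extra
drain_all_pages() call is.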

Having said that, this patch improves the situation in case we cannot
get temporary disabling of PCP implemented - instead of flushing at a
random place, we flush at less random places :)

But it is far from ideal. It's just another of the special cases that
developed over time and instead of properly fixing it, we work around it.

Note: I think the flush semantics of drain_all_pages() are very different
from lru_add_drain_all(). The latter does not mess with isolated pages
that have just been freed (as it's not part of the core buddy / buddy
extension).

> 
>> ---
>>  mm/memory_hotplug.c | 15 +--------------
>>  mm/page_alloc.c     |  2 ++
>>  mm/page_isolation.c | 40 ++++++++++++----------------------------
>>  3 files changed, 15 insertions(+), 42 deletions(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index b11a269e2356..5a2ed1a94555 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1536,6 +1536,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  	}
>>  
>>  	do {
>> +		drain_all_pages(zone);
>>  		pfn = start_pfn;
>>  		do {
>>  			if (signal_pending(current)) {
>> @@ -1575,20 +1576,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
>>  		/* check again */
>>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>>  					    NULL, check_pages_isolated_cb);
>> -		/*
>> -		 * per-cpu pages are drained in start_isolate_page_range, but if
>> -		 * there are still pages that are not free, make sure that we
>> -		 * drain again, because when we isolated range we might
>> -		 * have raced with another thread that was adding pages to pcp
>> -		 * list.
>> -		 *
>> -		 * Forward progress should be still guaranteed because
>> -		 * pages on the pcp list can only belong to MOVABLE_ZONE
>> -		 * because has_unmovable_pages explicitly checks for
>> -		 * PageBuddy on freed pages on other zones.
>> -		 */
>> -		if (ret)
>> -			drain_all_pages(zone);
>>  	} while (ret);
>>  
>>  	/* Ok, all of our target is isolated.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index fab5e97dc9ca..6d6a501a103e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8462,6 +8462,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
>>  	if (ret < 0)
>>  		return ret;
>>  
>> +	drain_all_pages(cc.zone);
>> +
>>  	/*
>>  	 * In case of -EBUSY, we'd like to know which page causes problem.
>>  	 * So, just fall through. test_pages_isolated() has a tracepoint
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 63a3db10a8c0..8dfa6c6c668d 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -19,8 +19,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  {
>>  	struct page *unmovable = NULL;
>>  	struct zone *zone;
>> -	unsigned long flags;
>> -	int ret = -EBUSY;
>> +	unsigned long flags, nr_pages;
>> +	int ret = -EBUSY, mt;
>>  
>>  	zone = page_zone(page);
>>  
>> @@ -39,24 +39,18 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
>>  	 * We just check MOVABLE pages.
>>  	 */
>>  	unmovable = has_unmovable_pages(zone, page, migratetype, isol_flags);
>> -	if (!unmovable) {
>> -		unsigned long nr_pages;
>> -		int mt = get_pageblock_migratetype(page);
>> -
>> -		set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> -		zone->nr_isolate_pageblock++;
>> -		nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE,
>> -									NULL);
>> -
>> -		__mod_zone_freepage_state(zone, -nr_pages, mt);
>> -		ret = 0;
>> -	}
>> +	if (unmovable)
>> +		goto out;
>>  
>> +	mt = get_pageblock_migratetype(page);
>> +	set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>> +	zone->nr_isolate_pageblock++;
>> +	nr_pages = move_freepages_block(zone, page, MIGRATE_ISOLATE, NULL);
>> +	__mod_zone_freepage_state(zone, -nr_pages, mt);
>> +	ret = 0;
>>  out:
>>  	spin_unlock_irqrestore(&zone->lock, flags);
>> -	if (!ret) {
>> -		drain_all_pages(zone);
>> -	} else {
>> +	if (ret) {
>>  		WARN_ON_ONCE(zone_idx(zone) == ZONE_MOVABLE);
>>  

There are patches in -mm/-next by me that already clean up/reshuffle that
code heavily, so better to just keep the changes minimal or build on top
of -next.

>>  		if ((isol_flags & REPORT_FAILURE) && unmovable)
>> @@ -170,14 +164,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>>   * pageblocks we may have modified and return -EBUSY to caller. This
>>   * prevents two threads from simultaneously working on overlapping ranges.
>>   *
>> - * Please note that there is no strong synchronization with the page allocator
>> - * either. Pages might be freed while their page blocks are marked ISOLATED.
>> - * In some cases pages might still end up on pcp lists and that would allow
>> - * for their allocation even when they are in fact isolated already. Depending
>> - * on how strong of a guarantee the caller needs drain_all_pages might be needed
>> - * (e.g. __offline_pages will need to call it after check for isolated range for
>> - * a next retry).
>> - *
>>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>>   * of range cannot be isolated.
>>   */
>> @@ -192,9 +178,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>  	BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
>>  
>> -	for (pfn = start_pfn;
>> -	     pfn < end_pfn;
>> -	     pfn += pageblock_nr_pages) {
>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {

unnecessary code churn.

>>  		page = __first_valid_page(pfn, pageblock_nr_pages);
>>  		if (page) {
>>  			if (set_migratetype_isolate(page, migratetype, flags)) {
>> -- 
>> 2.25.1
>>
> 


-- 
Thanks,

David / dhildenb




* Re: [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-04 15:14 ` [PATCH v3 1/2] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin
  2020-09-07  7:27   ` Michal Hocko
@ 2020-09-08  9:57   ` David Hildenbrand
  1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2020-09-08  9:57 UTC (permalink / raw)
  To: Pavel Tatashin, linux-kernel, akpm, mhocko, linux-mm, osalvador,
	richard.weiyang, vbabka, rientjes

On 04.09.20 17:14, Pavel Tatashin wrote:
> There is a race during page offline that can lead to infinite loop:
> a page never ends up on a buddy list and __offline_pages() keeps
> retrying infinitely or until a termination signal is received.
> 
> Thread#1 - a new process:
> 
> load_elf_binary
>  begin_new_exec
>   exec_mmap
>    mmput
>     exit_mmap
>      tlb_finish_mmu
>       tlb_flush_mmu
>        release_pages
>         free_unref_page_list
>          free_unref_page_prepare
>           set_pcppage_migratetype(page, migratetype);
>              // Set page->index migration type below  MIGRATE_PCPTYPES
> 
> Thread#2 - hot-removes memory
> __offline_pages
>   start_isolate_page_range
>     set_migratetype_isolate
>       set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>         Set migration type to MIGRATE_ISOLATE-> set
>         drain_all_pages(zone);
>              // drain per-cpu page lists to buddy allocator.
> 
> Thread#1 - continue
>          free_unref_page_commit
>            migratetype = get_pcppage_migratetype(page);
>               // get old migration type
>            list_add(&page->lru, &pcp->lists[migratetype]);
>               // add new page to already drained pcp list
> 
> Thread#2
> Never drains pcp again, and therefore gets stuck in the loop.
> 
> The fix is to try to drain per-cpu lists again after
> check_pages_isolated_cb() fails.
> 
> Fixes: c52e75935f8d ("mm: remove extra drain pages on pcp list")
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: stable@vger.kernel.org
> Acked-by: David Rientjes <rientjes@google.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/memory_hotplug.c | 14 ++++++++++++++
>  mm/page_isolation.c |  8 ++++++++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index e9d5ab5d3ca0..b11a269e2356 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1575,6 +1575,20 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		/* check again */
>  		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
>  					    NULL, check_pages_isolated_cb);
> +		/*
> +		 * per-cpu pages are drained in start_isolate_page_range, but if
> +		 * there are still pages that are not free, make sure that we
> +		 * drain again, because when we isolated range we might
> +		 * have raced with another thread that was adding pages to pcp
> +		 * list.
> +		 *
> +		 * Forward progress should be still guaranteed because
> +		 * pages on the pcp list can only belong to MOVABLE_ZONE
> +		 * because has_unmovable_pages explicitly checks for
> +		 * PageBuddy on freed pages on other zones.
> +		 */
> +		if (ret)
> +			drain_all_pages(zone);
>  	} while (ret);
>  
>  	/* Ok, all of our target is isolated.
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 242c03121d73..63a3db10a8c0 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * pageblocks we may have modified and return -EBUSY to caller. This
>   * prevents two threads from simultaneously working on overlapping ranges.
>   *
> + * Please note that there is no strong synchronization with the page allocator
> + * either. Pages might be freed while their page blocks are marked ISOLATED.
> + * In some cases pages might still end up on pcp lists and that would allow
> + * for their allocation even when they are in fact isolated already. Depending
> + * on how strong of a guarantee the caller needs drain_all_pages might be needed
> + * (e.g. __offline_pages will need to call it after check for isolated range for
> + * a next retry).
> + *
>   * Return: the number of isolated pageblocks on success and -EBUSY if any part
>   * of range cannot be isolated.
>   */
> 

Acked-by: David Hildenbrand <david@redhat.com>

As an easy stable fix to be improved in the near future.

-- 
Thanks,

David / dhildenb




