linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation.
@ 2020-10-30 18:38 Zi Yan
  2020-10-30 18:38 ` [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Zi Yan
  2020-11-02 13:03 ` [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Vlastimil Babka
  0 siblings, 2 replies; 4+ messages in thread
From: Zi Yan @ 2020-10-30 18:38 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Yang Shi, Michal Hocko, Vlastimil Babka, Rik van Riel,
	linux-kernel, stable, Zi Yan

From: Zi Yan <ziy@nvidia.com>

In isolate_migratepages_block, when cc->alloc_contig is true, we are
able to isolate compound pages, nr_migratepages and nr_isolated did not
count compound pages correctly, causing us to isolate more pages than we
thought. Count compound pages as the number of base pages they contain.
Otherwise, we might be trapped in too_many_isolated while loop,
since the actual isolated pages can go up to
COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
since we stop isolation after cc->nr_migratepages reaches to
COMPACT_CLUSTER_MAX.

In addition, after we fix the issue above, cc->nr_migratepages could
never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
thus page isolation could not stop as we intended. Change the isolation
stop condition to >=.

The issue can be triggered as follows:
In a system with 16GB memory and an 8GB CMA region reserved by
hugetlb_cma, if we first allocate 10GB THPs and mlock them
(so some THPs are allocated in the CMA region and mlocked), reserving
6 1GB hugetlb pages via
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will get stuck
(looping in too_many_isolated function) until we kill either task.
With the patch applied, oom will kill the application with 10GB THPs and
let hugetlb page reservation finish.

Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
---
 mm/compaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ee1f8439369e..3e834ac402f1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
-		cc->nr_migratepages++;
-		nr_isolated++;
+		cc->nr_migratepages += compound_nr(page);
+		nr_isolated += compound_nr(page);
 
 		/*
 		 * Avoid isolating too much unless this block is being
@@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * or a lock is contended. For contention, isolate quickly to
 		 * potentially remove one source of contention.
 		 */
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
+		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
 		    !cc->rescan && !cc->contended) {
 			++low_pfn;
 			break;
@@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 		if (!pfn)
 			break;
 
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
+		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
 			break;
 	}
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate.
  2020-10-30 18:38 [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Zi Yan
@ 2020-10-30 18:38 ` Zi Yan
  2020-11-02 13:07   ` Vlastimil Babka
  2020-11-02 13:03 ` [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Vlastimil Babka
  1 sibling, 1 reply; 4+ messages in thread
From: Zi Yan @ 2020-10-30 18:38 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Yang Shi, Michal Hocko, Vlastimil Babka, Rik van Riel,
	linux-kernel, stable, Zi Yan

From: Zi Yan <ziy@nvidia.com>

In isolate_migratepages_block, if we have too many isolated pages and
nr_migratepages is not zero, we should try to migrate what we have
without wasting time on isolating.

Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
---
 mm/compaction.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3e834ac402f1..4d237a7c3830 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -817,6 +817,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * delay for some time until fewer pages are isolated
 	 */
 	while (unlikely(too_many_isolated(pgdat))) {
+		/* stop isolation if there are still pages not migrated */
+		if (cc->nr_migratepages)
+			return 0;
+
 		/* async migration should just abort */
 		if (cc->mode == MIGRATE_ASYNC)
 			return 0;
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 18:38 [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Zi Yan
  2020-10-30 18:38 ` [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Zi Yan
@ 2020-11-02 13:03 ` Vlastimil Babka
  1 sibling, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2020-11-02 13:03 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton, linux-mm
  Cc: Yang Shi, Michal Hocko, Rik van Riel, linux-kernel, stable

On 10/30/20 7:38 PM, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> able to isolate compound pages, nr_migratepages and nr_isolated did not
> count compound pages correctly, causing us to isolate more pages than we
> thought. Count compound pages as the number of base pages they contain.
> Otherwise, we might be trapped in too_many_isolated while loop,
> since the actual isolated pages can go up to
> COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> since we stop isolation after cc->nr_migratepages reaches to
> COMPACT_CLUSTER_MAX.
> 
> In addition, after we fix the issue above, cc->nr_migratepages could
> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> thus page isolation could not stop as we intended. Change the isolation
> stop condition to >=.
> 
> The issue can be triggered as follows:
> In a system with 16GB memory and an 8GB CMA region reserved by
> hugetlb_cma, if we first allocate 10GB THPs and mlock them
> (so some THPs are allocated in the CMA region and mlocked), reserving
> 6 1GB hugetlb pages via
> /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will get stuck
> (looping in too_many_isolated function) until we kill either task.
> With the patch applied, oom will kill the application with 10GB THPs and
> let hugetlb page reservation finish.
> 
> Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: Yang Shi <shy828301@gmail.com>
> Cc: <stable@vger.kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/compaction.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ee1f8439369e..3e834ac402f1 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   
>   isolate_success:
>   		list_add(&page->lru, &cc->migratepages);
> -		cc->nr_migratepages++;
> -		nr_isolated++;
> +		cc->nr_migratepages += compound_nr(page);
> +		nr_isolated += compound_nr(page);
>   
>   		/*
>   		 * Avoid isolating too much unless this block is being
> @@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   		 * or a lock is contended. For contention, isolate quickly to
>   		 * potentially remove one source of contention.
>   		 */
> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
> +		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
>   		    !cc->rescan && !cc->contended) {
>   			++low_pfn;
>   			break;
> @@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>   		if (!pfn)
>   			break;
>   
> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
> +		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
>   			break;
>   	}
>   
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate.
  2020-10-30 18:38 ` [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Zi Yan
@ 2020-11-02 13:07   ` Vlastimil Babka
  0 siblings, 0 replies; 4+ messages in thread
From: Vlastimil Babka @ 2020-11-02 13:07 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton, linux-mm
  Cc: Yang Shi, Michal Hocko, Rik van Riel, linux-kernel, stable

On 10/30/20 7:38 PM, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> In isolate_migratepages_block, if we have too many isolated pages and
> nr_migratepages is not zero, we should try to migrate what we have
> without wasting time on isolating.

As you CC stable, there should be a stronger reason (strictly speaking the 
problem should have been observed in practice, but this is a simple patch, so 
they could accept it), so I suggest Andrew adds the following paragraph:

In theory it's possible that multiple parallel compactions will cause 
too_many_isolated() to become true even if each has isolated less than 
COMPACT_CLUSTER_MAX, and loop forever in the while loop. Bailing immediately 
prevents that.

> Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Cc: <stable@vger.kernel.org>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>   mm/compaction.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 3e834ac402f1..4d237a7c3830 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -817,6 +817,10 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   	 * delay for some time until fewer pages are isolated
>   	 */
>   	while (unlikely(too_many_isolated(pgdat))) {
> +		/* stop isolation if there are still pages not migrated */
> +		if (cc->nr_migratepages)
> +			return 0;
> +
>   		/* async migration should just abort */
>   		if (cc->mode == MIGRATE_ASYNC)
>   			return 0;
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-11-02 13:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-30 18:38 [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Zi Yan
2020-10-30 18:38 ` [PATCH v3 2/2] mm/compaction: stop isolation if too many pages are isolated and we have pages to migrate Zi Yan
2020-11-02 13:07   ` Vlastimil Babka
2020-11-02 13:03 ` [PATCH v3 1/2] mm/compaction: count pages and stop correctly during page isolation Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).