linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/compaction: count pages and stop correctly during page isolation.
@ 2020-10-29 20:04 Zi Yan
  2020-10-29 21:14 ` Yang Shi
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Zi Yan @ 2020-10-29 20:04 UTC (permalink / raw)
  To: Andrew Morton, linux-mm; +Cc: Rik van Riel, linux-kernel, Zi Yan

From: Zi Yan <ziy@nvidia.com>

In isolate_migratepages_block, when cc->alloc_contig is true, we are
able to isolate compound pages, nr_migratepages and nr_isolated did not
count compound pages correctly, causing us to isolate more pages than we
thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
in too_many_isolated while loop, since the actual isolated pages can go
up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
since we stop isolation after cc->nr_migratepages reaches to
COMPACT_CLUSTER_MAX.

In addition, after we fix the issue above, cc->nr_migratepages could
never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
thus page isolation could not stop as we intended. Change the isolation
stop condition to >=.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/compaction.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ee1f8439369e..0683a4999581 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
-		cc->nr_migratepages++;
-		nr_isolated++;
+		cc->nr_migratepages += thp_nr_pages(page);
+		nr_isolated += thp_nr_pages(page);
 
 		/*
 		 * Avoid isolating too much unless this block is being
@@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 		 * or a lock is contended. For contention, isolate quickly to
 		 * potentially remove one source of contention.
 		 */
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
+		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
 		    !cc->rescan && !cc->contended) {
 			++low_pfn;
 			break;
@@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
 		if (!pfn)
 			break;
 
-		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
+		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
 			break;
 	}
 
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-29 20:04 [PATCH] mm/compaction: count pages and stop correctly during page isolation Zi Yan
@ 2020-10-29 21:14 ` Yang Shi
  2020-10-29 21:31   ` Zi Yan
  2020-10-30  9:43 ` Michal Hocko
  2020-10-30 14:50 ` Vlastimil Babka
  2 siblings, 1 reply; 19+ messages in thread
From: Yang Shi @ 2020-10-29 21:14 UTC (permalink / raw)
  To: Zi Yan; +Cc: Andrew Morton, Linux MM, Rik van Riel, Linux Kernel Mailing List

On Thu, Oct 29, 2020 at 1:04 PM Zi Yan <zi.yan@sent.com> wrote:
>
> From: Zi Yan <ziy@nvidia.com>
>
> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> able to isolate compound pages, nr_migratepages and nr_isolated did not
> count compound pages correctly, causing us to isolate more pages than we
> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> in too_many_isolated while loop, since the actual isolated pages can go
> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,

Is it that easy to run into? 16384 doesn't seem like too many pages, just 64MB.

> since we stop isolation after cc->nr_migratepages reaches to
> COMPACT_CLUSTER_MAX.
>
> In addition, after we fix the issue above, cc->nr_migratepages could
> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> thus page isolation could not stop as we intended. Change the isolation
> stop condition to >=.

The fix looks sane to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

Shall you add Fixes tag to commit
1da2f328fa643bd72197dfed0c655148af31e4eb? And may cc stable.

>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/compaction.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ee1f8439369e..0683a4999581 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
>  isolate_success:
>                 list_add(&page->lru, &cc->migratepages);
> -               cc->nr_migratepages++;
> -               nr_isolated++;
> +               cc->nr_migratepages += thp_nr_pages(page);
> +               nr_isolated += thp_nr_pages(page);
>
>                 /*
>                  * Avoid isolating too much unless this block is being
> @@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>                  * or a lock is contended. For contention, isolate quickly to
>                  * potentially remove one source of contention.
>                  */
> -               if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
> +               if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
>                     !cc->rescan && !cc->contended) {
>                         ++low_pfn;
>                         break;
> @@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>                 if (!pfn)
>                         break;
>
> -               if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
> +               if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
>                         break;
>         }
>
> --
> 2.28.0
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-29 21:14 ` Yang Shi
@ 2020-10-29 21:31   ` Zi Yan
  2020-10-30  0:28     ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2020-10-29 21:31 UTC (permalink / raw)
  To: Yang Shi
  Cc: Andrew Morton, Linux MM, Rik van Riel, Linux Kernel Mailing List, stable

[-- Attachment #1: Type: text/plain, Size: 3418 bytes --]

On 29 Oct 2020, at 17:14, Yang Shi wrote:

> On Thu, Oct 29, 2020 at 1:04 PM Zi Yan <zi.yan@sent.com> wrote:
>>
>> From: Zi Yan <ziy@nvidia.com>
>>
>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>> count compound pages correctly, causing us to isolate more pages than we
>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>> in too_many_isolated while loop, since the actual isolated pages can go
>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>
> Is it that easy to run into? 16384 doesn't seem like too many pages, just 64MB.

I hit this when I was running oom01 from ltp to test my PUD THP patchset, which
allocates PUD THPs from CMA regions and splits them into PMD THPs due to memory
pressure. I am not sure if it is common that in the upstream kernel PMD THPs will
be allocated in CMA regions due to allocation fallback.

>
>> since we stop isolation after cc->nr_migratepages reaches to
>> COMPACT_CLUSTER_MAX.
>>
>> In addition, after we fix the issue above, cc->nr_migratepages could
>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>> thus page isolation could not stop as we intended. Change the isolation
>> stop condition to >=.
>
> The fix looks sane to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

Thanks.

>
> Shall you add Fixes tag to commit
> 1da2f328fa643bd72197dfed0c655148af31e4eb? And may cc stable.

Sure.

Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)

stable cc’ed.

>
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/compaction.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index ee1f8439369e..0683a4999581 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>
>>  isolate_success:
>>                 list_add(&page->lru, &cc->migratepages);
>> -               cc->nr_migratepages++;
>> -               nr_isolated++;
>> +               cc->nr_migratepages += thp_nr_pages(page);
>> +               nr_isolated += thp_nr_pages(page);
>>
>>                 /*
>>                  * Avoid isolating too much unless this block is being
>> @@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>                  * or a lock is contended. For contention, isolate quickly to
>>                  * potentially remove one source of contention.
>>                  */
>> -               if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
>> +               if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
>>                     !cc->rescan && !cc->contended) {
>>                         ++low_pfn;
>>                         break;
>> @@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>>                 if (!pfn)
>>                         break;
>>
>> -               if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
>> +               if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
>>                         break;
>>         }
>>
>> --
>> 2.28.0
>>
>>


—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-29 21:31   ` Zi Yan
@ 2020-10-30  0:28     ` Andrew Morton
  2020-10-30  1:20       ` Zi Yan
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2020-10-30  0:28 UTC (permalink / raw)
  To: Zi Yan
  Cc: Yang Shi, Linux MM, Rik van Riel, Linux Kernel Mailing List, stable

On Thu, 29 Oct 2020 17:31:28 -0400 Zi Yan <ziy@nvidia.com> wrote:

> >
> > Shall you add Fixes tag to commit
> > 1da2f328fa643bd72197dfed0c655148af31e4eb? And may cc stable.
> 
> Sure.
> 
> Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
> 
> stable cc'ed.

A think a cc:stable really requires a description of the end-user
visible effects of the bug.  Could you please provide that?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30  0:28     ` Andrew Morton
@ 2020-10-30  1:20       ` Zi Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Zi Yan @ 2020-10-30  1:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yang Shi, Linux MM, Rik van Riel, Linux Kernel Mailing List, stable

[-- Attachment #1: Type: text/plain, Size: 1028 bytes --]

On 29 Oct 2020, at 20:28, Andrew Morton wrote:

> On Thu, 29 Oct 2020 17:31:28 -0400 Zi Yan <ziy@nvidia.com> wrote:
>
>>>
>>> Shall you add Fixes tag to commit
>>> 1da2f328fa643bd72197dfed0c655148af31e4eb? And may cc stable.
>>
>> Sure.
>>
>> Fixes: 1da2f328fa64 (“mm,thp,compaction,cma: allow THP migration for CMA allocations”)
>>
>> stable cc'ed.
>
> A think a cc:stable really requires a description of the end-user
> visible effects of the bug.  Could you please provide that?

Sure.

For example, in a system with 16GB memory and an 8GB CMA region reserved by hugetlb_cma,
if we first allocate 10GB THPs and mlock them (so some THPs are allocated in the CMA
region and mlocked), reserving 6 1GB hugetlb pages via
/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages will get stuck (looping
in too_many_isolated function) until we kill either task. With the patch applied,
oom will kill the application with 10GB THPs and let hugetlb page reservation finish.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-29 20:04 [PATCH] mm/compaction: count pages and stop correctly during page isolation Zi Yan
  2020-10-29 21:14 ` Yang Shi
@ 2020-10-30  9:43 ` Michal Hocko
  2020-10-30 12:20   ` Zi Yan
  2020-10-30 14:50 ` Vlastimil Babka
  2 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2020-10-30  9:43 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

[Cc Vlastimil]

On Thu 29-10-20 16:04:35, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> able to isolate compound pages, nr_migratepages and nr_isolated did not
> count compound pages correctly, causing us to isolate more pages than we
> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> in too_many_isolated while loop, since the actual isolated pages can go
> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> since we stop isolation after cc->nr_migratepages reaches to
> COMPACT_CLUSTER_MAX.
> 
> In addition, after we fix the issue above, cc->nr_migratepages could
> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> thus page isolation could not stop as we intended. Change the isolation
> stop condition to >=.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/compaction.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ee1f8439369e..0683a4999581 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  
>  isolate_success:
>  		list_add(&page->lru, &cc->migratepages);
> -		cc->nr_migratepages++;
> -		nr_isolated++;
> +		cc->nr_migratepages += thp_nr_pages(page);
> +		nr_isolated += thp_nr_pages(page);

Does thp_nr_pages work for __PageMovable pages?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30  9:43 ` Michal Hocko
@ 2020-10-30 12:20   ` Zi Yan
  2020-10-30 13:36     ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2020-10-30 12:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]

On 30 Oct 2020, at 5:43, Michal Hocko wrote:

> [Cc Vlastimil]
>
> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>> count compound pages correctly, causing us to isolate more pages than we
>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>> in too_many_isolated while loop, since the actual isolated pages can go
>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>> since we stop isolation after cc->nr_migratepages reaches to
>> COMPACT_CLUSTER_MAX.
>>
>> In addition, after we fix the issue above, cc->nr_migratepages could
>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>> thus page isolation could not stop as we intended. Change the isolation
>> stop condition to >=.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/compaction.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index ee1f8439369e..0683a4999581 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>
>>  isolate_success:
>>  		list_add(&page->lru, &cc->migratepages);
>> -		cc->nr_migratepages++;
>> -		nr_isolated++;
>> +		cc->nr_migratepages += thp_nr_pages(page);
>> +		nr_isolated += thp_nr_pages(page);
>
> Does thp_nr_pages work for __PageMovable pages?

Yes. It is the same as compound_nr() but compiled
to 1 when THP is not enabled.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 12:20   ` Zi Yan
@ 2020-10-30 13:36     ` Michal Hocko
  2020-10-30 14:35       ` Zi Yan
  2020-10-30 18:33       ` Yang Shi
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2020-10-30 13:36 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

On Fri 30-10-20 08:20:50, Zi Yan wrote:
> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
> 
> > [Cc Vlastimil]
> >
> > On Thu 29-10-20 16:04:35, Zi Yan wrote:
> >> From: Zi Yan <ziy@nvidia.com>
> >>
> >> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> >> able to isolate compound pages, nr_migratepages and nr_isolated did not
> >> count compound pages correctly, causing us to isolate more pages than we
> >> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> >> in too_many_isolated while loop, since the actual isolated pages can go
> >> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> >> since we stop isolation after cc->nr_migratepages reaches to
> >> COMPACT_CLUSTER_MAX.
> >>
> >> In addition, after we fix the issue above, cc->nr_migratepages could
> >> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> >> thus page isolation could not stop as we intended. Change the isolation
> >> stop condition to >=.
> >>
> >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> ---
> >>  mm/compaction.c | 8 ++++----
> >>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index ee1f8439369e..0683a4999581 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>
> >>  isolate_success:
> >>  		list_add(&page->lru, &cc->migratepages);
> >> -		cc->nr_migratepages++;
> >> -		nr_isolated++;
> >> +		cc->nr_migratepages += thp_nr_pages(page);
> >> +		nr_isolated += thp_nr_pages(page);
> >
> > Does thp_nr_pages work for __PageMovable pages?
> 
> Yes. It is the same as compound_nr() but compiled
> to 1 when THP is not enabled.

I am sorry but I do not follow. First of all the implementation of the
two is different and also I was asking about __PageMovable which should
never be THP IIRC. Can they be compound though?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 13:36     ` Michal Hocko
@ 2020-10-30 14:35       ` Zi Yan
  2020-10-30 14:49         ` Michal Hocko
  2020-10-30 18:33       ` Yang Shi
  1 sibling, 1 reply; 19+ messages in thread
From: Zi Yan @ 2020-10-30 14:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 2511 bytes --]

On 30 Oct 2020, at 9:36, Michal Hocko wrote:

> On Fri 30-10-20 08:20:50, Zi Yan wrote:
>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
>>
>>> [Cc Vlastimil]
>>>
>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>>>> From: Zi Yan <ziy@nvidia.com>
>>>>
>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>>>> count compound pages correctly, causing us to isolate more pages than we
>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>>>> in too_many_isolated while loop, since the actual isolated pages can go
>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>>>> since we stop isolation after cc->nr_migratepages reaches to
>>>> COMPACT_CLUSTER_MAX.
>>>>
>>>> In addition, after we fix the issue above, cc->nr_migratepages could
>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>>>> thus page isolation could not stop as we intended. Change the isolation
>>>> stop condition to >=.
>>>>
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>> ---
>>>>  mm/compaction.c | 8 ++++----
>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index ee1f8439369e..0683a4999581 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>
>>>>  isolate_success:
>>>>  		list_add(&page->lru, &cc->migratepages);
>>>> -		cc->nr_migratepages++;
>>>> -		nr_isolated++;
>>>> +		cc->nr_migratepages += thp_nr_pages(page);
>>>> +		nr_isolated += thp_nr_pages(page);
>>>
>>> Does thp_nr_pages work for __PageMovable pages?
>>
>> Yes. It is the same as compound_nr() but compiled
>> to 1 when THP is not enabled.
>
> I am sorry but I do not follow. First of all the implementation of the
> two is different and also I was asking about __PageMovable which should
> never be THP IIRC. Can they be compound though?

__PageMovable, non-lru movable pages, can be compound and thp_nr_page cannot
be used for it, since when THP is off, thp_nr_page will return the wrong number.
I got confused by its name, sorry.

But __PageMovable is irrelevant to this patch, since we are using
__isolate_lru_page to isolate pages. non-lru __PageMovable should not appear
after isolate_succes. thp_nr_pages can be used here.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 14:35       ` Zi Yan
@ 2020-10-30 14:49         ` Michal Hocko
  2020-10-30 14:53           ` Zi Yan
  2020-10-30 14:55           ` Vlastimil Babka
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2020-10-30 14:49 UTC (permalink / raw)
  To: Zi Yan
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

On Fri 30-10-20 10:35:43, Zi Yan wrote:
> On 30 Oct 2020, at 9:36, Michal Hocko wrote:
> 
> > On Fri 30-10-20 08:20:50, Zi Yan wrote:
> >> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
> >>
> >>> [Cc Vlastimil]
> >>>
> >>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
> >>>> From: Zi Yan <ziy@nvidia.com>
> >>>>
> >>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> >>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
> >>>> count compound pages correctly, causing us to isolate more pages than we
> >>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> >>>> in too_many_isolated while loop, since the actual isolated pages can go
> >>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> >>>> since we stop isolation after cc->nr_migratepages reaches to
> >>>> COMPACT_CLUSTER_MAX.
> >>>>
> >>>> In addition, after we fix the issue above, cc->nr_migratepages could
> >>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> >>>> thus page isolation could not stop as we intended. Change the isolation
> >>>> stop condition to >=.
> >>>>
> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>> ---
> >>>>  mm/compaction.c | 8 ++++----
> >>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>>> index ee1f8439369e..0683a4999581 100644
> >>>> --- a/mm/compaction.c
> >>>> +++ b/mm/compaction.c
> >>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>>>
> >>>>  isolate_success:
> >>>>  		list_add(&page->lru, &cc->migratepages);
> >>>> -		cc->nr_migratepages++;
> >>>> -		nr_isolated++;
> >>>> +		cc->nr_migratepages += thp_nr_pages(page);
> >>>> +		nr_isolated += thp_nr_pages(page);
> >>>
> >>> Does thp_nr_pages work for __PageMovable pages?
> >>
> >> Yes. It is the same as compound_nr() but compiled
> >> to 1 when THP is not enabled.
> >
> > I am sorry but I do not follow. First of all the implementation of the
> > two is different and also I was asking about __PageMovable which should
> > never be THP IIRC. Can they be compound though?
> 
> __PageMovable, non-lru movable pages, can be compound and thp_nr_page cannot
> be used for it, since when THP is off, thp_nr_page will return the wrong number.
> I got confused by its name, sorry.

OK, this matches my understanding. Good we are on the same page.

> But __PageMovable is irrelevant to this patch, since we are using
> __isolate_lru_page to isolate pages. non-lru __PageMovable should not appear
> after isolate_succes. thp_nr_pages can be used here.

But this is still not clear to me. __PageMovable pages are isolated by
isolate_movable_page and then jump to isolate_succes. Does that somehow
changes the nature of the page being compound or tat thp_nr_page would
start working on those pages.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-29 20:04 [PATCH] mm/compaction: count pages and stop correctly during page isolation Zi Yan
  2020-10-29 21:14 ` Yang Shi
  2020-10-30  9:43 ` Michal Hocko
@ 2020-10-30 14:50 ` Vlastimil Babka
  2020-10-30 15:18   ` Zi Yan
  2 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2020-10-30 14:50 UTC (permalink / raw)
  To: Zi Yan, Andrew Morton, linux-mm; +Cc: Rik van Riel, linux-kernel

On 10/29/20 9:04 PM, Zi Yan wrote:
> From: Zi Yan <ziy@nvidia.com>
> 
> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> able to isolate compound pages, nr_migratepages and nr_isolated did not
> count compound pages correctly, causing us to isolate more pages than we
> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> in too_many_isolated while loop, since the actual isolated pages can go
> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> since we stop isolation after cc->nr_migratepages reaches to
> COMPACT_CLUSTER_MAX.

I wonder if a better fix would be to adjust the too_many_isolated() check so 
that if we have non-zero cc->nr_migratepages, we bail out from further isolation 
and migrate what we have immediately, instead of looping.

Because I can also imagine a hypothetical situation where multiple threads in 
parallel cause too_many_isolated() to be true, and will all loop there forever. 
The proposed fix should prevent such situation as well, AFAICT.

> In addition, after we fix the issue above, cc->nr_migratepages could
> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> thus page isolation could not stop as we intended. Change the isolation
> stop condition to >=.
> 
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>   mm/compaction.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ee1f8439369e..0683a4999581 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   
>   isolate_success:
>   		list_add(&page->lru, &cc->migratepages);
> -		cc->nr_migratepages++;
> -		nr_isolated++;
> +		cc->nr_migratepages += thp_nr_pages(page);
> +		nr_isolated += thp_nr_pages(page);
>   
>   		/*
>   		 * Avoid isolating too much unless this block is being
> @@ -1021,7 +1021,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   		 * or a lock is contended. For contention, isolate quickly to
>   		 * potentially remove one source of contention.
>   		 */
> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX &&
> +		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX &&
>   		    !cc->rescan && !cc->contended) {
>   			++low_pfn;
>   			break;
> @@ -1132,7 +1132,7 @@ isolate_migratepages_range(struct compact_control *cc, unsigned long start_pfn,
>   		if (!pfn)
>   			break;
>   
> -		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
> +		if (cc->nr_migratepages >= COMPACT_CLUSTER_MAX)
>   			break;
>   	}
>   
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 14:49         ` Michal Hocko
@ 2020-10-30 14:53           ` Zi Yan
  2020-10-30 14:55           ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: Zi Yan @ 2020-10-30 14:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 3199 bytes --]

On 30 Oct 2020, at 10:49, Michal Hocko wrote:

> On Fri 30-10-20 10:35:43, Zi Yan wrote:
>> On 30 Oct 2020, at 9:36, Michal Hocko wrote:
>>
>>> On Fri 30-10-20 08:20:50, Zi Yan wrote:
>>>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
>>>>
>>>>> [Cc Vlastimil]
>>>>>
>>>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>>
>>>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>>>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>>>>>> count compound pages correctly, causing us to isolate more pages than we
>>>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>>>>>> in too_many_isolated while loop, since the actual isolated pages can go
>>>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>>>>>> since we stop isolation after cc->nr_migratepages reaches to
>>>>>> COMPACT_CLUSTER_MAX.
>>>>>>
>>>>>> In addition, after we fix the issue above, cc->nr_migratepages could
>>>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>>>>>> thus page isolation could not stop as we intended. Change the isolation
>>>>>> stop condition to >=.
>>>>>>
>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>> ---
>>>>>>  mm/compaction.c | 8 ++++----
>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index ee1f8439369e..0683a4999581 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>>
>>>>>>  isolate_success:
>>>>>>  		list_add(&page->lru, &cc->migratepages);
>>>>>> -		cc->nr_migratepages++;
>>>>>> -		nr_isolated++;
>>>>>> +		cc->nr_migratepages += thp_nr_pages(page);
>>>>>> +		nr_isolated += thp_nr_pages(page);
>>>>>
>>>>> Does thp_nr_pages work for __PageMovable pages?
>>>>
>>>> Yes. It is the same as compound_nr() but compiled
>>>> to 1 when THP is not enabled.
>>>
>>> I am sorry but I do not follow. First of all the implementation of the
>>> two is different and also I was asking about __PageMovable which should
>>> never be THP IIRC. Can they be compound though?
>>
>> __PageMovable, non-lru movable pages, can be compound and thp_nr_page cannot
>> be used for it, since when THP is off, thp_nr_page will return the wrong number.
>> I got confused by its name, sorry.
>
> OK, this matches my understanding. Good we are on the same page.
>
>> But __PageMovable is irrelevant to this patch, since we are using
>> __isolate_lru_page to isolate pages. non-lru __PageMovable should not appear
>> after isolate_succes. thp_nr_pages can be used here.
>
> But this is still not clear to me. __PageMovable pages are isolated by
> isolate_movable_page and then jump to isolate_succes. Does that somehow
> changes the nature of the page being compound or tat thp_nr_page would
> start working on those pages.

Ah, I missed that part of the code. If __PageMovable can reach the code, we
should use compound_nr instead. Will send v2 to fix it. Thanks.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 14:49         ` Michal Hocko
  2020-10-30 14:53           ` Zi Yan
@ 2020-10-30 14:55           ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2020-10-30 14:55 UTC (permalink / raw)
  To: Michal Hocko, Zi Yan; +Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel

On 10/30/20 3:49 PM, Michal Hocko wrote:
> On Fri 30-10-20 10:35:43, Zi Yan wrote:
>> On 30 Oct 2020, at 9:36, Michal Hocko wrote:
>> 
>> > On Fri 30-10-20 08:20:50, Zi Yan wrote:
>> >> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
>> >>
>> >>> [Cc Vlastimil]
>> >>>
>> >>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>> >>>
>> >>> Does thp_nr_pages work for __PageMovable pages?
>> >>
>> >> Yes. It is the same as compound_nr() but compiled
>> >> to 1 when THP is not enabled.
>> >
>> > I am sorry but I do not follow. First of all the implementation of the
>> > two is different and also I was asking about __PageMovable which should
>> > never be THP IIRC. Can they be compound though?
>> 
>> __PageMovable, non-lru movable pages, can be compound and thp_nr_page cannot
>> be used for it, since when THP is off, thp_nr_page will return the wrong number.
>> I got confused by its name, sorry.
> 
> OK, this matches my understanding. Good we are on the same page.
> 
>> But __PageMovable is irrelevant to this patch, since we are using
>> __isolate_lru_page to isolate pages. non-lru __PageMovable should not appear
>> after isolate_succes. thp_nr_pages can be used here.
> 
> But this is still not clear to me. __PageMovable pages are isolated by
> isolate_movable_page and then jump to isolate_succes. Does that somehow
> changes the nature of the page being compound or tat thp_nr_page would
> start working on those pages.

Agreed that page movable can appear after isolate_success. compound_nr() should 
work for both.
Note that too_many_isolated() doesn't see __PageMovable isolated pages, as they 
are not counted as NR_ISOLATED_FILE/NR_ISOLATED_ANON, AFAIK. So in that sense 
they are irrelevant to the bug at hand... for now.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 14:50 ` Vlastimil Babka
@ 2020-10-30 15:18   ` Zi Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Zi Yan @ 2020-10-30 15:18 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: Andrew Morton, linux-mm, Rik van Riel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2336 bytes --]

On 30 Oct 2020, at 10:50, Vlastimil Babka wrote:

> On 10/29/20 9:04 PM, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>> count compound pages correctly, causing us to isolate more pages than we
>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>> in too_many_isolated while loop, since the actual isolated pages can go
>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>> since we stop isolation after cc->nr_migratepages reaches to
>> COMPACT_CLUSTER_MAX.
>
> I wonder if a better fix would be to adjust the too_many_isolated() check so that if we have non-zero cc->nr_migratepages, we bail out from further isolation and migrate what we have immediately, instead of looping.

I just tested your fix and it works too. The difference is that with
your fix alloc_contig_range will fail quickly without killing the user
application mlocking THPs in the CMA region (for more context, please
see my other email to Andrew explaining how to reproduce in userspace),
whereas my fix will oom the user application and make alloc_contig_range
successful at the end.

Anyway, I will add your fix below and send v2:

diff --git a/mm/compaction.c b/mm/compaction.c
index ee1f8439369e..8fa11637ccfd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -817,6 +817,9 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
         * delay for some time until fewer pages are isolated
         */
        while (unlikely(too_many_isolated(pgdat))) {
+               /* stop isolation if there are still pages not migrated */
+               if (cc->nr_migratepages)
+                       return 0;
                /* async migration should just abort */
                if (cc->mode == MIGRATE_ASYNC)
                        return 0;

>
> Because I can also imagine a hypothetical situation where multiple threads in parallel cause too_many_isolated() to be true, and will all loop there forever. The proposed fix should prevent such situation as well, AFAICT.

Yes. oom01 from ltp tests the multi-threaded situation and my fix works there too.


—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 13:36     ` Michal Hocko
  2020-10-30 14:35       ` Zi Yan
@ 2020-10-30 18:33       ` Yang Shi
  2020-10-30 18:39         ` Zi Yan
  1 sibling, 1 reply; 19+ messages in thread
From: Yang Shi @ 2020-10-30 18:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Zi Yan, Andrew Morton, Linux MM, Rik van Riel,
	Linux Kernel Mailing List, Vlastimil Babka

On Fri, Oct 30, 2020 at 6:36 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 30-10-20 08:20:50, Zi Yan wrote:
> > On 30 Oct 2020, at 5:43, Michal Hocko wrote:
> >
> > > [Cc Vlastimil]
> > >
> > > On Thu 29-10-20 16:04:35, Zi Yan wrote:
> > >> From: Zi Yan <ziy@nvidia.com>
> > >>
> > >> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> > >> able to isolate compound pages, nr_migratepages and nr_isolated did not
> > >> count compound pages correctly, causing us to isolate more pages than we
> > >> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> > >> in too_many_isolated while loop, since the actual isolated pages can go
> > >> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> > >> since we stop isolation after cc->nr_migratepages reaches to
> > >> COMPACT_CLUSTER_MAX.
> > >>
> > >> In addition, after we fix the issue above, cc->nr_migratepages could
> > >> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> > >> thus page isolation could not stop as we intended. Change the isolation
> > >> stop condition to >=.
> > >>
> > >> Signed-off-by: Zi Yan <ziy@nvidia.com>
> > >> ---
> > >>  mm/compaction.c | 8 ++++----
> > >>  1 file changed, 4 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/mm/compaction.c b/mm/compaction.c
> > >> index ee1f8439369e..0683a4999581 100644
> > >> --- a/mm/compaction.c
> > >> +++ b/mm/compaction.c
> > >> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > >>
> > >>  isolate_success:
> > >>            list_add(&page->lru, &cc->migratepages);
> > >> -          cc->nr_migratepages++;
> > >> -          nr_isolated++;
> > >> +          cc->nr_migratepages += thp_nr_pages(page);
> > >> +          nr_isolated += thp_nr_pages(page);
> > >
> > > Does thp_nr_pages work for __PageMovable pages?
> >
> > Yes. It is the same as compound_nr() but compiled
> > to 1 when THP is not enabled.
>
> I am sorry but I do not follow. First of all the implementation of the
> two is different and also I was asking about __PageMovable which should
> never be THP IIRC. Can they be compound though?

I have the same question, can they be compound? If they can be
compound, PageTransHuge() can't tell from THP and compound movable
page, right?

>
> --
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 18:33       ` Yang Shi
@ 2020-10-30 18:39         ` Zi Yan
  2020-10-30 18:55           ` Yang Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Zi Yan @ 2020-10-30 18:39 UTC (permalink / raw)
  To: Yang Shi
  Cc: Michal Hocko, Andrew Morton, Linux MM, Rik van Riel,
	Linux Kernel Mailing List, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 2493 bytes --]

On 30 Oct 2020, at 14:33, Yang Shi wrote:

> On Fri, Oct 30, 2020 at 6:36 AM Michal Hocko <mhocko@suse.com> wrote:
>>
>> On Fri 30-10-20 08:20:50, Zi Yan wrote:
>>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
>>>
>>>> [Cc Vlastimil]
>>>>
>>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>>>>> From: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>>>>> count compound pages correctly, causing us to isolate more pages than we
>>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>>>>> in too_many_isolated while loop, since the actual isolated pages can go
>>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>>>>> since we stop isolation after cc->nr_migratepages reaches to
>>>>> COMPACT_CLUSTER_MAX.
>>>>>
>>>>> In addition, after we fix the issue above, cc->nr_migratepages could
>>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>>>>> thus page isolation could not stop as we intended. Change the isolation
>>>>> stop condition to >=.
>>>>>
>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>> ---
>>>>>  mm/compaction.c | 8 ++++----
>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index ee1f8439369e..0683a4999581 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>>
>>>>>  isolate_success:
>>>>>            list_add(&page->lru, &cc->migratepages);
>>>>> -          cc->nr_migratepages++;
>>>>> -          nr_isolated++;
>>>>> +          cc->nr_migratepages += thp_nr_pages(page);
>>>>> +          nr_isolated += thp_nr_pages(page);
>>>>
>>>> Does thp_nr_pages work for __PageMovable pages?
>>>
>>> Yes. It is the same as compound_nr() but compiled
>>> to 1 when THP is not enabled.
>>
>> I am sorry but I do not follow. First of all the implementation of the
>> two is different and also I was asking about __PageMovable which should
>> never be THP IIRC. Can they be compound though?
>
> I have the same question, can they be compound? If they can be
> compound, PageTransHuge() can't tell from THP and compound movable
> page, right?

Right. I have updated the patch and use compound_nr instead.

—
Best Regards,
Yan Zi

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 18:39         ` Zi Yan
@ 2020-10-30 18:55           ` Yang Shi
  2020-11-02 13:03             ` Vlastimil Babka
  0 siblings, 1 reply; 19+ messages in thread
From: Yang Shi @ 2020-10-30 18:55 UTC (permalink / raw)
  To: Zi Yan
  Cc: Michal Hocko, Andrew Morton, Linux MM, Rik van Riel,
	Linux Kernel Mailing List, Vlastimil Babka

On Fri, Oct 30, 2020 at 11:39 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 30 Oct 2020, at 14:33, Yang Shi wrote:
>
> > On Fri, Oct 30, 2020 at 6:36 AM Michal Hocko <mhocko@suse.com> wrote:
> >>
> >> On Fri 30-10-20 08:20:50, Zi Yan wrote:
> >>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
> >>>
> >>>> [Cc Vlastimil]
> >>>>
> >>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
> >>>>> From: Zi Yan <ziy@nvidia.com>
> >>>>>
> >>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> >>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
> >>>>> count compound pages correctly, causing us to isolate more pages than we
> >>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> >>>>> in too_many_isolated while loop, since the actual isolated pages can go
> >>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> >>>>> since we stop isolation after cc->nr_migratepages reaches to
> >>>>> COMPACT_CLUSTER_MAX.
> >>>>>
> >>>>> In addition, after we fix the issue above, cc->nr_migratepages could
> >>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> >>>>> thus page isolation could not stop as we intended. Change the isolation
> >>>>> stop condition to >=.
> >>>>>
> >>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>>> ---
> >>>>>  mm/compaction.c | 8 ++++----
> >>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/compaction.c b/mm/compaction.c
> >>>>> index ee1f8439369e..0683a4999581 100644
> >>>>> --- a/mm/compaction.c
> >>>>> +++ b/mm/compaction.c
> >>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >>>>>
> >>>>>  isolate_success:
> >>>>>            list_add(&page->lru, &cc->migratepages);
> >>>>> -          cc->nr_migratepages++;
> >>>>> -          nr_isolated++;
> >>>>> +          cc->nr_migratepages += thp_nr_pages(page);
> >>>>> +          nr_isolated += thp_nr_pages(page);
> >>>>
> >>>> Does thp_nr_pages work for __PageMovable pages?
> >>>
> >>> Yes. It is the same as compound_nr() but compiled
> >>> to 1 when THP is not enabled.
> >>
> >> I am sorry but I do not follow. First of all the implementation of the
> >> two is different and also I was asking about __PageMovable which should
> >> never be THP IIRC. Can they be compound though?
> >
> > I have the same question, can they be compound? If they can be
> > compound, PageTransHuge() can't tell from THP and compound movable
> > page, right?
>
> Right. I have updated the patch and use compound_nr instead.

Thanks. Actually I'm wondering what kind of movable page could be
compound. Any real examples?

>
> —
> Best Regards,
> Yan Zi

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-10-30 18:55           ` Yang Shi
@ 2020-11-02 13:03             ` Vlastimil Babka
  2020-11-02 16:39               ` Yang Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2020-11-02 13:03 UTC (permalink / raw)
  To: Yang Shi, Zi Yan
  Cc: Michal Hocko, Andrew Morton, Linux MM, Rik van Riel,
	Linux Kernel Mailing List, Minchan Kim

On 10/30/20 7:55 PM, Yang Shi wrote:
> On Fri, Oct 30, 2020 at 11:39 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 30 Oct 2020, at 14:33, Yang Shi wrote:
>>
>> > On Fri, Oct 30, 2020 at 6:36 AM Michal Hocko <mhocko@suse.com> wrote:
>> >>
>> >> On Fri 30-10-20 08:20:50, Zi Yan wrote:
>> >>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
>> >>>
>> >>>> [Cc Vlastimil]
>> >>>>
>> >>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
>> >>>>> From: Zi Yan <ziy@nvidia.com>
>> >>>>>
>> >>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
>> >>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
>> >>>>> count compound pages correctly, causing us to isolate more pages than we
>> >>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
>> >>>>> in too_many_isolated while loop, since the actual isolated pages can go
>> >>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
>> >>>>> since we stop isolation after cc->nr_migratepages reaches to
>> >>>>> COMPACT_CLUSTER_MAX.
>> >>>>>
>> >>>>> In addition, after we fix the issue above, cc->nr_migratepages could
>> >>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
>> >>>>> thus page isolation could not stop as we intended. Change the isolation
>> >>>>> stop condition to >=.
>> >>>>>
>> >>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> >>>>> ---
>> >>>>>  mm/compaction.c | 8 ++++----
>> >>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>> >>>>>
>> >>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>> >>>>> index ee1f8439369e..0683a4999581 100644
>> >>>>> --- a/mm/compaction.c
>> >>>>> +++ b/mm/compaction.c
>> >>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> >>>>>
>> >>>>>  isolate_success:
>> >>>>>            list_add(&page->lru, &cc->migratepages);
>> >>>>> -          cc->nr_migratepages++;
>> >>>>> -          nr_isolated++;
>> >>>>> +          cc->nr_migratepages += thp_nr_pages(page);
>> >>>>> +          nr_isolated += thp_nr_pages(page);
>> >>>>
>> >>>> Does thp_nr_pages work for __PageMovable pages?
>> >>>
>> >>> Yes. It is the same as compound_nr() but compiled
>> >>> to 1 when THP is not enabled.
>> >>
>> >> I am sorry but I do not follow. First of all the implementation of the
>> >> two is different and also I was asking about __PageMovable which should
>> >> never be THP IIRC. Can they be compound though?
>> >
>> > I have the same question, can they be compound? If they can be
>> > compound, PageTransHuge() can't tell from THP and compound movable
>> > page, right?
>>
>> Right. I have updated the patch and use compound_nr instead.
> 
> Thanks. Actually I'm wondering what kind of movable page could be
> compound. Any real examples?

Looks like there's currently none. Compaction also wouldn't work properly with 
movable pages with order>0 as the free page scanner looks for order-0 pages 
only. But it won't hurt to use compound_nr() anyway.

>>
>> —
>> Best Regards,
>> Yan Zi
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: count pages and stop correctly during page isolation.
  2020-11-02 13:03             ` Vlastimil Babka
@ 2020-11-02 16:39               ` Yang Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2020-11-02 16:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Zi Yan, Michal Hocko, Andrew Morton, Linux MM, Rik van Riel,
	Linux Kernel Mailing List, Minchan Kim

On Mon, Nov 2, 2020 at 5:03 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 10/30/20 7:55 PM, Yang Shi wrote:
> > On Fri, Oct 30, 2020 at 11:39 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On 30 Oct 2020, at 14:33, Yang Shi wrote:
> >>
> >> > On Fri, Oct 30, 2020 at 6:36 AM Michal Hocko <mhocko@suse.com> wrote:
> >> >>
> >> >> On Fri 30-10-20 08:20:50, Zi Yan wrote:
> >> >>> On 30 Oct 2020, at 5:43, Michal Hocko wrote:
> >> >>>
> >> >>>> [Cc Vlastimil]
> >> >>>>
> >> >>>> On Thu 29-10-20 16:04:35, Zi Yan wrote:
> >> >>>>> From: Zi Yan <ziy@nvidia.com>
> >> >>>>>
> >> >>>>> In isolate_migratepages_block, when cc->alloc_contig is true, we are
> >> >>>>> able to isolate compound pages, nr_migratepages and nr_isolated did not
> >> >>>>> count compound pages correctly, causing us to isolate more pages than we
> >> >>>>> thought. Use thp_nr_pages to count pages. Otherwise, we might be trapped
> >> >>>>> in too_many_isolated while loop, since the actual isolated pages can go
> >> >>>>> up to COMPACT_CLUSTER_MAX*512=16384, where COMPACT_CLUSTER_MAX is 32,
> >> >>>>> since we stop isolation after cc->nr_migratepages reaches to
> >> >>>>> COMPACT_CLUSTER_MAX.
> >> >>>>>
> >> >>>>> In addition, after we fix the issue above, cc->nr_migratepages could
> >> >>>>> never be equal to COMPACT_CLUSTER_MAX if compound pages are isolated,
> >> >>>>> thus page isolation could not stop as we intended. Change the isolation
> >> >>>>> stop condition to >=.
> >> >>>>>
> >> >>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >> >>>>> ---
> >> >>>>>  mm/compaction.c | 8 ++++----
> >> >>>>>  1 file changed, 4 insertions(+), 4 deletions(-)
> >> >>>>>
> >> >>>>> diff --git a/mm/compaction.c b/mm/compaction.c
> >> >>>>> index ee1f8439369e..0683a4999581 100644
> >> >>>>> --- a/mm/compaction.c
> >> >>>>> +++ b/mm/compaction.c
> >> >>>>> @@ -1012,8 +1012,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >> >>>>>
> >> >>>>>  isolate_success:
> >> >>>>>            list_add(&page->lru, &cc->migratepages);
> >> >>>>> -          cc->nr_migratepages++;
> >> >>>>> -          nr_isolated++;
> >> >>>>> +          cc->nr_migratepages += thp_nr_pages(page);
> >> >>>>> +          nr_isolated += thp_nr_pages(page);
> >> >>>>
> >> >>>> Does thp_nr_pages work for __PageMovable pages?
> >> >>>
> >> >>> Yes. It is the same as compound_nr() but compiled
> >> >>> to 1 when THP is not enabled.
> >> >>
> >> >> I am sorry but I do not follow. First of all the implementation of the
> >> >> two is different and also I was asking about __PageMovable which should
> >> >> never be THP IIRC. Can they be compound though?
> >> >
> >> > I have the same question, can they be compound? If they can be
> >> > compound, PageTransHuge() can't tell from THP and compound movable
> >> > page, right?
> >>
> >> Right. I have updated the patch and use compound_nr instead.
> >
> > Thanks. Actually I'm wondering what kind of movable page could be
> > compound. Any real examples?
>
> Looks like there's currently none. Compaction also wouldn't work properly with
> movable pages with order>0 as the free page scanner looks for order-0 pages
> only. But it won't hurt to use compound_nr() anyway.

Thanks, yes this is what I thought otherwise we have troubles in
migration path too.

>
> >>
> >> —
> >> Best Regards,
> >> Yan Zi
> >
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-11-02 16:39 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-29 20:04 [PATCH] mm/compaction: count pages and stop correctly during page isolation Zi Yan
2020-10-29 21:14 ` Yang Shi
2020-10-29 21:31   ` Zi Yan
2020-10-30  0:28     ` Andrew Morton
2020-10-30  1:20       ` Zi Yan
2020-10-30  9:43 ` Michal Hocko
2020-10-30 12:20   ` Zi Yan
2020-10-30 13:36     ` Michal Hocko
2020-10-30 14:35       ` Zi Yan
2020-10-30 14:49         ` Michal Hocko
2020-10-30 14:53           ` Zi Yan
2020-10-30 14:55           ` Vlastimil Babka
2020-10-30 18:33       ` Yang Shi
2020-10-30 18:39         ` Zi Yan
2020-10-30 18:55           ` Yang Shi
2020-11-02 13:03             ` Vlastimil Babka
2020-11-02 16:39               ` Yang Shi
2020-10-30 14:50 ` Vlastimil Babka
2020-10-30 15:18   ` Zi Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).