* Re: [PATCH 0/9] Fixes and cleanups to compaction
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
@ 2023-08-05 3:14 ` Matthew Wilcox
2023-08-05 4:07 ` Kemeng Shi
2023-08-05 11:07 ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Kemeng Shi
` (8 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Matthew Wilcox @ 2023-08-05 3:14 UTC (permalink / raw)
To: Kemeng Shi; +Cc: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david
On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
> Hi all, this is another series of fixes and cleanups to compaction.
> Patches 1-2 fix and clean up the freepage list operations.
> Patches 3-4 fix and clean up the isolation of freepages.
> Patches 7-9 factor out the code that checks if compaction is needed
> for an allocation order.
> More details can be found in the respective patches. Thanks!
As with your last patch series, half of the patches are missing.
Looks like they didn't make it to lore.kernel.org either:
https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/9] Fixes and cleanups to compaction
2023-08-05 3:14 ` Matthew Wilcox
@ 2023-08-05 4:07 ` Kemeng Shi
0 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 4:07 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david
Hi Matthew, thanks for informing me of this. You can find the full series on lore.kernel.org
at [1] and [2].
I contacted owner-linux-mm@kvack.org and was told all patches were received and forwarded
successfully. Then I contacted meta@public-inbox.org, which is the only email address I found
on the help page of lore.kernel.org/linux-mm/, but there has been no response yet.
Please let me know if there is any other way I can get help. Thanks!
[1] https://lore.kernel.org/all/20230804110454.2935878-1-shikemeng@huaweicloud.com/
[2] https://lore.kernel.org/all/20230805110711.2975149-1-shikemeng@huaweicloud.com/
on 8/5/2023 11:14 AM, Matthew Wilcox wrote:
> On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
>> Hi all, this is another series of fixes and cleanups to compaction.
>> Patches 1-2 fix and clean up the freepage list operations.
>> Patches 3-4 fix and clean up the isolation of freepages.
>> Patches 7-9 factor out the code that checks if compaction is needed
>> for an allocation order.
>> More details can be found in the respective patches. Thanks!
>
> As with your last patch series, half of the patches are missing.
> Looks like they didn't make it to lore.kernel.org either:
>
> https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
> https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
2023-08-05 3:14 ` Matthew Wilcox
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-05 17:11 ` Andrew Morton
2023-08-15 7:16 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively " Kemeng Shi
` (7 subsequent siblings)
9 siblings, 2 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
The freepage is chained via buddy_list on the freelist. Use buddy_list
instead of lru so the list operations touch the correct member.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index ea61922a1619..513b1caeb4fa 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
{
LIST_HEAD(sublist);
- if (!list_is_last(freelist, &freepage->lru)) {
- list_cut_before(&sublist, freelist, &freepage->lru);
+ if (!list_is_last(freelist, &freepage->buddy_list)) {
+ list_cut_before(&sublist, freelist, &freepage->buddy_list);
list_splice_tail(&sublist, freelist);
}
}
@@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
{
LIST_HEAD(sublist);
- if (!list_is_first(freelist, &freepage->lru)) {
- list_cut_position(&sublist, freelist, &freepage->lru);
+ if (!list_is_first(freelist, &freepage->buddy_list)) {
+ list_cut_position(&sublist, freelist, &freepage->buddy_list);
list_splice_tail(&sublist, freelist);
}
}
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
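[Editorial note: the rotation move_freelist_head performs can be sketched in userspace with minimal re-implementations of the list primitives. These mirror the semantics of include/linux/list.h but are illustrative stand-ins, not the kernel code; the check here uses the intuitive list_is_first(entry, head) form, which tests the same condition as the list_is_last(head, entry) form in the patch above.]

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	prev->next = new; new->prev = prev;
	new->next = head; head->prev = new;
}

static int list_is_first(const struct list_head *list, const struct list_head *head)
{
	return list->prev == head;
}

/* Move [first entry, entry) off head onto the (empty) list. */
static void list_cut_before(struct list_head *list, struct list_head *head,
			    struct list_head *entry)
{
	if (head->next == entry) {
		INIT_LIST_HEAD(list);
		return;
	}
	list->next = head->next;
	list->next->prev = list;
	list->prev = entry->prev;
	list->prev->next = list;
	head->next = entry;
	entry->prev = head;
}

/* Join list's entries at the tail of head. */
static void list_splice_tail(struct list_head *list, struct list_head *head)
{
	if (list->next != list) {
		struct list_head *first = list->next, *last = list->prev;
		struct list_head *prev = head->prev;

		first->prev = prev; prev->next = first;
		last->next = head; head->prev = last;
	}
}

/*
 * Rotate the freelist so freepage becomes the first entry: everything
 * that preceded it is cut off and re-spliced at the tail.
 */
static void move_freelist_head(struct list_head *freelist, struct list_head *freepage)
{
	struct list_head sublist;

	INIT_LIST_HEAD(&sublist);
	if (!list_is_first(freepage, freelist)) {
		list_cut_before(&sublist, freelist, freepage);
		list_splice_tail(&sublist, freelist);
	}
}
```

With a list a, b, c, d, calling move_freelist_head on c leaves the order c, d, a, b: c is now at the head and the entries that preceded it follow at the tail.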
* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
2023-08-05 11:07 ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Kemeng Shi
@ 2023-08-05 17:11 ` Andrew Morton
2023-08-07 0:37 ` Kemeng Shi
2023-08-15 7:16 ` Baolin Wang
1 sibling, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2023-08-05 17:11 UTC (permalink / raw)
To: Kemeng Shi; +Cc: linux-mm, linux-kernel, baolin.wang, mgorman, david
On Sat, 5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> The freepage is chained via buddy_list on the freelist. Use buddy_list
> instead of lru so the list operations touch the correct member.
>
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_last(freelist, &freepage->lru)) {
> - list_cut_before(&sublist, freelist, &freepage->lru);
> + if (!list_is_last(freelist, &freepage->buddy_list)) {
> + list_cut_before(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
> }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_first(freelist, &freepage->lru)) {
> - list_cut_position(&sublist, freelist, &freepage->lru);
> + if (!list_is_first(freelist, &freepage->buddy_list)) {
> + list_cut_position(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
> }
This looks like a significant error. Can we speculate about the
possible runtime effects?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
2023-08-05 17:11 ` Andrew Morton
@ 2023-08-07 0:37 ` Kemeng Shi
0 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-07 0:37 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, linux-kernel, baolin.wang, mgorman, david
on 8/6/2023 1:11 AM, Andrew Morton wrote:
> On Sat, 5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
>
>> The freepage is chained via buddy_list on the freelist. Use buddy_list
>> instead of lru so the list operations touch the correct member.
>>
>> ...
>>
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>> {
>> LIST_HEAD(sublist);
>>
>> - if (!list_is_last(freelist, &freepage->lru)) {
>> - list_cut_before(&sublist, freelist, &freepage->lru);
>> + if (!list_is_last(freelist, &freepage->buddy_list)) {
>> + list_cut_before(&sublist, freelist, &freepage->buddy_list);
>> list_splice_tail(&sublist, freelist);
>> }
>> }
>> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>> {
>> LIST_HEAD(sublist);
>>
>> - if (!list_is_first(freelist, &freepage->lru)) {
>> - list_cut_position(&sublist, freelist, &freepage->lru);
>> + if (!list_is_first(freelist, &freepage->buddy_list)) {
>> + list_cut_position(&sublist, freelist, &freepage->buddy_list);
>> list_splice_tail(&sublist, freelist);
>> }
>> }
>
> This looks like a significant error. Can we speculate about the
> possible runtime effects?
>
>
It seems there are no runtime effects for now, as lru and buddy_list share
the same memory address in a union.
^ permalink raw reply [flat|nested] 37+ messages in thread
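[Editorial note: the union aliasing described above can be sketched in userspace. fake_page is a simplified stand-in for the real struct page, whose actual layout is far more involved; only the aliasing behavior is modeled here.]

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/*
 * Simplified stand-in for struct page: lru and buddy_list occupy the
 * same anonymous union, so both names refer to the same storage.
 */
struct fake_page {
	union {
		struct list_head lru;        /* when the page is on an LRU list */
		struct list_head buddy_list; /* when it is on a buddy freelist */
	};
};

/* Both members alias the same bytes, which is why using the wrong
 * name compiled and ran correctly despite being conceptually wrong. */
static int members_alias(struct fake_page *page)
{
	return (void *)&page->lru == (void *)&page->buddy_list;
}
```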
* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
2023-08-05 11:07 ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Kemeng Shi
2023-08-05 17:11 ` Andrew Morton
@ 2023-08-15 7:16 ` Baolin Wang
1 sibling, 0 replies; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 7:16 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> The freepage is chained via buddy_list on the freelist. Use buddy_list
> instead of lru so the list operations touch the correct member.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/compaction.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ea61922a1619..513b1caeb4fa 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_last(freelist, &freepage->lru)) {
> - list_cut_before(&sublist, freelist, &freepage->lru);
> + if (!list_is_last(freelist, &freepage->buddy_list)) {
> + list_cut_before(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
> }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_first(freelist, &freepage->lru)) {
> - list_cut_position(&sublist, freelist, &freepage->lru);
> + if (!list_is_first(freelist, &freepage->buddy_list)) {
> + list_cut_position(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail}
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
2023-08-05 3:14 ` Matthew Wilcox
2023-08-05 11:07 ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 7:49 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode Kemeng Shi
` (6 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
We use move_freelist_head after list_for_each_entry_reverse to skip
recently scanned pages. And there is no need to do the actual move if all
freepages were searched in list_for_each_entry_reverse, i.e. freepage
points to the first page in the freelist. It is more intuitive to call
list_is_first with the list entry as the first argument and the list head
as the second argument to check whether the entry is the first one in the
list, instead of calling list_is_last with the entry and head passed in
reverse order.
Similarly, calling list_is_last in move_freelist_tail is more intuitive.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 513b1caeb4fa..fa1b100b0d10 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1395,7 +1395,7 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
{
LIST_HEAD(sublist);
- if (!list_is_last(freelist, &freepage->buddy_list)) {
+ if (!list_is_first(&freepage->buddy_list, freelist)) {
list_cut_before(&sublist, freelist, &freepage->buddy_list);
list_splice_tail(&sublist, freelist);
}
@@ -1412,7 +1412,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
{
LIST_HEAD(sublist);
- if (!list_is_first(freelist, &freepage->buddy_list)) {
+ if (!list_is_last(&freepage->buddy_list, freelist)) {
list_cut_position(&sublist, freelist, &freepage->buddy_list);
list_splice_tail(&sublist, freelist);
}
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
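[Editorial note: the argument-order point can be checked with a userspace sketch of the two helpers, re-implemented here with the include/linux/list.h semantics for illustration. On a circular list, list_is_last(head, entry) and list_is_first(entry, head) test the same condition, so the patch changes readability, not behavior.]

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

/* As in include/linux/list.h: the entry comes first, the head second. */
static int list_is_first(const struct list_head *list, const struct list_head *head)
{
	return list->prev == head;
}

static int list_is_last(const struct list_head *list, const struct list_head *head)
{
	return list->next == head;
}
```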
* Re: [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail}
2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively " Kemeng Shi
@ 2023-08-15 7:49 ` Baolin Wang
0 siblings, 0 replies; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 7:49 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We use move_freelist_head after list_for_each_entry_reverse to skip
> recently scanned pages. And there is no need to do the actual move if all
> freepages were searched in list_for_each_entry_reverse, i.e. freepage
> points to the first page in the freelist. It is more intuitive to call
> list_is_first with the list entry as the first argument and the list head
> as the second argument to check whether the entry is the first one in the
> list, instead of calling list_is_last with the entry and head passed in
> reverse order.
>
> Similarly, calling list_is_last in move_freelist_tail is more intuitive.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Makes sense to me.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/compaction.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 513b1caeb4fa..fa1b100b0d10 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,7 +1395,7 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_last(freelist, &freepage->buddy_list)) {
> + if (!list_is_first(&freepage->buddy_list, freelist)) {
> list_cut_before(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
> @@ -1412,7 +1412,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
> {
> LIST_HEAD(sublist);
>
> - if (!list_is_first(freelist, &freepage->buddy_list)) {
> + if (!list_is_last(&freepage->buddy_list, freelist)) {
> list_cut_position(&sublist, freelist, &freepage->buddy_list);
> list_splice_tail(&sublist, freelist);
> }
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (2 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively " Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 8:28 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range Kemeng Shi
` (5 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
In strict mode, we should return 0 if there is any hole in the pageblock.
If we successfully isolate pages at the beginning of the pageblock and then
read a bogus compound_order that takes the next page outside the pageblock,
we will abort the search loop with blockpfn > end_pfn. Although we then
limit blockpfn to end_pfn, we still treat this as a successful isolation
in strict mode because blockpfn is not < end_pfn, and return the partially
isolated pages. isolate_freepages_range may then succeed unexpectedly with
a hole in the isolated range.
This patch also removes the now-unnecessary clamp that kept blockpfn from
going outside the pageblock, whether pushed past it by a buddy page
(introduced in the fixed commit) or by the stride (introduced after the
fixed commit). Callers can use the returned blockpfn to check whether the
full pageblock was scanned (by testing blockpfn >= end) and to get the
next pfn to scan inside isolate_freepages_block on demand.
Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index fa1b100b0d10..684f6e6cd8bc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
page += (1UL << order) - 1;
nr_scanned += (1UL << order) - 1;
}
+ /*
+ * There is a tiny chance that we have read bogus
+ * compound_order(), so be careful to not go outside
+ * of the pageblock.
+ */
+ if (unlikely(blockpfn >= end_pfn))
+ blockpfn = end_pfn - 1;
+
goto isolate_fail;
}
@@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
if (locked)
spin_unlock_irqrestore(&cc->zone->lock, flags);
- /*
- * There is a tiny chance that we have read bogus compound_order(),
- * so be careful to not go outside of the pageblock.
- */
- if (unlikely(blockpfn > end_pfn))
- blockpfn = end_pfn;
-
trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
nr_scanned, total_isolated);
- /* Record how far we have got within the block */
+ /* Record how far we have got */
*start_pfn = blockpfn;
/*
@@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */
- if (start_pfn == end_pfn && !cc->no_set_skip_hint)
+ if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
set_pageblock_skip(page);
}
@@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn, freelist, stride, false);
/* Update the skip hint if the full pageblock was scanned */
- if (isolate_start_pfn == block_end_pfn)
+ if (isolate_start_pfn >= block_end_pfn)
update_pageblock_skip(cc, page, block_start_pfn -
pageblock_nr_pages);
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
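[Editorial note: the control flow this patch fixes can be modeled in a few lines. This is an illustrative simplification of isolate_freepages_block's exit path, not the kernel code; bogus_order_abort is a hypothetical flag standing in for the "bogus compound_order" bail-out, and only the strict-mode path is modeled.]

```c
#include <assert.h>

/*
 * Model of the strict-mode exit path after the fix: when the scan
 * aborted because a bogus compound_order pushed blockpfn past the
 * pageblock, clamp blockpfn below end_pfn so the strict check below
 * sees an incomplete scan and reports failure (0 pages isolated).
 */
static unsigned long strict_mode_result(unsigned long blockpfn,
					unsigned long end_pfn,
					int bogus_order_abort,
					unsigned long total_isolated)
{
	if (bogus_order_abort && blockpfn >= end_pfn)
		blockpfn = end_pfn - 1;	/* the fix: force the check below to fail */

	if (blockpfn < end_pfn)		/* strict mode: any hole means failure */
		total_isolated = 0;

	return total_isolated;
}
```

A clean full scan (blockpfn == end_pfn, no abort) keeps its isolated count; an abort that overran the block now returns 0 instead of a partial result. Before the fix, the clamp set blockpfn = end_pfn, so the overrun case slipped past the blockpfn < end_pfn check.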
* Re: [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode
2023-08-05 11:07 ` [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode Kemeng Shi
@ 2023-08-15 8:28 ` Baolin Wang
2023-08-15 9:22 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 8:28 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> In strict mode, we should return 0 if there is any hole in the pageblock.
> If we successfully isolate pages at the beginning of the pageblock and then
> read a bogus compound_order that takes the next page outside the pageblock,
> we will abort the search loop with blockpfn > end_pfn. Although we then
> limit blockpfn to end_pfn, we still treat this as a successful isolation
> in strict mode because blockpfn is not < end_pfn, and return the partially
> isolated pages. isolate_freepages_range may then succeed unexpectedly with
> a hole in the isolated range.
Yes, that can happen.
> This patch also removes the now-unnecessary clamp that kept blockpfn from
> going outside the pageblock, whether pushed past it by a buddy page
> (introduced in the fixed commit) or by the stride (introduced after the
> fixed commit). Callers can use the returned blockpfn to check whether the
> full pageblock was scanned (by testing blockpfn >= end) and to get the
> next pfn to scan inside isolate_freepages_block on demand.
IMO, I don't think removing the pageblock restriction is worth it, since
it does not fix anything and will make things more confusing, at least
to me.
That is to say, it would be surprising that blockpfn can go outside of
the pageblock after calling isolate_freepages_block() to scan just one
pageblock, and I have not checked in detail whether this can cause
other potential problems.
> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/compaction.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fa1b100b0d10..684f6e6cd8bc 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> page += (1UL << order) - 1;
> nr_scanned += (1UL << order) - 1;
> }
> + /*
> + * There is a tiny chance that we have read bogus
> + * compound_order(), so be careful to not go outside
> + * of the pageblock.
> + */
> + if (unlikely(blockpfn >= end_pfn))
> + blockpfn = end_pfn - 1;
So we can just add this validation to ensure that
isolate_freepages_block() returns 0 if a failure happens, which
fixes your problem.
> +
> goto isolate_fail;
> }
>
> @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> if (locked)
> spin_unlock_irqrestore(&cc->zone->lock, flags);
>
> - /*
> - * There is a tiny chance that we have read bogus compound_order(),
> - * so be careful to not go outside of the pageblock.
> - */
> - if (unlikely(blockpfn > end_pfn))
> - blockpfn = end_pfn;
> -
> trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
> nr_scanned, total_isolated);
>
> - /* Record how far we have got within the block */
> + /* Record how far we have got */
> *start_pfn = blockpfn;
>
> /*
> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
> isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>
> /* Skip this pageblock in the future as it's full or nearly full */
> - if (start_pfn == end_pfn && !cc->no_set_skip_hint)
> + if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
> set_pageblock_skip(page);
> }
>
> @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
> block_end_pfn, freelist, stride, false);
>
> /* Update the skip hint if the full pageblock was scanned */
> - if (isolate_start_pfn == block_end_pfn)
> + if (isolate_start_pfn >= block_end_pfn)
> update_pageblock_skip(cc, page, block_start_pfn -
> pageblock_nr_pages);
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode
2023-08-15 8:28 ` Baolin Wang
@ 2023-08-15 9:22 ` Kemeng Shi
0 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-15 9:22 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/15/2023 4:28 PM, Baolin Wang wrote:
>
>
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> In strict mode, we should return 0 if there is any hole in the pageblock.
>> If we successfully isolate pages at the beginning of the pageblock and then
>> read a bogus compound_order that takes the next page outside the pageblock,
>> we will abort the search loop with blockpfn > end_pfn. Although we then
>> limit blockpfn to end_pfn, we still treat this as a successful isolation
>> in strict mode because blockpfn is not < end_pfn, and return the partially
>> isolated pages. isolate_freepages_range may then succeed unexpectedly with
>> a hole in the isolated range.
>
> Yes, that can happen.
>
>> This patch also removes the now-unnecessary clamp that kept blockpfn from
>> going outside the pageblock, whether pushed past it by a buddy page
>> (introduced in the fixed commit) or by the stride (introduced after the
>> fixed commit). Callers can use the returned blockpfn to check whether the
>> full pageblock was scanned (by testing blockpfn >= end) and to get the
>> next pfn to scan inside isolate_freepages_block on demand.
>
> IMO, I don't think removing the pageblock restriction is worth it, since it does not fix anything and will make things more confusing, at least to me.
>
> That is to say, it would be surprising that blockpfn can go outside of the pageblock after calling isolate_freepages_block() to scan just one pageblock, and I have not checked in detail whether this can cause other potential problems.
>
>> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/compaction.c | 21 +++++++++++----------
>> 1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fa1b100b0d10..684f6e6cd8bc 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>> page += (1UL << order) - 1;
>> nr_scanned += (1UL << order) - 1;
>> }
>> + /*
>> + * There is a tiny chance that we have read bogus
>> + * compound_order(), so be careful to not go outside
>> + * of the pageblock.
>> + */
>> + if (unlikely(blockpfn >= end_pfn))
>> + blockpfn = end_pfn - 1;
>
> So we can just add this validation to ensure that isolate_freepages_block() returns 0 if a failure happens, which fixes your problem.
>
Thanks for the feedback! Sure, I will do this in the next version.
>> +
>> goto isolate_fail;
>> }
>> @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>> if (locked)
>> spin_unlock_irqrestore(&cc->zone->lock, flags);
>> - /*
>> - * There is a tiny chance that we have read bogus compound_order(),
>> - * so be careful to not go outside of the pageblock.
>> - */
>> - if (unlikely(blockpfn > end_pfn))
>> - blockpfn = end_pfn;
>> -
>> trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>> nr_scanned, total_isolated);
>> - /* Record how far we have got within the block */
>> + /* Record how far we have got */
>> *start_pfn = blockpfn;
>> /*
>> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>> isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>> /* Skip this pageblock in the future as it's full or nearly full */
>> - if (start_pfn == end_pfn && !cc->no_set_skip_hint)
>> + if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
>> set_pageblock_skip(page);
>> }
>> @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
>> block_end_pfn, freelist, stride, false);
>> /* Update the skip hint if the full pageblock was scanned */
>> - if (isolate_start_pfn == block_end_pfn)
>> + if (isolate_start_pfn >= block_end_pfn)
>> update_pageblock_skip(cc, page, block_start_pfn -
>> pageblock_nr_pages);
>>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (3 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 8:38 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
` (4 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
We call isolate_freepages_block in strict mode, so contiguous pages in
the pageblock are isolated if isolate_freepages_block succeeds. Then
pfn + isolated points to the start of the next pageblock to scan,
no matter how many pageblocks were isolated in isolate_freepages_block.
Use pfn + isolated as the start of the next pageblock to scan to
simplify the iteration.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 684f6e6cd8bc..8d7d38073d30 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
block_end_pfn = pageblock_end_pfn(pfn);
for (; pfn < end_pfn; pfn += isolated,
- block_start_pfn = block_end_pfn,
- block_end_pfn += pageblock_nr_pages) {
+ block_start_pfn = pfn,
+ block_end_pfn = pfn + pageblock_nr_pages) {
/* Protect pfn from changing by isolate_freepages_block */
unsigned long isolate_start_pfn = pfn;
- /*
- * pfn could pass the block_end_pfn if isolated freepage
- * is more than pageblock order. In this case, we adjust
- * scanning range to right one.
- */
- if (pfn >= block_end_pfn) {
- block_start_pfn = pageblock_start_pfn(pfn);
- block_end_pfn = pageblock_end_pfn(pfn);
- }
-
block_end_pfn = min(block_end_pfn, end_pfn);
if (!pageblock_pfn_to_page(block_start_pfn,
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
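[Editorial note: the simplified iteration can be modeled as follows. This is a userspace sketch of the loop-update logic only; PAGEBLOCK_NR_PAGES here is an illustrative value, while the real pageblock_nr_pages depends on architecture and configuration.]

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* illustrative value for the sketch */

struct block_range { unsigned long start, end; };

/*
 * Model of the new for-loop update statement: after isolating
 * 'isolated' pages starting at pfn, the next iteration scans the
 * range beginning at pfn + isolated, and both block bounds are
 * recomputed from that pfn, so they always describe one pageblock.
 */
static struct block_range next_block(unsigned long pfn, unsigned long isolated)
{
	struct block_range r;

	pfn += isolated;
	r.start = pfn;
	r.end = pfn + PAGEBLOCK_NR_PAGES;
	return r;
}
```

In strict mode a successful call isolates whole pageblocks, so pfn + isolated stays pageblock aligned; whether one pageblock or several (e.g. a high-order buddy spanning two blocks) were isolated, the recomputed bounds cover exactly one pageblock, which is why the old readjustment branch could go.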
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-05 11:07 ` [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range Kemeng Shi
@ 2023-08-15 8:38 ` Baolin Wang
2023-08-15 9:32 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 8:38 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We call isolate_freepages_block in strict mode, so contiguous pages in
> the pageblock are isolated if isolate_freepages_block succeeds. Then
> pfn + isolated points to the start of the next pageblock to scan,
> no matter how many pageblocks were isolated in isolate_freepages_block.
> Use pfn + isolated as the start of the next pageblock to scan to
> simplify the iteration.
IIUC, isolate_freepages_block() can isolate high-order free pages, which
means pfn + isolated can be larger than block_end_pfn. So, with your
patch, 'block_start_pfn' and 'block_end_pfn' can end up in different
pageblocks, which will break pageblock_pfn_to_page().
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/compaction.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 684f6e6cd8bc..8d7d38073d30 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
> block_end_pfn = pageblock_end_pfn(pfn);
>
> for (; pfn < end_pfn; pfn += isolated,
> - block_start_pfn = block_end_pfn,
> - block_end_pfn += pageblock_nr_pages) {
> + block_start_pfn = pfn,
> + block_end_pfn = pfn + pageblock_nr_pages) {
> /* Protect pfn from changing by isolate_freepages_block */
> unsigned long isolate_start_pfn = pfn;
>
> - /*
> - * pfn could pass the block_end_pfn if isolated freepage
> - * is more than pageblock order. In this case, we adjust
> - * scanning range to right one.
> - */
> - if (pfn >= block_end_pfn) {
> - block_start_pfn = pageblock_start_pfn(pfn);
> - block_end_pfn = pageblock_end_pfn(pfn);
> - }
> -
> block_end_pfn = min(block_end_pfn, end_pfn);
>
> if (!pageblock_pfn_to_page(block_start_pfn,
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-15 8:38 ` Baolin Wang
@ 2023-08-15 9:32 ` Kemeng Shi
2023-08-15 10:07 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-15 9:32 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/15/2023 4:38 PM, Baolin Wang wrote:
>
>
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We call isolate_freepages_block in strict mode, so contiguous pages in
>> the pageblock are isolated if isolate_freepages_block succeeds. Then
>> pfn + isolated points to the start of the next pageblock to scan,
>> no matter how many pageblocks were isolated in isolate_freepages_block.
>> Use pfn + isolated as the start of the next pageblock to scan to
>> simplify the iteration.
>
> IIUC, isolate_freepages_block() can isolate high-order free pages, which means pfn + isolated can be larger than block_end_pfn. So, with your patch, 'block_start_pfn' and 'block_end_pfn' can end up in different pageblocks, which will break pageblock_pfn_to_page().
>
In the for-loop update statement, we always update block_start_pfn to pfn
and block_end_pfn to pfn + pageblock_nr_pages, so they should point to
the same pageblock. I guess you missed the change to the update of
block_end_pfn. :)
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/compaction.c | 14 ++------------
>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 684f6e6cd8bc..8d7d38073d30 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>> block_end_pfn = pageblock_end_pfn(pfn);
>> for (; pfn < end_pfn; pfn += isolated,
>> - block_start_pfn = block_end_pfn,
>> - block_end_pfn += pageblock_nr_pages) {
>> + block_start_pfn = pfn,
>> + block_end_pfn = pfn + pageblock_nr_pages) {
>> /* Protect pfn from changing by isolate_freepages_block */
>> unsigned long isolate_start_pfn = pfn;
>> - /*
>> - * pfn could pass the block_end_pfn if isolated freepage
>> - * is more than pageblock order. In this case, we adjust
>> - * scanning range to right one.
>> - */
>> - if (pfn >= block_end_pfn) {
>> - block_start_pfn = pageblock_start_pfn(pfn);
>> - block_end_pfn = pageblock_end_pfn(pfn);
>> - }
>> -
>> block_end_pfn = min(block_end_pfn, end_pfn);
>> if (!pageblock_pfn_to_page(block_start_pfn,
>
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-15 9:32 ` Kemeng Shi
@ 2023-08-15 10:07 ` Baolin Wang
2023-08-15 10:37 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 10:07 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>
>
> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We call isolate_freepages_block in strict mode, continuous pages in
>>> pageblock will be isolated if isolate_freepages_block successed.
>>> Then pfn + isolated will point to start of next pageblock to scan
>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>> iteration.
>>
>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>
> In for update statement, we always update block_start_pfn to pfn and
I mean, you changed to:
1) pfn += isolated;
2) block_start_pfn = pfn;
3) block_end_pfn = pfn + pageblock_nr_pages;
But in 1) pfn + isolated can go outside of the current pageblock if
isolating a high-order page, for example one located in the middle of
the next pageblock. So block_start_pfn can point to the middle of the
next pageblock, not the start position. Meanwhile, after 3),
block_end_pfn can point to another pageblock. Or did I miss something
else?
> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
> to the same pageblock. I guess you missed the change to update of
> block_end_pfn. :)
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>> mm/compaction.c | 14 ++------------
>>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>> block_end_pfn = pageblock_end_pfn(pfn);
>>> for (; pfn < end_pfn; pfn += isolated,
>>> - block_start_pfn = block_end_pfn,
>>> - block_end_pfn += pageblock_nr_pages) {
>>> + block_start_pfn = pfn,
>>> + block_end_pfn = pfn + pageblock_nr_pages) {
>>> /* Protect pfn from changing by isolate_freepages_block */
>>> unsigned long isolate_start_pfn = pfn;
>>> - /*
>>> - * pfn could pass the block_end_pfn if isolated freepage
>>> - * is more than pageblock order. In this case, we adjust
>>> - * scanning range to right one.
>>> - */
>>> - if (pfn >= block_end_pfn) {
>>> - block_start_pfn = pageblock_start_pfn(pfn);
>>> - block_end_pfn = pageblock_end_pfn(pfn);
>>> - }
>>> -
>>> block_end_pfn = min(block_end_pfn, end_pfn);
>>> if (!pageblock_pfn_to_page(block_start_pfn,
>>
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-15 10:07 ` Baolin Wang
@ 2023-08-15 10:37 ` Kemeng Shi
2023-08-19 11:58 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-15 10:37 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/15/2023 6:07 PM, Baolin Wang wrote:
>
>
> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>> iteration.
>>>
>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>
>> In for update statement, we always update block_start_pfn to pfn and
>
> I mean, you changed to:
> 1) pfn += isolated;
> 2) block_start_pfn = pfn;
> 3) block_end_pfn = pfn + pageblock_nr_pages;
>
> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>
Ah, I missed explaining this in the changelog.
In case we have a buddy page with order higher than the pageblock order:
1. a page in the buddy system is aligned to its order
2. the order of the page is higher than the pageblock order
Then the page is aligned to the pageblock order. So the pfn of the page
and the isolated page count are both pageblock-order aligned, and so is
pfn + isolated.
>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>> to the same pageblock. I guess you missed the change to update of
>> block_end_pfn. :)
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>> mm/compaction.c | 14 ++------------
>>>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>> block_end_pfn = pageblock_end_pfn(pfn);
>>>> for (; pfn < end_pfn; pfn += isolated,
>>>> - block_start_pfn = block_end_pfn,
>>>> - block_end_pfn += pageblock_nr_pages) {
>>>> + block_start_pfn = pfn,
>>>> + block_end_pfn = pfn + pageblock_nr_pages) {
>>>> /* Protect pfn from changing by isolate_freepages_block */
>>>> unsigned long isolate_start_pfn = pfn;
>>>> - /*
>>>> - * pfn could pass the block_end_pfn if isolated freepage
>>>> - * is more than pageblock order. In this case, we adjust
>>>> - * scanning range to right one.
>>>> - */
>>>> - if (pfn >= block_end_pfn) {
>>>> - block_start_pfn = pageblock_start_pfn(pfn);
>>>> - block_end_pfn = pageblock_end_pfn(pfn);
>>>> - }
>>>> -
>>>> block_end_pfn = min(block_end_pfn, end_pfn);
>>>> if (!pageblock_pfn_to_page(block_start_pfn,
>>>
>
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-15 10:37 ` Kemeng Shi
@ 2023-08-19 11:58 ` Baolin Wang
2023-08-22 1:37 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-19 11:58 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>
>
> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>> iteration.
>>>>
>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>
>>> In for update statement, we always update block_start_pfn to pfn and
>>
>> I mean, you changed to:
>> 1) pfn += isolated;
>> 2) block_start_pfn = pfn;
>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>
>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>
> Ah, I miss to explain this in changelog.
> In case we could we have buddy page with order higher than pageblock:
> 1. page in buddy page is aligned with it's order
> 2. order of page is higher than pageblock order
> Then page is aligned with pageblock order. So pfn of page and isolated pages
> count are both aligned pageblock order. So pfn + isolated is pageblock order
> aligned.
That's not what I mean. pfn + isolated is not always pageblock-aligned,
since isolate_freepages_block() can isolate high-order free pages
(for example: order-1, order-2 ...).
Suppose the pageblock size is 2M, and we isolate a pageblock (suppose
the pfn range is 0 - 511 to make the arithmetic easy) with
isolate_freepages_block(); suppose pfn 0 to pfn 510 are all order-0
pages, but pfn 511 is an order-1 page, so you will isolate 513 pages
from this pageblock, which makes 'pfn + isolated' not pageblock aligned.
>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>> to the same pageblock. I guess you missed the change to update of
>>> block_end_pfn. :)
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>> mm/compaction.c | 14 ++------------
>>>>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>> block_end_pfn = pageblock_end_pfn(pfn);
>>>>> for (; pfn < end_pfn; pfn += isolated,
>>>>> - block_start_pfn = block_end_pfn,
>>>>> - block_end_pfn += pageblock_nr_pages) {
>>>>> + block_start_pfn = pfn,
>>>>> + block_end_pfn = pfn + pageblock_nr_pages) {
>>>>> /* Protect pfn from changing by isolate_freepages_block */
>>>>> unsigned long isolate_start_pfn = pfn;
>>>>> - /*
>>>>> - * pfn could pass the block_end_pfn if isolated freepage
>>>>> - * is more than pageblock order. In this case, we adjust
>>>>> - * scanning range to right one.
>>>>> - */
>>>>> - if (pfn >= block_end_pfn) {
>>>>> - block_start_pfn = pageblock_start_pfn(pfn);
>>>>> - block_end_pfn = pageblock_end_pfn(pfn);
>>>>> - }
>>>>> -
>>>>> block_end_pfn = min(block_end_pfn, end_pfn);
>>>>> if (!pageblock_pfn_to_page(block_start_pfn,
>>>>
>>
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-19 11:58 ` Baolin Wang
@ 2023-08-22 1:37 ` Kemeng Shi
2023-08-24 2:19 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-22 1:37 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/19/2023 7:58 PM, Baolin Wang wrote:
>
>
> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>
>>>>
>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>>> iteration.
>>>>>
>>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>>
>>>> In for update statement, we always update block_start_pfn to pfn and
>>>
>>> I mean, you changed to:
>>> 1) pfn += isolated;
>>> 2) block_start_pfn = pfn;
>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>
>>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>>
>> Ah, I miss to explain this in changelog.
>> In case we could we have buddy page with order higher than pageblock:
>> 1. page in buddy page is aligned with it's order
>> 2. order of page is higher than pageblock order
>> Then page is aligned with pageblock order. So pfn of page and isolated pages
>> count are both aligned pageblock order. So pfn + isolated is pageblock order
>> aligned.
>
> That's not what I mean. pfn + isolated is not always pageblock-aligned, since the isolate_freepages_block() can isolated high-order free pages (for example: order-1, order-2 ...).
>
> Suppose the pageblock size is 2M, when isolating a pageblock (suppose the pfn range is 0 - 511 to make the arithmetic easy) by isolate_freepages_block(), and suppose pfn 0 to pfn 510 are all order-0 page, but pfn 511 is order-1 page, so you will isolate 513 pages from this pageblock, which will make 'pfn + isolated' not pageblock aligned.
This is also not supposed to happen, as low-order buddy pages should
never cross the boundary of high-order pages:
In the buddy system, we always split an order N page into two order
N - 1 pages as follows:
| order N |
|order N - 1|order N - 1|
So buddy pages with order N - 1 will never cross a boundary of order N.
Similarly, buddy pages with order N - 2 will never cross a boundary of
order N - 1, and so on. Then any page with order less than N will never
cross a boundary of order N.
>
>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>> to the same pageblock. I guess you missed the change to update of
>>>> block_end_pfn. :)
>>>>>>
>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>> ---
>>>>>> mm/compaction.c | 14 ++------------
>>>>>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>> block_end_pfn = pageblock_end_pfn(pfn);
>>>>>> for (; pfn < end_pfn; pfn += isolated,
>>>>>> - block_start_pfn = block_end_pfn,
>>>>>> - block_end_pfn += pageblock_nr_pages) {
>>>>>> + block_start_pfn = pfn,
>>>>>> + block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>> /* Protect pfn from changing by isolate_freepages_block */
>>>>>> unsigned long isolate_start_pfn = pfn;
>>>>>> - /*
>>>>>> - * pfn could pass the block_end_pfn if isolated freepage
>>>>>> - * is more than pageblock order. In this case, we adjust
>>>>>> - * scanning range to right one.
>>>>>> - */
>>>>>> - if (pfn >= block_end_pfn) {
>>>>>> - block_start_pfn = pageblock_start_pfn(pfn);
>>>>>> - block_end_pfn = pageblock_end_pfn(pfn);
>>>>>> - }
>>>>>> -
>>>>>> block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>> if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>
>>>
>
>
* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
2023-08-22 1:37 ` Kemeng Shi
@ 2023-08-24 2:19 ` Baolin Wang
0 siblings, 0 replies; 37+ messages in thread
From: Baolin Wang @ 2023-08-24 2:19 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/22/2023 9:37 AM, Kemeng Shi wrote:
>
>
> on 8/19/2023 7:58 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>>
>>>>>
>>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>>>> iteration.
>>>>>>
>>>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>>>
>>>>> In for update statement, we always update block_start_pfn to pfn and
>>>>
>>>> I mean, you changed to:
>>>> 1) pfn += isolated;
>>>> 2) block_start_pfn = pfn;
>>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>>
>>>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>>>
>>> Ah, I miss to explain this in changelog.
>>> In case we could we have buddy page with order higher than pageblock:
>>> 1. page in buddy page is aligned with it's order
>>> 2. order of page is higher than pageblock order
>>> Then page is aligned with pageblock order. So pfn of page and isolated pages
>>> count are both aligned pageblock order. So pfn + isolated is pageblock order
>>> aligned.
>>
>> That's not what I mean. pfn + isolated is not always pageblock-aligned, since the isolate_freepages_block() can isolated high-order free pages (for example: order-1, order-2 ...).
>>
>> Suppose the pageblock size is 2M, when isolating a pageblock (suppose the pfn range is 0 - 511 to make the arithmetic easy) by isolate_freepages_block(), and suppose pfn 0 to pfn 510 are all order-0 page, but pfn 511 is order-1 page, so you will isolate 513 pages from this pageblock, which will make 'pfn + isolated' not pageblock aligned.
I realized I made a bad example, sorry for the noise.
After more thinking, I agree that the 'pfn + isolated' is always
pageblock aligned in strict mode. So feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> This is also no supposed to happen as low order buddy pages should never span
> cross boundary of high order pages:
> In buddy system, we always split order N pages into two order N - 1 pages as
> following:
> | order N |
> |order N - 1|order N - 1|
> So buddy pages with order N - 1 will never cross boudary of order N. Similar,
> buddy pages with order N - 2 will never cross boudary of order N - 1 and so
> on. Then any pages with order less than N will never cross boudary of order
> N.
>
>>
>>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>>> to the same pageblock. I guess you missed the change to update of
>>>>> block_end_pfn. :)
>>>>>>>
>>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>>> ---
>>>>>>> mm/compaction.c | 14 ++------------
>>>>>>> 1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>>> --- a/mm/compaction.c
>>>>>>> +++ b/mm/compaction.c
>>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>>> block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>> for (; pfn < end_pfn; pfn += isolated,
>>>>>>> - block_start_pfn = block_end_pfn,
>>>>>>> - block_end_pfn += pageblock_nr_pages) {
>>>>>>> + block_start_pfn = pfn,
>>>>>>> + block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>>> /* Protect pfn from changing by isolate_freepages_block */
>>>>>>> unsigned long isolate_start_pfn = pfn;
>>>>>>> - /*
>>>>>>> - * pfn could pass the block_end_pfn if isolated freepage
>>>>>>> - * is more than pageblock order. In this case, we adjust
>>>>>>> - * scanning range to right one.
>>>>>>> - */
>>>>>>> - if (pfn >= block_end_pfn) {
>>>>>>> - block_start_pfn = pageblock_start_pfn(pfn);
>>>>>>> - block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>> - }
>>>>>>> -
>>>>>>> block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>>> if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>>
>>>>
>>
>>
* [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (4 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 8:42 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
` (3 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
The compact_blockskip_flush check is already done in
__reset_isolation_suitable, so remove the repeated check before calling
__reset_isolation_suitable in reset_isolation_suitable.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 8d7d38073d30..d8416d3dd445 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -434,9 +434,7 @@ void reset_isolation_suitable(pg_data_t *pgdat)
if (!populated_zone(zone))
continue;
- /* Only flush if a full compaction finished recently */
- if (zone->compact_blockskip_flush)
- __reset_isolation_suitable(zone);
+ __reset_isolation_suitable(zone);
}
}
--
2.30.0
* Re: [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable
2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
@ 2023-08-15 8:42 ` Baolin Wang
0 siblings, 0 replies; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 8:42 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We have compact_blockskip_flush check in __reset_isolation_suitable, just
> remove repeat check before __reset_isolation_suitable in
> compact_blockskip_flush.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> mm/compaction.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8d7d38073d30..d8416d3dd445 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -434,9 +434,7 @@ void reset_isolation_suitable(pg_data_t *pgdat)
> if (!populated_zone(zone))
> continue;
>
> - /* Only flush if a full compaction finished recently */
> - if (zone->compact_blockskip_flush)
> - __reset_isolation_suitable(zone);
> + __reset_isolation_suitable(zone);
> }
> }
>
* [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (5 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 8:58 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
` (2 subsequent siblings)
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
We also have order == -1 via proactive compaction, so
is_via_compact_memory is not a proper name anymore.
As cc->order informs the compaction to satisfy an allocation with that
order, rename it to compaction_with_allocation_order.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index d8416d3dd445..b5a699ed526b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
}
/*
- * order == -1 is expected when compacting via
- * /proc/sys/vm/compact_memory
+ * compact to satisfy allocation with target order
*/
-static inline bool is_via_compact_memory(int order)
+static inline bool compaction_with_allocation_order(int order)
{
- return order == -1;
+ return order != -1;
}
/*
@@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
goto out;
}
- if (is_via_compact_memory(cc->order))
+ if (!compaction_with_allocation_order(cc->order))
return COMPACT_CONTINUE;
/*
@@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
cc->migratetype = gfp_migratetype(cc->gfp_mask);
- if (!is_via_compact_memory(cc->order)) {
+ if (compaction_with_allocation_order(cc->order)) {
unsigned long watermark;
/* Allocation can already succeed, nothing to do */
--
2.30.0
* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
@ 2023-08-15 8:58 ` Baolin Wang
2023-08-15 12:04 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 8:58 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We have order = -1 via proactive compaction, the is_via_compact_memory is
> not proper name anymore.
> As cc->order informs the compaction to satisfy a allocation with that
> order, so rename it to compaction_with_allocation_order.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/compaction.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d8416d3dd445..b5a699ed526b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
> }
>
> /*
> - * order == -1 is expected when compacting via
> - * /proc/sys/vm/compact_memory
> + * compact to satisfy allocation with target order
> */
> -static inline bool is_via_compact_memory(int order)
> +static inline bool compaction_with_allocation_order(int order)
I know naming is hard, but this name is not good enough to show the
compaction mode, while the original one could.
> {
> - return order == -1;
> + return order != -1;
> }
>
> /*
> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
> goto out;
> }
>
> - if (is_via_compact_memory(cc->order))
> + if (!compaction_with_allocation_order(cc->order))
> return COMPACT_CONTINUE;
>
> /*
> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>
> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>
> - if (!is_via_compact_memory(cc->order)) {
> + if (compaction_with_allocation_order(cc->order)) {
> unsigned long watermark;
>
> /* Allocation can already succeed, nothing to do */
* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-15 8:58 ` Baolin Wang
@ 2023-08-15 12:04 ` Kemeng Shi
2023-08-19 12:14 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-15 12:04 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/15/2023 4:58 PM, Baolin Wang wrote:
>
>
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We have order = -1 via proactive compaction, the is_via_compact_memory is
>> not proper name anymore.
>> As cc->order informs the compaction to satisfy a allocation with that
>> order, so rename it to compaction_with_allocation_order.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/compaction.c | 11 +++++------
>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index d8416d3dd445..b5a699ed526b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>> }
>> /*
>> - * order == -1 is expected when compacting via
>> - * /proc/sys/vm/compact_memory
>> + * compact to satisfy allocation with target order
>> */
>> -static inline bool is_via_compact_memory(int order)
>> +static inline bool compaction_with_allocation_order(int order)
>
> I know naming is hard, but this name is not good enough that can show the compaction mode. But the original one could.
>
Yes, I agree with this, but the name and comment of
is_via_compact_memory may mislead readers into thinking that
order == -1 is equivalent to compaction triggered from
/proc/sys/vm/compact_memory.
Actually, we have several ways to trigger compaction with order == -1:
1. via /proc/sys/vm/compact_memory
2. via /sys/devices/system/node/nodex/compact
3. via proactive compaction
Rather than indicating that compaction was triggered by compact_memory
or anything else, order == -1 only implies that compaction was not
triggered to satisfy an allocation with a given order; when it was, we
stop compaction once an allocation of the target order can succeed.
>> {
>> - return order == -1;
>> + return order != -1;
>> }
>> /*
>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>> goto out;
>> }
>> - if (is_via_compact_memory(cc->order))
>> + if (!compaction_with_allocation_order(cc->order))
>> return COMPACT_CONTINUE;
>> /*
>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>> - if (!is_via_compact_memory(cc->order)) {
>> + if (compaction_with_allocation_order(cc->order)) {
>> unsigned long watermark;
>> /* Allocation can already succeed, nothing to do */
>
* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-15 12:04 ` Kemeng Shi
@ 2023-08-19 12:14 ` Baolin Wang
2023-08-22 1:51 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-19 12:14 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/15/2023 8:04 PM, Kemeng Shi wrote:
>
>
> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We can have order == -1 via proactive compaction, so is_via_compact_memory
>>> is no longer a proper name.
>>> As cc->order informs compaction to satisfy an allocation with that
>>> order, rename it to compaction_with_allocation_order.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>> mm/compaction.c | 11 +++++------
>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index d8416d3dd445..b5a699ed526b 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>> }
>>> /*
>>> - * order == -1 is expected when compacting via
>>> - * /proc/sys/vm/compact_memory
>>> + * compact to satisfy allocation with target order
>>> */
>>> -static inline bool is_via_compact_memory(int order)
>>> +static inline bool compaction_with_allocation_order(int order)
>>
>> I know naming is hard, but this name is not good enough to show the compaction mode, while the original one could.
>>
> Yes, I agree with this, but the name and comment of is_via_compact_memory may
> mislead readers into thinking that order == -1 is equivalent to compaction from
> /proc/sys/vm/compact_memory.
> Actually, we have several ways to trigger compaction with order == -1:
> 1. via /proc/sys/vm/compact_memory
> 2. via /sys/devices/system/node/nodex/compact
> 3. via proactive compaction
They can all be called proactive compaction.
>
> Instead of indicating that compaction is triggered by compact_memory or anything
> else, the check on order == -1 tells whether compaction is triggered to meet a
> high-order allocation; when it is, we stop compaction once an allocation with the
> target order would succeed.
IMO, the is_via_compact_memory() function helps people better
distinguish the compaction logic we have under direct compaction or
kcompactd compaction, while proactive compaction does not concern itself
with these details. But compaction_with_allocation_order() would just
make me wonder why we compare against -1. So I don't think this patch
is worth it, but as you said above, we can add more comments to make it
clearer.
>>> {
>>> - return order == -1;
>>> + return order != -1;
>>> }
>>> /*
>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>> goto out;
>>> }
>>> - if (is_via_compact_memory(cc->order))
>>> + if (!compaction_with_allocation_order(cc->order))
>>> return COMPACT_CONTINUE;
>>> /*
>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>> - if (!is_via_compact_memory(cc->order)) {
>>> + if (compaction_with_allocation_order(cc->order)) {
>>> unsigned long watermark;
>>> /* Allocation can already succeed, nothing to do */
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-19 12:14 ` Baolin Wang
@ 2023-08-22 1:51 ` Kemeng Shi
2023-08-24 2:20 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-22 1:51 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/19/2023 8:14 PM, Baolin Wang wrote:
>
>
> On 8/15/2023 8:04 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We can have order == -1 via proactive compaction, so is_via_compact_memory
>>>> is no longer a proper name.
>>>> As cc->order informs compaction to satisfy an allocation with that
>>>> order, rename it to compaction_with_allocation_order.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>> mm/compaction.c | 11 +++++------
>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index d8416d3dd445..b5a699ed526b 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>>> }
>>>> /*
>>>> - * order == -1 is expected when compacting via
>>>> - * /proc/sys/vm/compact_memory
>>>> + * compact to satisfy allocation with target order
>>>> */
>>>> -static inline bool is_via_compact_memory(int order)
>>>> +static inline bool compaction_with_allocation_order(int order)
>>>
>>> I know naming is hard, but this name is not good enough to show the compaction mode, while the original one could.
>>>
>> Yes, I agree with this, but the name and comment of is_via_compact_memory may
>> mislead readers into thinking that order == -1 is equivalent to compaction from
>> /proc/sys/vm/compact_memory.
>> Actually, we have several ways to trigger compaction with order == -1:
>> 1. via /proc/sys/vm/compact_memory
>> 2. via /sys/devices/system/node/nodex/compact
>> 3. via proactive compaction
>
> They can all be called proactive compaction.
I have considered renaming it to is_proactive_compaction. But "proactive compaction"
in the comments of compaction.c mostly refers to compaction triggered from
/proc/sys/vm/compaction_proactiveness. So "proactive compaction" itself looks
ambiguous...
>
>>
>> Instead of indicating that compaction is triggered by compact_memory or anything
>> else, the check on order == -1 tells whether compaction is triggered to meet a
>> high-order allocation; when it is, we stop compaction once an allocation with the
>> target order would succeed.
>
> IMO, the is_via_compact_memory() function helps people better distinguish the compaction logic we have under direct compaction or kcompactd compaction, while proactive compaction does not concern itself with these details. But compaction_with_allocation_order() would just make me wonder why we compare against -1. So I don't think this patch is worth it, but as you said above, we can add more comments to make it clearer.
>
Sure, no insistence on this.
Does it look good to you to just change the comment of is_via_compact_memory to:
We need to do compaction proactively when order == -1.
order == -1 is expected for proactive compaction via:
1. /proc/sys/vm/compact_memory
2. /sys/devices/system/node/nodex/compact
3. /proc/sys/vm/compaction_proactiveness
>>>> {
>>>> - return order == -1;
>>>> + return order != -1;
>>>> }
>>>> /*
>>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>>> goto out;
>>>> }
>>>> - if (is_via_compact_memory(cc->order))
>>>> + if (!compaction_with_allocation_order(cc->order))
>>>> return COMPACT_CONTINUE;
>>>> /*
>>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>> - if (!is_via_compact_memory(cc->order)) {
>>>> + if (compaction_with_allocation_order(cc->order)) {
>>>> unsigned long watermark;
>>>> /* Allocation can already succeed, nothing to do */
>>>
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
2023-08-22 1:51 ` Kemeng Shi
@ 2023-08-24 2:20 ` Baolin Wang
0 siblings, 0 replies; 37+ messages in thread
From: Baolin Wang @ 2023-08-24 2:20 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/22/2023 9:51 AM, Kemeng Shi wrote:
>
>
> on 8/19/2023 8:14 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 8:04 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We can have order == -1 via proactive compaction, so is_via_compact_memory
>>>>> is no longer a proper name.
>>>>> As cc->order informs compaction to satisfy an allocation with that
>>>>> order, rename it to compaction_with_allocation_order.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>> mm/compaction.c | 11 +++++------
>>>>> 1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index d8416d3dd445..b5a699ed526b 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>>>> }
>>>>> /*
>>>>> - * order == -1 is expected when compacting via
>>>>> - * /proc/sys/vm/compact_memory
>>>>> + * compact to satisfy allocation with target order
>>>>> */
>>>>> -static inline bool is_via_compact_memory(int order)
>>>>> +static inline bool compaction_with_allocation_order(int order)
>>>>
>>>> I know naming is hard, but this name is not good enough to show the compaction mode. But the original one could.
>>>>
>>> Yes, I agree with this, but the name and comment of is_via_compact_memory may
>>> mislead readers into thinking that order == -1 is equivalent to compaction from
>>> /proc/sys/vm/compact_memory.
>>> Actually, we have several ways to trigger compaction with order == -1:
>>> 1. via /proc/sys/vm/compact_memory
>>> 2. via /sys/devices/system/node/nodex/compact
>>> 3. via proactive compaction
>>
>> They can all be called proactive compaction.
> I have considered renaming it to is_proactive_compaction. But "proactive compaction"
> in the comments of compaction.c mostly refers to compaction triggered from
> /proc/sys/vm/compaction_proactiveness. So "proactive compaction" itself looks
> ambiguous...
>>
>>>
>>> Instead of indicating that compaction is triggered by compact_memory or anything
>>> else, the check on order == -1 tells whether compaction is triggered to meet a
>>> high-order allocation; when it is, we stop compaction once an allocation with the
>>> target order would succeed.
>>
>> IMO, the is_via_compact_memory() function helps people better distinguish the compaction logic we have under direct compaction or kcompactd compaction, while proactive compaction does not concern itself with these details. But compaction_with_allocation_order() would just make me wonder why we compare against -1. So I don't think this patch is worth it, but as you said above, we can add more comments to make it clearer.
>>
> Sure, no insistence on this.
> Does it look good to you to just change the comment of is_via_compact_memory to:
> We need to do compaction proactively when order == -1.
> order == -1 is expected for proactive compaction via:
> 1. /proc/sys/vm/compact_memory
> 2. /sys/devices/system/node/nodex/compact
> 3. /proc/sys/vm/compaction_proactiveness
Look good to me. Thanks.
>
>>>>> {
>>>>> - return order == -1;
>>>>> + return order != -1;
>>>>> }
>>>>> /*
>>>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>>>> goto out;
>>>>> }
>>>>> - if (is_via_compact_memory(cc->order))
>>>>> + if (!compaction_with_allocation_order(cc->order))
>>>>> return COMPACT_CONTINUE;
>>>>> /*
>>>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>> - if (!is_via_compact_memory(cc->order)) {
>>>>> + if (compaction_with_allocation_order(cc->order)) {
>>>>> unsigned long watermark;
>>>>> /* Allocation can already succeed, nothing to do */
>>>>
>>
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (6 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-15 8:53 ` Baolin Wang
2023-08-05 11:07 ` [PATCH 8/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_node_suitable Kemeng Shi
2023-08-05 11:07 ` [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work Kemeng Shi
9 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
We always do the zone_watermark_ok check and the compaction_suitable check
together to test whether compaction for the target order should be run.
Factor this code out in preparation for removing the repeated code.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index b5a699ed526b..26787ebb0297 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
return false;
}
+/*
+ * Should we do compaction for target allocation order.
+ * Return COMPACT_SUCCESS if allocation for target order can be already
+ * satisfied
+ * Return COMPACT_SKIPPED if compaction for target order is likely to fail
+ * Return COMPACT_CONTINUE if compaction for target order should be runned
+ */
+static inline enum compact_result
+compaction_suit_allocation_order(struct zone *zone, unsigned int order,
+ int highest_zoneidx, unsigned int alloc_flags)
+{
+ unsigned long watermark;
+
+ watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
+ if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
+ alloc_flags))
+ return COMPACT_SUCCESS;
+
+ if (!compaction_suitable(zone, order, highest_zoneidx))
+ return COMPACT_SKIPPED;
+
+ return COMPACT_CONTINUE;
+}
+
static enum compact_result
compact_zone(struct compact_control *cc, struct capture_control *capc)
{
@@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
cc->migratetype = gfp_migratetype(cc->gfp_mask);
if (compaction_with_allocation_order(cc->order)) {
- unsigned long watermark;
-
- /* Allocation can already succeed, nothing to do */
- watermark = wmark_pages(cc->zone,
- cc->alloc_flags & ALLOC_WMARK_MASK);
- if (zone_watermark_ok(cc->zone, cc->order, watermark,
- cc->highest_zoneidx, cc->alloc_flags))
- return COMPACT_SUCCESS;
-
- /* Compaction is likely to fail */
- if (!compaction_suitable(cc->zone, cc->order,
- cc->highest_zoneidx))
- return COMPACT_SKIPPED;
+ ret = compaction_suit_allocation_order(cc->zone, cc->order,
+ cc->highest_zoneidx,
+ cc->alloc_flags);
+ if (ret != COMPACT_CONTINUE)
+ return ret;
}
/*
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
@ 2023-08-15 8:53 ` Baolin Wang
2023-08-15 12:10 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-15 8:53 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We always do the zone_watermark_ok check and the compaction_suitable check
> together to test whether compaction for the target order should be run.
> Factor this code out in preparation for removing the repeated code.
>
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
> 1 file changed, 29 insertions(+), 13 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index b5a699ed526b..26787ebb0297 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
> return false;
> }
>
> +/*
> + * Should we do compaction for target allocation order.
> + * Return COMPACT_SUCCESS if allocation for target order can be already
> + * satisfied
> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
> + * Return COMPACT_CONTINUE if compaction for target order should be runned
> + */
> +static inline enum compact_result
> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
> + int highest_zoneidx, unsigned int alloc_flags)
> +{
> + unsigned long watermark;
> +
> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
IIUC, the watermark used in patch 8 and patch 9 is different, right?
Have you measured the impact of modifying this watermark?
> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
> + alloc_flags))
> + return COMPACT_SUCCESS;
> +
> + if (!compaction_suitable(zone, order, highest_zoneidx))
> + return COMPACT_SKIPPED;
> +
> + return COMPACT_CONTINUE;
> +}
> +
> static enum compact_result
> compact_zone(struct compact_control *cc, struct capture_control *capc)
> {
> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>
> if (compaction_with_allocation_order(cc->order)) {
> - unsigned long watermark;
> -
> - /* Allocation can already succeed, nothing to do */
> - watermark = wmark_pages(cc->zone,
> - cc->alloc_flags & ALLOC_WMARK_MASK);
> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
> - cc->highest_zoneidx, cc->alloc_flags))
> - return COMPACT_SUCCESS;
> -
> - /* Compaction is likely to fail */
> - if (!compaction_suitable(cc->zone, cc->order,
> - cc->highest_zoneidx))
> - return COMPACT_SKIPPED;
> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
> + cc->highest_zoneidx,
> + cc->alloc_flags);
> + if (ret != COMPACT_CONTINUE)
> + return ret;
> }
>
> /*
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-15 8:53 ` Baolin Wang
@ 2023-08-15 12:10 ` Kemeng Shi
2023-08-19 12:27 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-15 12:10 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/15/2023 4:53 PM, Baolin Wang wrote:
>
>
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We always do the zone_watermark_ok check and the compaction_suitable check
>> together to test whether compaction for the target order should be run.
>> Factor this code out in preparation for removing the repeated code.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index b5a699ed526b..26787ebb0297 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>> return false;
>> }
>> +/*
>> + * Should we do compaction for target allocation order.
>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>> + * satisfied
>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>> + */
>> +static inline enum compact_result
>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>> + int highest_zoneidx, unsigned int alloc_flags)
>> +{
>> + unsigned long watermark;
>> +
>> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>
> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>
Actually, there is no functional change intended. Considering that wmark_pages
with alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
use the original watermark.
>> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>> + alloc_flags))
>> + return COMPACT_SUCCESS;
>> +
>> + if (!compaction_suitable(zone, order, highest_zoneidx))
>> + return COMPACT_SKIPPED;
>> +
>> + return COMPACT_CONTINUE;
>> +}
>> +
>> static enum compact_result
>> compact_zone(struct compact_control *cc, struct capture_control *capc)
>> {
>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>> if (compaction_with_allocation_order(cc->order)) {
>> - unsigned long watermark;
>> -
>> - /* Allocation can already succeed, nothing to do */
>> - watermark = wmark_pages(cc->zone,
>> - cc->alloc_flags & ALLOC_WMARK_MASK);
>> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
>> - cc->highest_zoneidx, cc->alloc_flags))
>> - return COMPACT_SUCCESS;
>> -
>> - /* Compaction is likely to fail */
>> - if (!compaction_suitable(cc->zone, cc->order,
>> - cc->highest_zoneidx))
>> - return COMPACT_SKIPPED;
>> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
>> + cc->highest_zoneidx,
>> + cc->alloc_flags);
>> + if (ret != COMPACT_CONTINUE)
>> + return ret;
>> }
>> /*
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-15 12:10 ` Kemeng Shi
@ 2023-08-19 12:27 ` Baolin Wang
2023-08-22 1:57 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-19 12:27 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>
>
> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We always do the zone_watermark_ok check and the compaction_suitable check
>>> together to test whether compaction for the target order should be run.
>>> Factor this code out in preparation for removing the repeated code.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index b5a699ed526b..26787ebb0297 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>> return false;
>>> }
>>> +/*
>>> + * Should we do compaction for target allocation order.
>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>> + * satisfied
>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>> + */
>>> +static inline enum compact_result
>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>> + int highest_zoneidx, unsigned int alloc_flags)
>>> +{
>>> + unsigned long watermark;
>>> +
>>> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>
>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>
> Actually, there is no functional change intended. Considering that wmark_pages
> with alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
> use the original watermark.
Can you use the ALLOC_WMARK_MIN macro to make it clearer?
And I think patch 8 and patch 9 should be squashed into patch 7 to
convert all at once.
>>> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>> + alloc_flags))
>>> + return COMPACT_SUCCESS;
>>> +
>>> + if (!compaction_suitable(zone, order, highest_zoneidx))
>>> + return COMPACT_SKIPPED;
>>> +
>>> + return COMPACT_CONTINUE;
>>> +}
>>> +
>>> static enum compact_result
>>> compact_zone(struct compact_control *cc, struct capture_control *capc)
>>> {
>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>> if (compaction_with_allocation_order(cc->order)) {
>>> - unsigned long watermark;
>>> -
>>> - /* Allocation can already succeed, nothing to do */
>>> - watermark = wmark_pages(cc->zone,
>>> - cc->alloc_flags & ALLOC_WMARK_MASK);
>>> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>> - cc->highest_zoneidx, cc->alloc_flags))
>>> - return COMPACT_SUCCESS;
>>> -
>>> - /* Compaction is likely to fail */
>>> - if (!compaction_suitable(cc->zone, cc->order,
>>> - cc->highest_zoneidx))
>>> - return COMPACT_SKIPPED;
>>> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>> + cc->highest_zoneidx,
>>> + cc->alloc_flags);
>>> + if (ret != COMPACT_CONTINUE)
>>> + return ret;
>>> }
>>> /*
>>
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-19 12:27 ` Baolin Wang
@ 2023-08-22 1:57 ` Kemeng Shi
2023-08-24 2:25 ` Baolin Wang
0 siblings, 1 reply; 37+ messages in thread
From: Kemeng Shi @ 2023-08-22 1:57 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/19/2023 8:27 PM, Baolin Wang wrote:
>
>
> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We always do the zone_watermark_ok check and the compaction_suitable check
>>>> together to test whether compaction for the target order should be run.
>>>> Factor this code out in preparation for removing the repeated code.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index b5a699ed526b..26787ebb0297 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>> return false;
>>>> }
>>>> +/*
>>>> + * Should we do compaction for target allocation order.
>>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>>> + * satisfied
>>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>>> + */
>>>> +static inline enum compact_result
>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>> + int highest_zoneidx, unsigned int alloc_flags)
>>>> +{
>>>> + unsigned long watermark;
>>>> +
>>>> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>
>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>
>> Actually, there is no functional change intended. Considering that wmark_pages
>> with alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
>> use the original watermark.
>
> Can you use the ALLOC_WMARK_MIN macro to make it clearer?
Sorry, I can't quite follow this. The watermark should differ with different
alloc_flags instead of being hard-coded to WMARK_MIN.
Patch 8 and patch 9 use the WMARK_MIN watermark because they get alloc_flags = 0.
>
> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
Sure, I could do this in the next version.
>
>>>> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>> + alloc_flags))
>>>> + return COMPACT_SUCCESS;
>>>> +
>>>> + if (!compaction_suitable(zone, order, highest_zoneidx))
>>>> + return COMPACT_SKIPPED;
>>>> +
>>>> + return COMPACT_CONTINUE;
>>>> +}
>>>> +
>>>> static enum compact_result
>>>> compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>> {
>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>> if (compaction_with_allocation_order(cc->order)) {
>>>> - unsigned long watermark;
>>>> -
>>>> - /* Allocation can already succeed, nothing to do */
>>>> - watermark = wmark_pages(cc->zone,
>>>> - cc->alloc_flags & ALLOC_WMARK_MASK);
>>>> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>> - cc->highest_zoneidx, cc->alloc_flags))
>>>> - return COMPACT_SUCCESS;
>>>> -
>>>> - /* Compaction is likely to fail */
>>>> - if (!compaction_suitable(cc->zone, cc->order,
>>>> - cc->highest_zoneidx))
>>>> - return COMPACT_SKIPPED;
>>>> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>> + cc->highest_zoneidx,
>>>> + cc->alloc_flags);
>>>> + if (ret != COMPACT_CONTINUE)
>>>> + return ret;
>>>> }
>>>> /*
>>>
>>>
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-22 1:57 ` Kemeng Shi
@ 2023-08-24 2:25 ` Baolin Wang
2023-08-24 2:59 ` Kemeng Shi
0 siblings, 1 reply; 37+ messages in thread
From: Baolin Wang @ 2023-08-24 2:25 UTC (permalink / raw)
To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david
On 8/22/2023 9:57 AM, Kemeng Shi wrote:
>
>
> on 8/19/2023 8:27 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We always do the zone_watermark_ok check and the compaction_suitable check
>>>>> together to test whether compaction for the target order should be run.
>>>>> Factor this code out in preparation for removing the repeated code.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index b5a699ed526b..26787ebb0297 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>>> return false;
>>>>> }
>>>>> +/*
>>>>> + * Should we do compaction for target allocation order.
>>>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>>>> + * satisfied
>>>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>>>> + */
>>>>> +static inline enum compact_result
>>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>>> + int highest_zoneidx, unsigned int alloc_flags)
>>>>> +{
>>>>> + unsigned long watermark;
>>>>> +
>>>>> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>>
>>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>>
>>> Actually, there is no functional change intended. Considering that wmark_pages
>>> with alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
>>> use the original watermark.
>>
>> Can you use the ALLOC_WMARK_MIN macro to make it clearer?
> Sorry, I can't quite follow this. The watermark should differ with different
> alloc_flags instead of being hard-coded to WMARK_MIN.
> Patch 8 and patch 9 use the WMARK_MIN watermark because they get alloc_flags = 0.
I mean you can pass 'alloc_flags=ALLOC_WMARK_MIN' instead of a magic
number 0 when calling compaction_suit_allocation_order() in patch 8 and
patch 9.
>> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
> Sure, I could do this in the next version.
>>
>>>>> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>>> + alloc_flags))
>>>>> + return COMPACT_SUCCESS;
>>>>> +
>>>>> + if (!compaction_suitable(zone, order, highest_zoneidx))
>>>>> + return COMPACT_SKIPPED;
>>>>> +
>>>>> + return COMPACT_CONTINUE;
>>>>> +}
>>>>> +
>>>>> static enum compact_result
>>>>> compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>> {
>>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>> if (compaction_with_allocation_order(cc->order)) {
>>>>> - unsigned long watermark;
>>>>> -
>>>>> - /* Allocation can already succeed, nothing to do */
>>>>> - watermark = wmark_pages(cc->zone,
>>>>> - cc->alloc_flags & ALLOC_WMARK_MASK);
>>>>> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>>> - cc->highest_zoneidx, cc->alloc_flags))
>>>>> - return COMPACT_SUCCESS;
>>>>> -
>>>>> - /* Compaction is likely to fail */
>>>>> - if (!compaction_suitable(cc->zone, cc->order,
>>>>> - cc->highest_zoneidx))
>>>>> - return COMPACT_SKIPPED;
>>>>> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>>> + cc->highest_zoneidx,
>>>>> + cc->alloc_flags);
>>>>> + if (ret != COMPACT_CONTINUE)
>>>>> + return ret;
>>>>> }
>>>>> /*
>>>>
>>>>
>>
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
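For readers following the thread outside the kernel tree, the three-way contract of the factored-out helper can be modeled with a small standalone sketch. The zone layout and the watermark/suitability predicates below are simplified stand-ins invented for illustration, not the kernel's real implementations; only the SUCCESS/SKIPPED/CONTINUE decision structure mirrors the patch:

```c
#include <assert.h>

/* Stand-in for the kernel's compact_result values. */
enum compact_result { COMPACT_SKIPPED, COMPACT_CONTINUE, COMPACT_SUCCESS };

/* Toy zone: a free-page count plus a single watermark (hypothetical layout). */
struct zone { unsigned long free_pages; unsigned long min_wmark; };

/* Simplified zone_watermark_ok(): the allocation already fits above the mark. */
static int zone_watermark_ok(const struct zone *z, unsigned int order,
			     unsigned long watermark)
{
	return z->free_pages >= watermark + (1UL << order);
}

/* Simplified compaction_suitable(): enough free memory to migrate into. */
static int compaction_suitable(const struct zone *z, unsigned int order)
{
	return z->free_pages >= (2UL << order);
}

/*
 * Mirrors the factored-out helper's shape: succeed if the watermark check
 * already passes, skip if compaction is likely to fail, otherwise continue.
 */
static enum compact_result
compaction_suit_allocation_order(const struct zone *z, unsigned int order,
				 unsigned long watermark)
{
	if (zone_watermark_ok(z, order, watermark))
		return COMPACT_SUCCESS;
	if (!compaction_suitable(z, order))
		return COMPACT_SKIPPED;
	return COMPACT_CONTINUE;
}
```

Callers then branch on one return value instead of repeating both checks, which is the whole point of the refactor discussed above.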
* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
2023-08-24 2:25 ` Baolin Wang
@ 2023-08-24 2:59 ` Kemeng Shi
0 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-24 2:59 UTC (permalink / raw)
To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david
on 8/24/2023 10:25 AM, Baolin Wang wrote:
>
>
> On 8/22/2023 9:57 AM, Kemeng Shi wrote:
>>
>>
>> on 8/19/2023 8:27 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>>>
>>>>
>>>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>> We always do the zone_watermark_ok check and the compaction_suitable
>>>>>> check together to test whether compaction should be run for the target
>>>>>> order. Factor this code out in preparation for removing the repeated code.
>>>>>>
>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>> ---
>>>>>> mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>>>> 1 file changed, 29 insertions(+), 13 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index b5a699ed526b..26787ebb0297 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>>>> return false;
>>>>>> }
>>>>>> +/*
>>>>>> + * Should we run compaction for the target allocation order?
>>>>>> + * Return COMPACT_SUCCESS if allocation for the target order can already
>>>>>> + * be satisfied
>>>>>> + * Return COMPACT_SKIPPED if compaction for the target order is likely to fail
>>>>>> + * Return COMPACT_CONTINUE if compaction for the target order should be run
>>>>>> + */
>>>>>> +static inline enum compact_result
>>>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>>>> + int highest_zoneidx, unsigned int alloc_flags)
>>>>>> +{
>>>>>> + unsigned long watermark;
>>>>>> +
>>>>>> + watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>>>
>>>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>>>
>>>> Actually, there is no functional change intended. Considering that
>>>> wmark_pages() with alloc_flags = 0 is equivalent to min_wmark_pages(),
>>>> patches 8 and 9 still use the original watermark.
>>>
>>> Can you use ALLOC_WMARK_MIN macro to make it more clear?
>> Sorry, I can't quite follow this. The watermark should differ with different
>> alloc_flags rather than hard-coding WMARK_MIN.
>> Patches 8 and 9 use the WMARK_MIN watermark because they pass alloc_flags = 0.
>
> I mean you can pass 'alloc_flags=ALLOC_WMARK_MIN' instead of a magic number 0 when calling compaction_suit_allocation_order() in patch 8 and patch 9.
>
Thanks for explaining; this does make it better. I will do this in the next version.
>>> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
>> Sure, I could do this in the next version.
>>>
>>>>>> + if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>>>> + alloc_flags))
>>>>>> + return COMPACT_SUCCESS;
>>>>>> +
>>>>>> + if (!compaction_suitable(zone, order, highest_zoneidx))
>>>>>> + return COMPACT_SKIPPED;
>>>>>> +
>>>>>> + return COMPACT_CONTINUE;
>>>>>> +}
>>>>>> +
>>>>>> static enum compact_result
>>>>>> compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>> {
>>>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>> cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>>> if (compaction_with_allocation_order(cc->order)) {
>>>>>> - unsigned long watermark;
>>>>>> -
>>>>>> - /* Allocation can already succeed, nothing to do */
>>>>>> - watermark = wmark_pages(cc->zone,
>>>>>> - cc->alloc_flags & ALLOC_WMARK_MASK);
>>>>>> - if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>>>> - cc->highest_zoneidx, cc->alloc_flags))
>>>>>> - return COMPACT_SUCCESS;
>>>>>> -
>>>>>> - /* Compaction is likely to fail */
>>>>>> - if (!compaction_suitable(cc->zone, cc->order,
>>>>>> - cc->highest_zoneidx))
>>>>>> - return COMPACT_SKIPPED;
>>>>>> + ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>>>> + cc->highest_zoneidx,
>>>>>> + cc->alloc_flags);
>>>>>> + if (ret != COMPACT_CONTINUE)
>>>>>> + return ret;
>>>>>> }
>>>>>> /*
>>>>>
>>>>>
>>>
>>>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
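Baolin's suggestion above rests on how wmark_pages() works: it selects a watermark by indexing with the low bits of alloc_flags, so a literal 0 silently means the min watermark. A minimal standalone mirror of that indexing (the flag values below match the usual kernel definitions, but the struct is a toy stand-in, not the real struct zone):

```c
#include <assert.h>

/* Mirrors of the kernel's watermark-selector flags (values as in mm internals). */
#define ALLOC_WMARK_MIN  0x00	/* selects WMARK_MIN */
#define ALLOC_WMARK_LOW  0x01	/* selects WMARK_LOW */
#define ALLOC_WMARK_HIGH 0x02	/* selects WMARK_HIGH */
#define ALLOC_WMARK_MASK 0x03	/* low bits carry the watermark index */

/* Toy zone holding the three watermarks, indexed min/low/high. */
struct zone { unsigned long watermark[3]; };

/* Model of wmark_pages(): the flag's low bits index the watermark array. */
static unsigned long wmark_pages(const struct zone *z, unsigned int alloc_flags)
{
	return z->watermark[alloc_flags & ALLOC_WMARK_MASK];
}
```

Because ALLOC_WMARK_MIN is 0, passing alloc_flags = 0 and passing ALLOC_WMARK_MIN are bit-for-bit identical; the named constant only documents intent at the call site, which is exactly the readability point being made in the review.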
* [PATCH 8/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_node_suitable
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (7 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
2023-08-05 11:07 ` [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work Kemeng Shi
9 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
Use the compaction_suit_allocation_order helper in kcompactd_node_suitable to
remove repeated code.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 26787ebb0297..f4b6c520038a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2924,14 +2924,9 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
if (!populated_zone(zone))
continue;
- /* Allocation can already succeed, check other zones */
- if (zone_watermark_ok(zone, pgdat->kcompactd_max_order,
- min_wmark_pages(zone),
- highest_zoneidx, 0))
- continue;
-
- if (compaction_suitable(zone, pgdat->kcompactd_max_order,
- highest_zoneidx))
+ if (compaction_suit_allocation_order(zone,
+ pgdat->kcompactd_max_order,
+ highest_zoneidx, 0) == COMPACT_CONTINUE)
return true;
}
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread
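The control-flow change in this patch — kcompactd_node_suitable() now just asks the helper per zone and looks for COMPACT_CONTINUE — can be sketched as follows. This is a simplified model, not kernel code: each zone's helper result is precomputed into an array for brevity:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's compact_result values. */
enum compact_result { COMPACT_SKIPPED, COMPACT_CONTINUE, COMPACT_SUCCESS };

/* Toy node: each populated zone reduced to its precomputed helper result. */
struct node { const enum compact_result *zone_result; size_t nr_zones; };

/*
 * Mirrors the reworked kcompactd_node_suitable(): the node needs kcompactd
 * only if some zone still wants compaction (helper says CONTINUE); zones
 * where the allocation already succeeds or compaction would likely fail
 * are skipped, exactly as the two removed checks used to do separately.
 */
static int kcompactd_node_suitable(const struct node *n)
{
	for (size_t i = 0; i < n->nr_zones; i++)
		if (n->zone_result[i] == COMPACT_CONTINUE)
			return 1;
	return 0;
}
```

Patch 9 applies the same substitution inside kcompactd_do_work(), where a non-CONTINUE result makes the loop `continue` to the next zone instead of returning.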
* [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
` (8 preceding siblings ...)
2023-08-05 11:07 ` [PATCH 8/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_node_suitable Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
9 siblings, 0 replies; 37+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng
Use the compaction_suit_allocation_order helper in kcompactd_do_work to remove
repeated code.
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
mm/compaction.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index f4b6c520038a..05c27a1ef68a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2963,12 +2963,8 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (compaction_deferred(zone, cc.order))
continue;
- /* Allocation can already succeed, nothing to do */
- if (zone_watermark_ok(zone, cc.order,
- min_wmark_pages(zone), zoneid, 0))
- continue;
-
- if (!compaction_suitable(zone, cc.order, zoneid))
+ if (compaction_suit_allocation_order(zone,
+ cc.order, zoneid, 0) != COMPACT_CONTINUE)
continue;
if (kthread_should_stop())
--
2.30.0
^ permalink raw reply related [flat|nested] 37+ messages in thread