All of lore.kernel.org
 help / color / mirror / Atom feed
* [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
@ 2019-12-05 18:54 Yang Shi
  2019-12-05 19:19 ` Qian Cai
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Yang Shi @ 2019-12-05 18:54 UTC (permalink / raw)
  To: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm
  Cc: yang.shi, linux-mm, linux-kernel, stable

Felix Abecassis reports move_pages() would return random status if the
pages are already on the target node by the below test program:

---8<---

int main(void)
{
	const long node_id = 1;
	const long page_size = sysconf(_SC_PAGESIZE);
	const int64_t num_pages = 8;

	unsigned long nodemask =  1 << node_id;
	long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
	if (ret < 0)
		return (EXIT_FAILURE);

	void **pages = malloc(sizeof(void*) * num_pages);
	for (int i = 0; i < num_pages; ++i) {
		pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
				MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
				-1, 0);
		if (pages[i] == MAP_FAILED)
			return (EXIT_FAILURE);
	}

	ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
	if (ret < 0)
		return (EXIT_FAILURE);

	int *nodes = malloc(sizeof(int) * num_pages);
	int *status = malloc(sizeof(int) * num_pages);
	for (int i = 0; i < num_pages; ++i) {
		nodes[i] = node_id;
		status[i] = 0xd0; /* simulate garbage values */
	}

	ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
	printf("move_pages: %ld\n", ret);
	for (int i = 0; i < num_pages; ++i)
		printf("status[%d] = %d\n", i, status[i]);
}
---8<---

Then running the program would return nonsense status values:
$ ./move_pages_bug
move_pages: 0
status[0] = 208
status[1] = 208
status[2] = 208
status[3] = 208
status[4] = 208
status[5] = 208
status[6] = 208
status[7] = 208

This is because the status is not set if the page is already on the
target node, but move_pages() should return valid status as long as it
succeeds.  The valid status may be errno or node id.

We can't simply initialize status array to zero since the pages may be
not on node 0.  Fix it by updating status with node id which the page is
already on.

Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Reported-by: Felix Abecassis <fabecassis@nvidia.com>
Tested-by: Felix Abecassis <fabecassis@nvidia.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> 4.17+
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v3: * Adopted the suggestion from Michal.
v2: * Correted the return value when add_page_for_migration() returns 1.

 mm/migrate.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index a8f87cb..9c172f4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1512,9 +1512,11 @@ static int do_move_pages_to_node(struct mm_struct *mm,
 /*
  * Resolves the given address to a struct page, isolates it from the LRU and
  * puts it to the given pagelist.
- * Returns -errno if the page cannot be found/isolated or 0 when it has been
- * queued or the page doesn't need to be migrated because it is already on
- * the target node
+ * Returns:
+ *     errno - if the page cannot be found/isolated
+ *     0 - when it doesn't have to be migrated because it is already on the
+ *         target node
+ *     1 - when it has been queued
  */
 static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		int node, struct list_head *pagelist, bool migrate_all)
@@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	if (PageHuge(page)) {
 		if (PageHead(page)) {
 			isolate_huge_page(page, pagelist);
-			err = 0;
+			err = 1;
 		}
 	} else {
 		struct page *head;
@@ -1563,7 +1565,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 		if (err)
 			goto out_putpage;
 
-		err = 0;
+		err = 1;
 		list_add_tail(&head->lru, pagelist);
 		mod_node_page_state(page_pgdat(head),
 			NR_ISOLATED_ANON + page_is_file_cache(head),
@@ -1578,6 +1580,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	put_page(page);
 out:
 	up_read(&mm->mmap_sem);
+
 	return err;
 }
 
@@ -1640,8 +1643,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 */
 		err = add_page_for_migration(mm, addr, current_node,
 				&pagelist, flags & MPOL_MF_MOVE_ALL);
-		if (!err)
+
+		/* The page is already on the target node */
+		if (!err) {
+			err = store_status(status, i, current_node, 1);
+			if (err)
+				goto out_flush;
 			continue;
+		/* The page is successfully queued */
+		} else if (err > 0) {
+			continue;
+		}
 
 		err = store_status(status, i, err, 1);
 		if (err)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 18:54 [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node Yang Shi
@ 2019-12-05 19:19 ` Qian Cai
  2019-12-05 19:27   ` Yang Shi
  2019-12-05 19:45   ` Christopher Lameter
  2019-12-05 21:27 ` John Hubbard
  2 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-05 19:19 UTC (permalink / raw)
  To: Yang Shi
  Cc: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 1:54 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> This is because the status is not set if the page is already on the
> target node, but move_pages() should return valid status as long as it
> succeeds.  The valid status may be errno or node id.
> 
> We can't simply initialize status array to zero since the pages may be
> not on node 0.  Fix it by updating status with node id which the page is
> already on.

This does not look correct either.

“ENOENT
No pages were found that require moving. All pages are either already on the target node, not present, had an invalid address or could not be moved because they were mapped by multiple processes.”

move_pages() should return -ENOENT instead.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 19:19 ` Qian Cai
@ 2019-12-05 19:27   ` Yang Shi
  2019-12-05 19:34     ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: Yang Shi @ 2019-12-05 19:27 UTC (permalink / raw)
  To: Qian Cai
  Cc: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



On 12/5/19 11:19 AM, Qian Cai wrote:
>
>> On Dec 5, 2019, at 1:54 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>> This is because the status is not set if the page is already on the
>> target node, but move_pages() should return valid status as long as it
>> succeeds.  The valid status may be errno or node id.
>>
>> We can't simply initialize status array to zero since the pages may be
>> not on node 0.  Fix it by updating status with node id which the page is
>> already on.
> This does not look correct either.
>
> “ENOENT
> No pages were found that require moving. All pages are either already on the target node, not present, had an invalid address or could not be moved because they were mapped by multiple processes.”
>
> move_pages() should return -ENOENT instead.

Yes, we noticed this too. I had a note in v1 and v2 patch, but I forgot 
paste in v3, says:

John noticed another return value inconsistency between the 
implementation and the manpage. The manpage says it should return 
-ENOENT if the page is already on the target node, but it doesn't. It 
looks the original code didn't return -ENOENT either, I'm not sure if 
this is a document issue or not. Anyway this is another issue, once we 
confirm it we can fix it later.

And, Michal also commented to the note:

I do not remember all the details but my recollection is that there were 
several inconsistencies present before I touched the code and I've 
decided to not touch them without a clear usecase.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 19:27   ` Yang Shi
@ 2019-12-05 19:34     ` Qian Cai
  2019-12-05 22:09       ` Yang Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-05 19:34 UTC (permalink / raw)
  To: Yang Shi
  Cc: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 2:27 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> John noticed another return value inconsistency between the implementation and the manpage. The manpage says it should return -ENOENT if the page is already on the target node, but it doesn't. It looks the original code didn't return -ENOENT either, I'm not sure if this is a document issue or not. Anyway this is another issue, once we confirm it we can fix it later.

No, I think it is important to figure out this in the first place. Otherwise, it is pointless to touch this piece of code over and over again, i.e., this is not another issue but the core of this problem on hand.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 18:54 [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node Yang Shi
@ 2019-12-05 19:45   ` Christopher Lameter
  2019-12-05 19:45   ` Christopher Lameter
  2019-12-05 21:27 ` John Hubbard
  2 siblings, 0 replies; 19+ messages in thread
From: Christopher Lameter @ 2019-12-05 19:45 UTC (permalink / raw)
  To: Yang Shi
  Cc: fabecassis, jhubbard, mhocko, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable

On Fri, 6 Dec 2019, Yang Shi wrote:

> Felix Abecassis reports move_pages() would return random status if the
> pages are already on the target node by the below test program:

Looks ok.

Acked-by: Christoph Lameter <cl@linux.com>

Nitpicks:

> @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  	if (PageHuge(page)) {
>  		if (PageHead(page)) {
>  			isolate_huge_page(page, pagelist);
> -			err = 0;
> +			err = 1;

Add a meaningful constant instead of 1?

>  out:
>  	up_read(&mm->mmap_sem);
> +
>  	return err;

Dont do that.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
@ 2019-12-05 19:45   ` Christopher Lameter
  0 siblings, 0 replies; 19+ messages in thread
From: Christopher Lameter @ 2019-12-05 19:45 UTC (permalink / raw)
  To: Yang Shi
  Cc: fabecassis, jhubbard, mhocko, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable

On Fri, 6 Dec 2019, Yang Shi wrote:

> Felix Abecassis reports move_pages() would return random status if the
> pages are already on the target node by the below test program:

Looks ok.

Acked-by: Christoph Lameter <cl@linux.com>

Nitpicks:

> @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  	if (PageHuge(page)) {
>  		if (PageHead(page)) {
>  			isolate_huge_page(page, pagelist);
> -			err = 0;
> +			err = 1;

Add a meaningful constant instead of 1?

>  out:
>  	up_read(&mm->mmap_sem);
> +
>  	return err;

Dont do that.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 18:54 [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node Yang Shi
  2019-12-05 19:19 ` Qian Cai
  2019-12-05 19:45   ` Christopher Lameter
@ 2019-12-05 21:27 ` John Hubbard
  2019-12-05 22:00   ` Yang Shi
  2 siblings, 1 reply; 19+ messages in thread
From: John Hubbard @ 2019-12-05 21:27 UTC (permalink / raw)
  To: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm
  Cc: linux-mm, linux-kernel, stable

On 12/5/19 10:54 AM, Yang Shi wrote:
> Felix Abecassis reports move_pages() would return random status if the
> pages are already on the target node by the below test program:
> 
> ---8<---
> 
> int main(void)
> {
> 	const long node_id = 1;
> 	const long page_size = sysconf(_SC_PAGESIZE);
> 	const int64_t num_pages = 8;
> 
> 	unsigned long nodemask =  1 << node_id;
> 	long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
> 	if (ret < 0)
> 		return (EXIT_FAILURE);
> 
> 	void **pages = malloc(sizeof(void*) * num_pages);
> 	for (int i = 0; i < num_pages; ++i) {
> 		pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
> 				MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
> 				-1, 0);
> 		if (pages[i] == MAP_FAILED)
> 			return (EXIT_FAILURE);
> 	}
> 
> 	ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
> 	if (ret < 0)
> 		return (EXIT_FAILURE);
> 
> 	int *nodes = malloc(sizeof(int) * num_pages);
> 	int *status = malloc(sizeof(int) * num_pages);
> 	for (int i = 0; i < num_pages; ++i) {
> 		nodes[i] = node_id;
> 		status[i] = 0xd0; /* simulate garbage values */
> 	}
> 
> 	ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
> 	printf("move_pages: %ld\n", ret);
> 	for (int i = 0; i < num_pages; ++i)
> 		printf("status[%d] = %d\n", i, status[i]);
> }
> ---8<---
> 
> Then running the program would return nonsense status values:
> $ ./move_pages_bug
> move_pages: 0
> status[0] = 208
> status[1] = 208
> status[2] = 208
> status[3] = 208
> status[4] = 208
> status[5] = 208
> status[6] = 208
> status[7] = 208
> 
> This is because the status is not set if the page is already on the
> target node, but move_pages() should return valid status as long as it
> succeeds.  The valid status may be errno or node id.
> 
> We can't simply initialize status array to zero since the pages may be
> not on node 0.  Fix it by updating status with node id which the page is
> already on.
> 
> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
> Tested-by: Felix Abecassis <fabecassis@nvidia.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: <stable@vger.kernel.org> 4.17+
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
> v3: * Adopted the suggestion from Michal.
> v2: * Correted the return value when add_page_for_migration() returns 1.
> 
>  mm/migrate.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a8f87cb..9c172f4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1512,9 +1512,11 @@ static int do_move_pages_to_node(struct mm_struct *mm,
>  /*
>   * Resolves the given address to a struct page, isolates it from the LRU and
>   * puts it to the given pagelist.
> - * Returns -errno if the page cannot be found/isolated or 0 when it has been
> - * queued or the page doesn't need to be migrated because it is already on
> - * the target node
> + * Returns:
> + *     errno - if the page cannot be found/isolated
> + *     0 - when it doesn't have to be migrated because it is already on the
> + *         target node
> + *     1 - when it has been queued
>   */
>  static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  		int node, struct list_head *pagelist, bool migrate_all)
> @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  	if (PageHuge(page)) {
>  		if (PageHead(page)) {
>  			isolate_huge_page(page, pagelist);
> -			err = 0;
> +			err = 1;
>  		}
>  	} else {
>  		struct page *head;
> @@ -1563,7 +1565,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  		if (err)
>  			goto out_putpage;
>  
> -		err = 0;
> +		err = 1;
>  		list_add_tail(&head->lru, pagelist);
>  		mod_node_page_state(page_pgdat(head),
>  			NR_ISOLATED_ANON + page_is_file_cache(head),
> @@ -1578,6 +1580,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>  	put_page(page);
>  out:
>  	up_read(&mm->mmap_sem);
> +
>  	return err;
>  }
>  
> @@ -1640,8 +1643,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>  		 */
>  		err = add_page_for_migration(mm, addr, current_node,
>  				&pagelist, flags & MPOL_MF_MOVE_ALL);
> -		if (!err)
> +
> +		/* The page is already on the target node */
> +		if (!err) {
> +			err = store_status(status, i, current_node, 1);
> +			if (err)
> +				goto out_flush;
>  			continue;
> +		/* The page is successfully queued */

Nit: could you move this comment down one line, so that it's clear that it
applies to the "else" branch? Like this:

		} else if (err > 0) {
			/* The page is successfully queued for migration */
			continue;
		}


> +		} else if (err > 0) {
> +			continue;
> +		}
>  
>  		err = store_status(status, i, err, 1);
>  		if (err)
> 

Also agree with Christopher's requests, but all of these don't affect the
correct operation of the patch, so you can add:

Reviewed-by: John Hubbard <jhubbard@nvidia.com>

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 19:45   ` Christopher Lameter
  (?)
@ 2019-12-05 21:59   ` Yang Shi
  -1 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2019-12-05 21:59 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: fabecassis, jhubbard, mhocko, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable



On 12/5/19 11:45 AM, Christopher Lameter wrote:
> On Fri, 6 Dec 2019, Yang Shi wrote:
>
>> Felix Abecassis reports move_pages() would return random status if the
>> pages are already on the target node by the below test program:
> Looks ok.
>
> Acked-by: Christoph Lameter <cl@linux.com>
>
> Nitpicks:
>
>> @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>   	if (PageHuge(page)) {
>>   		if (PageHead(page)) {
>>   			isolate_huge_page(page, pagelist);
>> -			err = 0;
>> +			err = 1;
> Add a meaningful constant instead of 1?

Since it just returns errno, 0 and 1 it sounds not worth a constant or enum.

>
>>   out:
>>   	up_read(&mm->mmap_sem);
>> +
>>   	return err;
> Dont do that.

Yes, my fat finger.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 21:27 ` John Hubbard
@ 2019-12-05 22:00   ` Yang Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2019-12-05 22:00 UTC (permalink / raw)
  To: John Hubbard, fabecassis, mhocko, cl, vbabka, mgorman, akpm
  Cc: linux-mm, linux-kernel, stable



On 12/5/19 1:27 PM, John Hubbard wrote:
> On 12/5/19 10:54 AM, Yang Shi wrote:
>> Felix Abecassis reports move_pages() would return random status if the
>> pages are already on the target node by the below test program:
>>
>> ---8<---
>>
>> int main(void)
>> {
>> 	const long node_id = 1;
>> 	const long page_size = sysconf(_SC_PAGESIZE);
>> 	const int64_t num_pages = 8;
>>
>> 	unsigned long nodemask =  1 << node_id;
>> 	long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
>> 	if (ret < 0)
>> 		return (EXIT_FAILURE);
>>
>> 	void **pages = malloc(sizeof(void*) * num_pages);
>> 	for (int i = 0; i < num_pages; ++i) {
>> 		pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
>> 				MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
>> 				-1, 0);
>> 		if (pages[i] == MAP_FAILED)
>> 			return (EXIT_FAILURE);
>> 	}
>>
>> 	ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
>> 	if (ret < 0)
>> 		return (EXIT_FAILURE);
>>
>> 	int *nodes = malloc(sizeof(int) * num_pages);
>> 	int *status = malloc(sizeof(int) * num_pages);
>> 	for (int i = 0; i < num_pages; ++i) {
>> 		nodes[i] = node_id;
>> 		status[i] = 0xd0; /* simulate garbage values */
>> 	}
>>
>> 	ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
>> 	printf("move_pages: %ld\n", ret);
>> 	for (int i = 0; i < num_pages; ++i)
>> 		printf("status[%d] = %d\n", i, status[i]);
>> }
>> ---8<---
>>
>> Then running the program would return nonsense status values:
>> $ ./move_pages_bug
>> move_pages: 0
>> status[0] = 208
>> status[1] = 208
>> status[2] = 208
>> status[3] = 208
>> status[4] = 208
>> status[5] = 208
>> status[6] = 208
>> status[7] = 208
>>
>> This is because the status is not set if the page is already on the
>> target node, but move_pages() should return valid status as long as it
>> succeeds.  The valid status may be errno or node id.
>>
>> We can't simply initialize status array to zero since the pages may be
>> not on node 0.  Fix it by updating status with node id which the page is
>> already on.
>>
>> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> Reported-by: Felix Abecassis <fabecassis@nvidia.com>
>> Tested-by: Felix Abecassis <fabecassis@nvidia.com>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: <stable@vger.kernel.org> 4.17+
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>> v3: * Adopted the suggestion from Michal.
>> v2: * Correted the return value when add_page_for_migration() returns 1.
>>
>>   mm/migrate.c | 24 ++++++++++++++++++------
>>   1 file changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index a8f87cb..9c172f4 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1512,9 +1512,11 @@ static int do_move_pages_to_node(struct mm_struct *mm,
>>   /*
>>    * Resolves the given address to a struct page, isolates it from the LRU and
>>    * puts it to the given pagelist.
>> - * Returns -errno if the page cannot be found/isolated or 0 when it has been
>> - * queued or the page doesn't need to be migrated because it is already on
>> - * the target node
>> + * Returns:
>> + *     errno - if the page cannot be found/isolated
>> + *     0 - when it doesn't have to be migrated because it is already on the
>> + *         target node
>> + *     1 - when it has been queued
>>    */
>>   static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>   		int node, struct list_head *pagelist, bool migrate_all)
>> @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>   	if (PageHuge(page)) {
>>   		if (PageHead(page)) {
>>   			isolate_huge_page(page, pagelist);
>> -			err = 0;
>> +			err = 1;
>>   		}
>>   	} else {
>>   		struct page *head;
>> @@ -1563,7 +1565,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>   		if (err)
>>   			goto out_putpage;
>>   
>> -		err = 0;
>> +		err = 1;
>>   		list_add_tail(&head->lru, pagelist);
>>   		mod_node_page_state(page_pgdat(head),
>>   			NR_ISOLATED_ANON + page_is_file_cache(head),
>> @@ -1578,6 +1580,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>>   	put_page(page);
>>   out:
>>   	up_read(&mm->mmap_sem);
>> +
>>   	return err;
>>   }
>>   
>> @@ -1640,8 +1643,17 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>   		 */
>>   		err = add_page_for_migration(mm, addr, current_node,
>>   				&pagelist, flags & MPOL_MF_MOVE_ALL);
>> -		if (!err)
>> +
>> +		/* The page is already on the target node */
>> +		if (!err) {
>> +			err = store_status(status, i, current_node, 1);
>> +			if (err)
>> +				goto out_flush;
>>   			continue;
>> +		/* The page is successfully queued */
> Nit: could you move this comment down one line, so that it's clear that it
> applies to the "else" branch? Like this:
>
> 		} else if (err > 0) {
> 			/* The page is successfully queued for migration */
> 			continue;
> 		}

Sure.

>
>
>> +		} else if (err > 0) {
>> +			continue;
>> +		}
>>   
>>   		err = store_status(status, i, err, 1);
>>   		if (err)
>>
> Also agree with Christopher's requests, but all of these don't affect the
> correct operation of the patch, so you can add:
>
> Reviewed-by: John Hubbard <jhubbard@nvidia.com>
>
> thanks,


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 19:34     ` Qian Cai
@ 2019-12-05 22:09       ` Yang Shi
  2019-12-05 22:23         ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: Yang Shi @ 2019-12-05 22:09 UTC (permalink / raw)
  To: Qian Cai
  Cc: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



On 12/5/19 11:34 AM, Qian Cai wrote:
>
>> On Dec 5, 2019, at 2:27 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>> John noticed another return value inconsistency between the implementation and the manpage. The manpage says it should return -ENOENT if the page is already on the target node, but it doesn't. It looks the original code didn't return -ENOENT either, I'm not sure if this is a document issue or not. Anyway this is another issue, once we confirm it we can fix it later.
> No, I think it is important to figure out this in the first place. Otherwise, it is pointless to touch this piece of code over and over again, i.e., this is not another issue but the core of this problem on hand.

As I said the status return value issue is a regression, but the -ENOENT 
issue has been there since the syscall was introduced (The visual 
inspection shows so I didn't actually run test against 2.6.x kernel, but 
it returns 0 for >= 3.10 at least). It does need further clarification 
(doc problem or code problem).

Michal also noticed several inconsistencies when he was reworking 
move_pages(), and I agree with him that we'd better not touch them 
without a clear usecase.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 22:09       ` Yang Shi
@ 2019-12-05 22:23         ` Qian Cai
  2019-12-05 22:41           ` John Hubbard
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-05 22:23 UTC (permalink / raw)
  To: Yang Shi
  Cc: fabecassis, jhubbard, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 5:09 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
> As I said the status return value issue is a regression, but the -ENOENT issue has been there since the syscall was introduced (The visual inspection shows so I didn't actually run test against 2.6.x kernel, but it returns 0 for >= 3.10 at least). It does need further clarification (doc problem or code problem).

The question is why we should care about this change of behavior. It is arguably you are even trying to fix an ambiguous part of the manpage, but instead leave a more obviously one still broken. It is really difficult to understand the logical here.

> 
> Michal also noticed several inconsistencies when he was reworking move_pages(), and I agree with him that we'd better not touch them without a clear usecase.

It could argue that there is no use case to restore the behavior either.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 22:23         ` Qian Cai
@ 2019-12-05 22:41           ` John Hubbard
  2019-12-05 23:16             ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: John Hubbard @ 2019-12-05 22:41 UTC (permalink / raw)
  To: Qian Cai, Yang Shi
  Cc: fabecassis, mhocko, cl, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable

On 12/5/19 2:23 PM, Qian Cai wrote:
>> On Dec 5, 2019, at 5:09 PM, Yang Shi <yang.shi@linux.alibaba.com> wrote:
>>
>> As I said the status return value issue is a regression, but the -ENOENT issue has been there since the syscall was introduced (The visual inspection shows so I didn't actually run test against 2.6.x kernel, but it returns 0 for >= 3.10 at least). It does need further clarification (doc problem or code problem).
> 
> The question is why we should care about this change of behavior. It is arguably you are even trying to fix an ambiguous part of the manpage, but instead leave a more obviously one still broken. It is really difficult to understand the logical here.
> 

Please recall how this started: it was due to a report from a real end user, who was 
seeing a real problem. After a few emails, it was clear that there's not a good
work around available for cases like this:

* User space calls move_pages(), gets 0 (success) returned, and based on that,
proceeds to iterate through the status array.

* The status array remains untouched by the move_pages() call, so confusion and
wrong behavior ensues.

After some further discussion, we decided that the current behavior really is 
incorrect, and that it needs fixing in the kernel. Which this patch does.

>>
>> Michal also noticed several inconsistencies when he was reworking move_pages(), and I agree with him that we'd better not touch them without a clear usecase.
> 
> It could argue that there is no use case to restore the behavior either.
> 

So far, there are no reports from the field, and that's probably the key
difference between these two situations.

Hope that clears up the reasoning for you. I might add that, were you to study
all the emails in these threads, and the code and the man page, you would
probably agree with the conclusions above. You might disagree with the underlying
philosophies (such as "user space is really important and we fix it if it
breaks", etc), but that's a different conversation.


thanks,
-- 
John Hubbard
NVIDIA


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 22:41           ` John Hubbard
@ 2019-12-05 23:16             ` Qian Cai
  2019-12-05 23:24               ` John Hubbard
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-05 23:16 UTC (permalink / raw)
  To: John Hubbard
  Cc: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 5:41 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> Please recall how this started: it was due to a report from a real end user, who was 
> seeing a real problem. After a few emails, it was clear that there's not a good
> work around available for cases like this:
> 
> * User space calls move_pages(), gets 0 (success) returned, and based on that,
> proceeds to iterate through the status array.
> 
> * The status array remains untouched by the move_pages() call, so confusion and
> wrong behavior ensues.
> 
> After some further discussion, we decided that the current behavior really is 
> incorrect, and that it needs fixing in the kernel. Which this patch does.

Well, that test code itself  does not really tell any real world use case.  Also, thanks to the discussion, it brought to me it is more obvious and critical  that the return code is wrong according to the spec. Then, if that part is taking care of, it would kill two-bird with one stone because there is no need to return status array anymore. Make sense?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 23:16             ` Qian Cai
@ 2019-12-05 23:24               ` John Hubbard
  2019-12-05 23:58                 ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: John Hubbard @ 2019-12-05 23:24 UTC (permalink / raw)
  To: Qian Cai
  Cc: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable

On 12/5/19 3:16 PM, Qian Cai wrote:
> 
> 
>> On Dec 5, 2019, at 5:41 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> Please recall how this started: it was due to a report from a real end user, who was 
>> seeing a real problem. After a few emails, it was clear that there's not a good
>> work around available for cases like this:
>>
>> * User space calls move_pages(), gets 0 (success) returned, and based on that,
>> proceeds to iterate through the status array.
>>
>> * The status array remains untouched by the move_pages() call, so confusion and
>> wrong behavior ensues.
>>
>> After some further discussion, we decided that the current behavior really is 
>> incorrect, and that it needs fixing in the kernel. Which this patch does.
> 
> Well, that test code itself  does not really tell any real world use case.  Also, thanks to the discussion, it brought to me it is more obvious and critical  that the return code is wrong according to the spec. Then, if that part is taking care of, it would kill two-bird with one stone because there is no need to return status array anymore. Make sense?
> 

Let's check in the fix that is clearly correct and non-controversial, in one
patch. Then another patch can be created for the other case. This allows forward
progress and quick resolution of the user's bug report, while still dealing
with all the problems.

If you try to fix too many problems in one patch (and remember, sometimes ">1"
is too many), then things bog down. It's always a judgment call, but what's 
unfolding here is quite consistent with the usual judgment calls in this area.

I don't think anyone is saying, "don't work on the second problem", it's just
that it's less urgent, due to no reports from the field. If you are passionate
about fixing the second problem (and are ready and willing to handle the fallout
from user space, if it occurs), then I'd encourage you to look into it.

It could turn out to be one of those "cannot change this because user space expectations
have baked and hardened, and changes would break user space" situations, just to
warn you in advance, though.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 23:24               ` John Hubbard
@ 2019-12-05 23:58                 ` Qian Cai
  2019-12-06  0:04                   ` John Hubbard
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-05 23:58 UTC (permalink / raw)
  To: John Hubbard
  Cc: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 6:24 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> Let's check in the fix that is clearly correct and non-controversial, in one
> patch. Then another patch can be created for the other case. This allows forward
> progress and quick resolution of the user's bug report, while still dealing
> with all the problems.
> 
> If you try to fix too many problems in one patch (and remember, sometimes ">1"
> is too many), then things bog down. It's always a judgment call, but what's 
> unfolding here is quite consistent with the usual judgment calls in this area.
> 
> I don't think anyone is saying, "don't work on the second problem", it's just
> that it's less urgent, due to no reports from the field. If you are passionate
> about fixing the second problem (and are ready and willing to handle the fallout
> from user space, if it occurs), then I'd encourage you to look into it.
> 
> It could turn out to be one of those "cannot change this because user space expectations
> have baked and hardened, and changes would break user space" situations, just to
> warn you in advance, though.

There is no need to paper over the underlying issue. One can think there is only one problem. The way move_pages() deal with pages are already in the desired node. Then, I don’t see there is any controversy that it was broken for so long and just restore it to according to the manpage. If you worried about people has already depended on the broken behavior, it could stay in linux-next for many releases to gather feedback. In any case, I don’t see it need to hurry to fix this until someone can show the real world use case for it apart from some random test code.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 23:58                 ` Qian Cai
@ 2019-12-06  0:04                   ` John Hubbard
  2019-12-06  0:19                     ` Qian Cai
  0 siblings, 1 reply; 19+ messages in thread
From: John Hubbard @ 2019-12-06  0:04 UTC (permalink / raw)
  To: Qian Cai
  Cc: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable

On 12/5/19 3:58 PM, Qian Cai wrote:
> 
> 
>> On Dec 5, 2019, at 6:24 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> Let's check in the fix that is clearly correct and non-controversial, in one
>> patch. Then another patch can be created for the other case. This allows forward
>> progress and quick resolution of the user's bug report, while still dealing
>> with all the problems.
>>
>> If you try to fix too many problems in one patch (and remember, sometimes ">1"
>> is too many), then things bog down. It's always a judgment call, but what's 
>> unfolding here is quite consistent with the usual judgment calls in this area.
>>
>> I don't think anyone is saying, "don't work on the second problem", it's just
>> that it's less urgent, due to no reports from the field. If you are passionate
>> about fixing the second problem (and are ready and willing to handle the fallout
>> from user space, if it occurs), then I'd encourage you to look into it.
>>
>> It could turn out to be one of those "cannot change this because user space expectations
>> have baked and hardened, and changes would break user space" situations, just to
>> warn you in advance, though.
> 
> There is no need to paper over the underlying issue. One can think there is only one problem. The way move_pages() deal with pages are already in the desired node. Then, I don’t see there is any controversy that it was broken for so long and just restore it to according to the manpage. If you worried about people has already depended on the broken behavior, it could stay in linux-next for many releases to gather feedback. In any case, I don’t see it need to hurry to fix this until someone can show the real world use case for it apart from some random test code.
> 

Felix's code is not random test code. It's code he wrote and he expected it to work.

Anyway, I've explained what I want here, and done my best to explain it. So I'm 
dropping off now. :)

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-06  0:04                   ` John Hubbard
@ 2019-12-06  0:19                     ` Qian Cai
  2019-12-06  1:11                       ` Yang Shi
  0 siblings, 1 reply; 19+ messages in thread
From: Qian Cai @ 2019-12-06  0:19 UTC (permalink / raw)
  To: John Hubbard
  Cc: Yang Shi, fabecassis, mhocko, cl, vbabka, mgorman, akpm,
	linux-mm, linux-kernel, stable



> On Dec 5, 2019, at 7:04 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> Felix's code is not random test code. It's code he wrote and he expected it to work.

Sure, but could he show a bit detail if the kernel will start to behavior as expected like what was written in the manpage to return ENOENT in this case, why is it not acceptable? i.e., why is it important to depend on the broken behavior?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-06  0:19                     ` Qian Cai
@ 2019-12-06  1:11                       ` Yang Shi
  0 siblings, 0 replies; 19+ messages in thread
From: Yang Shi @ 2019-12-06  1:11 UTC (permalink / raw)
  To: Qian Cai, John Hubbard
  Cc: fabecassis, mhocko, cl, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable



On 12/5/19 4:19 PM, Qian Cai wrote:
>
>> On Dec 5, 2019, at 7:04 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> Felix's code is not random test code. It's code he wrote and he expected it to work.
> Sure, but could he show a bit detail if the kernel will start to behavior as expected like what was written in the manpage to return ENOENT in this case, why is it not acceptable? i.e., why is it important to depend on the broken behavior?


I think I found the root cause. It did return -ENOENT when the syscall 
was introduced (my first impression was wrong), but the behavior was 
changed since 2.6.28 by commit e78bbfa82624 ("mm: stop returning -ENOENT 
from sys_move_pages() if nothing got migrated"). The full commit log is 
as below:

Author: Brice Goglin <Brice.Goglin@inria.fr>
Date:   Sat Oct 18 20:27:15 2008 -0700

     mm: stop returning -ENOENT from sys_move_pages() if nothing got 
migrated

     A patchset reworking sys_move_pages().  It removes the possibly large
     vmalloc by using multiple chunks when migrating large buffers. It also
     dramatically increases the throughput for large buffers since the 
lookup
     in new_page_node() is now limited to a single chunk, causing the 
quadratic
     complexity to have a much slower impact.  There is no need to use any
     radix-tree-like structure to improve this lookup.

     sys_move_pages() duration on a 4-quadcore-opteron 2347HE (1.9Gz),
     migrating between nodes #2 and #3:

             length          move_pages (us)         move_pages+patch (us)
             4kB             126                     98
             40kB            198                     168
             400kB           963                     937
             4MB             12503                   11930
             40MB            246867                  11848

     Patches #1 and #4 are the important ones:
     1) stop returning -ENOENT from sys_move_pages() if nothing got migrated
     2) don't vmalloc a huge page_to_node array for do_pages_stat()
     3) extract do_pages_move() out of sys_move_pages()
     4) rework do_pages_move() to work on page_sized chunks
     5) move_pages: no need to set pp->page to ZERO_PAGE(0) by default

     This patch:

     There is no point in returning -ENOENT from sys_move_pages() if all 
pages
     were already on the right node, while we return 0 if only 1 page 
was not.
     Most application don't know where their pages are allocated, so 
it's not
     an error to try to migrate them anyway.

     Just return 0 and let the status array in user-space be checked if the
     application needs details.

     It will make the upcoming chunked-move_pages() support much easier.

     Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
     Acked-by: Christoph Lameter <cl@linux-foundation.org>
     Cc: Nick Piggin <nickpiggin@yahoo.com.au>
     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


So the behavior was changed in kernel intentionally but never reflected 
in the manpage. I will propose a patch to fix the document.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node
  2019-12-05 19:45   ` Christopher Lameter
  (?)
  (?)
@ 2019-12-06  7:35   ` Michal Hocko
  -1 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2019-12-06  7:35 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Yang Shi, fabecassis, jhubbard, vbabka, mgorman, akpm, linux-mm,
	linux-kernel, stable

On Thu 05-12-19 19:45:49, Cristopher Lameter wrote:
> On Fri, 6 Dec 2019, Yang Shi wrote:
> 
> > Felix Abecassis reports move_pages() would return random status if the
> > pages are already on the target node by the below test program:
> 
> Looks ok.
> 
> Acked-by: Christoph Lameter <cl@linux.com>
> 
> Nitpicks:
> 
> > @@ -1553,7 +1555,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> >  	if (PageHuge(page)) {
> >  		if (PageHead(page)) {
> >  			isolate_huge_page(page, pagelist);
> > -			err = 0;
> > +			err = 1;
> 
> Add a meaningful constant instead of 1?

Well 1 has a good meaning here actually. We have -errno or the number of
queued pages.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-12-06  7:35 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-05 18:54 [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node Yang Shi
2019-12-05 19:19 ` Qian Cai
2019-12-05 19:27   ` Yang Shi
2019-12-05 19:34     ` Qian Cai
2019-12-05 22:09       ` Yang Shi
2019-12-05 22:23         ` Qian Cai
2019-12-05 22:41           ` John Hubbard
2019-12-05 23:16             ` Qian Cai
2019-12-05 23:24               ` John Hubbard
2019-12-05 23:58                 ` Qian Cai
2019-12-06  0:04                   ` John Hubbard
2019-12-06  0:19                     ` Qian Cai
2019-12-06  1:11                       ` Yang Shi
2019-12-05 19:45 ` Christopher Lameter
2019-12-05 19:45   ` Christopher Lameter
2019-12-05 21:59   ` Yang Shi
2019-12-06  7:35   ` Michal Hocko
2019-12-05 21:27 ` John Hubbard
2019-12-05 22:00   ` Yang Shi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.