* [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
@ 2023-08-07  6:39 Alistair Popple
  2023-08-07  6:39 ` [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status Alistair Popple
  2023-08-07  8:39 ` [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Alistair Popple @ 2023-08-07  6:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mhocko, jhubbard, ying.huang, osalvador, baolin.wang,
	ziy, shy828301, ryan.roberts, Alistair Popple

When a page fails to migrate, move_pages() returns the error code in a
per-page array of status values. The function call itself is also
supposed to return a summary error code indicating that a failure
occurred.

This doesn't always happen. Instead, success can be returned even
though some pages failed to migrate. This is due to incorrectly
returning the error code from store_status() rather than the code from
add_page_for_migration(). Fix this by only returning an error from
store_status() if the store actually failed.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
---
 mm/migrate.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 24baad2571e3..bb3a37245e13 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 		 * If the page is already on the target node (!err), store the
 		 * node, otherwise, store the err.
 		 */
-		err = store_status(status, i, err ? : current_node, 1);
+		err1 = store_status(status, i, err ? : current_node, 1);
+		if (err1)
+			err = err1;
 		if (err)
 			goto out_flush;
 
-- 
2.39.2




* [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07  6:39 [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Alistair Popple
@ 2023-08-07  6:39 ` Alistair Popple
  2023-08-07  9:15   ` Ryan Roberts
  2023-08-09  9:35   ` David Hildenbrand
  2023-08-07  8:39 ` [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Michal Hocko
  1 sibling, 2 replies; 16+ messages in thread
From: Alistair Popple @ 2023-08-07  6:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, mhocko, jhubbard, ying.huang, osalvador, baolin.wang,
	ziy, shy828301, ryan.roberts, Alistair Popple

The migration selftest was only checking the return code and not the
status array for migration success/failure. Update the test to check
both. This uncovered a bug in the return code handling of
do_pages_move().

Also disable NUMA balancing as that can lead to unexpected migration
failures.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
---

Ryan, this will still cause the test to fail if a migration failed. I
was unable to reproduce a migration failure for any cases on my system
once I disabled NUMA balancing, though, so I'd be curious if you are
still seeing failures with this patch applied. AFAIK there shouldn't
be anything else that would be causing migration failure, so I'd like
to know what is causing failures. Thanks!

 tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
index 379581567f27..cf079af5799b 100644
--- a/tools/testing/selftests/mm/migration.c
+++ b/tools/testing/selftests/mm/migration.c
@@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
 	ASSERT_NE(self->threads, NULL);
 	self->pids = malloc(self->nthreads * sizeof(*self->pids));
 	ASSERT_NE(self->pids, NULL);
+
+	/*
+	 * Disable NUMA balancing which can cause migration
+	 * failures.
+	 */
+	numa_set_membind(numa_all_nodes_ptr);
 };
 
 FIXTURE_TEARDOWN(migration)
@@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
 int migrate(uint64_t *ptr, int n1, int n2)
 {
 	int ret, tmp;
-	int status = 0;
 	struct timespec ts1, ts2;
 
 	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
 		return -1;
 
 	while (1) {
+		int status = NUMA_NUM_NODES + 1;
+
 		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
 			return -1;
 
@@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
 			return -2;
 		}
 
+		/*
+		 * Note we should never see this because move_pages() should
+		 * have indicated a page couldn't migrate above.
+		 */
+		if (status < 0) {
+			printf("Page didn't migrate, error %d\n", status);
+			return -2;
+		}
+
 		tmp = n2;
 		n2 = n1;
 		n1 = tmp;
-- 
2.39.2




* Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
  2023-08-07  6:39 [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Alistair Popple
  2023-08-07  6:39 ` [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status Alistair Popple
@ 2023-08-07  8:39 ` Michal Hocko
  2023-08-07 12:31   ` Alistair Popple
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2023-08-07  8:39 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts

On Mon 07-08-23 16:39:44, Alistair Popple wrote:
> When a page fails to migrate move_pages() returns the error code in a
> per-page array of status values. The function call itself is also
> supposed to return a summary error code indicating that a failure
> occurred.
> 
> This doesn't always happen. Instead success can be returned even
> though some pages failed to migrate. This is due to incorrectly
> returning the error code from store_status() rather than the code from
> add_page_for_migration. Fix this by only returning an error from
> store_status() if the store actually failed.

Error reporting by this syscall has been really far from
straightforward. Please read through a49bd4d71637 and the section "On a
side note". 
Is there any specific reason you are trying to address this now, or is
this motivated by code inspection?

> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
> ---
>  mm/migrate.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 24baad2571e3..bb3a37245e13 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>  		 * If the page is already on the target node (!err), store the
>  		 * node, otherwise, store the err.
>  		 */
> -		err = store_status(status, i, err ? : current_node, 1);
> +		err1 = store_status(status, i, err ? : current_node, 1);
> +		if (err1)
> +			err = err1;
>  		if (err)
>  			goto out_flush;
>  
> -- 
> 2.39.2

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07  6:39 ` [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status Alistair Popple
@ 2023-08-07  9:15   ` Ryan Roberts
  2023-08-07 12:41     ` Alistair Popple
  2023-08-09  9:35   ` David Hildenbrand
  1 sibling, 1 reply; 16+ messages in thread
From: Ryan Roberts @ 2023-08-07  9:15 UTC (permalink / raw)
  To: Alistair Popple, Andrew Morton
  Cc: linux-mm, mhocko, jhubbard, ying.huang, osalvador, baolin.wang,
	ziy, shy828301

On 07/08/2023 07:39, Alistair Popple wrote:
> The migration selftest was only checking the return code and not the
> status array for migration success/failure. Update the test to check
> both. This uncovered a bug in the return code handling of
> do_pages_move().
> 
> Also disable NUMA balancing as that can lead to unexpected migration
> failures.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
> 
> Ryan, this will still cause the test to fail if a migration failed. I
> was unable to reproduce a migration failure for any cases on my system
> once I disabled NUMA balancing though so I'd be curious if you are
> still seeing failures with this patch applied. AFAIK there shouldn't
> be anything else that would be causing migration failure so would like
> to know what is causing failures. Thanks!


Hi Alistair,

Afraid I'm still seeing unmigrated pages when running with these 2 patches:


#  RUN           migration.shared_anon ...
Didn't migrate 1 pages
# migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
# shared_anon: Test terminated by assertion
#          FAIL  migration.shared_anon
not ok 2 migration.shared_anon


I added some instrumentation; it usually fails on the second time
through the loop in migrate() but I've also seen it fail the first
time. Never seen it get through 2 iterations successfully though.

I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.

I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.

Thanks,
Ryan


> 
>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
> index 379581567f27..cf079af5799b 100644
> --- a/tools/testing/selftests/mm/migration.c
> +++ b/tools/testing/selftests/mm/migration.c
> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>  	ASSERT_NE(self->threads, NULL);
>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>  	ASSERT_NE(self->pids, NULL);
> +
> +	/*
> +	 * Disable NUMA balancing which can cause migration
> +	 * failures.
> +	 */
> +	numa_set_membind(numa_all_nodes_ptr);
>  };
>  
>  FIXTURE_TEARDOWN(migration)
> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>  int migrate(uint64_t *ptr, int n1, int n2)
>  {
>  	int ret, tmp;
> -	int status = 0;
>  	struct timespec ts1, ts2;
>  
>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>  		return -1;
>  
>  	while (1) {
> +		int status = NUMA_NUM_NODES + 1;
> +
>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>  			return -1;
>  
> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>  			return -2;
>  		}
>  
> +		/*
> +		 * Note we should never see this because move_pages() should
> +		 * have indicated a page couldn't migrate above.
> +		 */
> +		if (status < 0) {
> +			printf("Page didn't migrate, error %d\n", status);
> +			return -2;
> +		}
> +
>  		tmp = n2;
>  		n2 = n1;
>  		n1 = tmp;




* Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
  2023-08-07  8:39 ` [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Michal Hocko
@ 2023-08-07 12:31   ` Alistair Popple
  2023-08-07 13:07     ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: Alistair Popple @ 2023-08-07 12:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts


Michal Hocko <mhocko@suse.com> writes:

> On Mon 07-08-23 16:39:44, Alistair Popple wrote:
>> When a page fails to migrate move_pages() returns the error code in a
>> per-page array of status values. The function call itself is also
>> supposed to return a summary error code indicating that a failure
>> occurred.
>> 
>> This doesn't always happen. Instead success can be returned even
>> though some pages failed to migrate. This is due to incorrectly
>> returning the error code from store_status() rather than the code from
>> add_page_for_migration. Fix this by only returning an error from
>> store_status() if the store actually failed.
>
> Error reporting by this syscall has been really far from
> straightforward. Please read through a49bd4d71637 and the section "On a
> side note". 
> Is there any specific reason you are trying to address this now or is
> this motivated by the code inspection?

Thanks Michal. There was no specific reason to address this now other
than that I came across this behaviour when updating the migration
selftest to inspect the status array and thought it was odd. I was
seeing pages had failed to migrate according to the status argument
even though move_pages() had returned 0 (i.e. success) rather than a
number of non-migrated pages.

If I'm interpreting the side note correctly, the behaviour you were
concerned about was the opposite - returning a fail return code from
move_pages() but not indicating failure in the status array.

That said, I'm happy to leave the behaviour as is, although in that
case an update to the man page is in order to clarify that a return
value of 0 from move_pages() doesn't actually mean all pages were
successfully migrated.

>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>> ---
>>  mm/migrate.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 24baad2571e3..bb3a37245e13 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>  		 * If the page is already on the target node (!err), store the
>>  		 * node, otherwise, store the err.
>>  		 */
>> -		err = store_status(status, i, err ? : current_node, 1);
>> +		err1 = store_status(status, i, err ? : current_node, 1);
>> +		if (err1)
>> +			err = err1;
>>  		if (err)
>>  			goto out_flush;
>>  
>> -- 
>> 2.39.2




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07  9:15   ` Ryan Roberts
@ 2023-08-07 12:41     ` Alistair Popple
  2023-08-07 13:42       ` Ryan Roberts
  0 siblings, 1 reply; 16+ messages in thread
From: Alistair Popple @ 2023-08-07 12:41 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301


Ryan Roberts <ryan.roberts@arm.com> writes:

> On 07/08/2023 07:39, Alistair Popple wrote:
>> The migration selftest was only checking the return code and not the
>> status array for migration success/failure. Update the test to check
>> both. This uncovered a bug in the return code handling of
>> do_pages_move().
>> 
>> Also disable NUMA balancing as that can lead to unexpected migration
>> failures.
>> 
>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>> 
>> Ryan, this will still cause the test to fail if a migration failed. I
>> was unable to reproduce a migration failure for any cases on my system
>> once I disabled NUMA balancing though so I'd be curious if you are
>> still seeing failures with this patch applied. AFAIK there shouldn't
>> be anything else that would be causing migration failure so would like
>> to know what is causing failures. Thanks!
>
>
> Hi Alistair,
>
> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>
>
> #  RUN           migration.shared_anon ...
> Didn't migrate 1 pages
> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
> # shared_anon: Test terminated by assertion
> #          FAIL  migration.shared_anon
> not ok 2 migration.shared_anon
>
>
> I added some instrumentation; it usually fails on the second time
> through the loop in migrate() but I've also seen it fail the first
> time. Never seen it get through 2 iterations successfully though.

Interesting. I guess migration failure is always possible for various
reasons so I will update the test to report the number of failed
migrations rather than making it a test failure. I was mostly just
curious as to what would be causing the occasional failures for my own
understanding, but the failures themselves are unimportant.

> I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.
>
> I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.

Thanks! Unless you're concerned about the failures I am happy to ignore
them. Pages can fail to migrate for all sorts of reasons although I'm a
little surprised anonymous migrations are failing so frequently for you.

> Thanks,
> Ryan
>
>
>> 
>>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
>> index 379581567f27..cf079af5799b 100644
>> --- a/tools/testing/selftests/mm/migration.c
>> +++ b/tools/testing/selftests/mm/migration.c
>> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>>  	ASSERT_NE(self->threads, NULL);
>>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>>  	ASSERT_NE(self->pids, NULL);
>> +
>> +	/*
>> +	 * Disable NUMA balancing which can cause migration
>> +	 * failures.
>> +	 */
>> +	numa_set_membind(numa_all_nodes_ptr);
>>  };
>>  
>>  FIXTURE_TEARDOWN(migration)
>> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>>  int migrate(uint64_t *ptr, int n1, int n2)
>>  {
>>  	int ret, tmp;
>> -	int status = 0;
>>  	struct timespec ts1, ts2;
>>  
>>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>>  		return -1;
>>  
>>  	while (1) {
>> +		int status = NUMA_NUM_NODES + 1;
>> +
>>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>>  			return -1;
>>  
>> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>  			return -2;
>>  		}
>>  
>> +		/*
>> +		 * Note we should never see this because move_pages() should
>> +		 * have indicated a page couldn't migrate above.
>> +		 */
>> +		if (status < 0) {
>> +			printf("Page didn't migrate, error %d\n", status);
>> +			return -2;
>> +		}
>> +
>>  		tmp = n2;
>>  		n2 = n1;
>>  		n1 = tmp;




* Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
  2023-08-07 12:31   ` Alistair Popple
@ 2023-08-07 13:07     ` Michal Hocko
  2023-08-09  4:10       ` Alistair Popple
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2023-08-07 13:07 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts

On Mon 07-08-23 22:31:52, Alistair Popple wrote:
> 
> Michal Hocko <mhocko@suse.com> writes:
> 
> > On Mon 07-08-23 16:39:44, Alistair Popple wrote:
> >> When a page fails to migrate move_pages() returns the error code in a
> >> per-page array of status values. The function call itself is also
> >> supposed to return a summary error code indicating that a failure
> >> occurred.
> >> 
> >> This doesn't always happen. Instead success can be returned even
> >> though some pages failed to migrate. This is due to incorrectly
> >> returning the error code from store_status() rather than the code from
> >> add_page_for_migration. Fix this by only returning an error from
> >> store_status() if the store actually failed.
> >
> > Error reporting by this syscall has been really far from
> > straightforward. Please read through a49bd4d71637 and the section "On a
> > side note". 
> > Is there any specific reason you are trying to address this now or is
> > this motivated by the code inspection?
> 
> Thanks Michal. There was no specific reason to address this now other
> than I came across this behaviour when updating the migration selftest
> to inspect the status array and thought it was odd. I was seeing pages
> had failed to migrate according to the status argument even though
> move_pages() had returned 0 (ie. success) rather than a number of
> non-migrated pages.

It is good to mention such a motivation in the changelog to make it
clear. Also, do we have a specific test case which triggers this case?

> If I'm interpreting the side note correctly the behaviour you were
> concerned about was the opposite - returning a fail return code from
> move_pages() but not indicating failure in the status array.
> 
> That said I'm happy to leave the behaviour as is, although in that case
> an update to the man page is in order to clarify a return value of 0
> from move_pages() doesn't actually mean all pages were successfully
> migrated.

While I would say that it is better to let old dogs sleep, I do not
mind changing the behavior and seeing whether anything breaks. I
suspect nobody except for a couple of test cases hardcoded to the
original behavior will notice.

> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")

The patch itself looks good. I am not sure the fixes tag is accurate.
Has the reporting been correct before this change? I didn't have time to
re-read the original code which was quite different.

Anyway
Acked-by: Michal Hocko <mhocko@suse.com>

Also, rewriting this function to clarify the error handling would be a
nice exercise if somebody is interested.

> >> ---
> >>  mm/migrate.c | 4 +++-
> >>  1 file changed, 3 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/mm/migrate.c b/mm/migrate.c
> >> index 24baad2571e3..bb3a37245e13 100644
> >> --- a/mm/migrate.c
> >> +++ b/mm/migrate.c
> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >>  		 * If the page is already on the target node (!err), store the
> >>  		 * node, otherwise, store the err.
> >>  		 */
> >> -		err = store_status(status, i, err ? : current_node, 1);
> >> +		err1 = store_status(status, i, err ? : current_node, 1);
> >> +		if (err1)
> >> +			err = err1;
> >>  		if (err)
> >>  			goto out_flush;
> >>  
> >> -- 
> >> 2.39.2

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07 12:41     ` Alistair Popple
@ 2023-08-07 13:42       ` Ryan Roberts
  2023-08-09  4:21         ` Alistair Popple
  0 siblings, 1 reply; 16+ messages in thread
From: Ryan Roberts @ 2023-08-07 13:42 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301

On 07/08/2023 13:41, Alistair Popple wrote:
> 
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> On 07/08/2023 07:39, Alistair Popple wrote:
>>> The migration selftest was only checking the return code and not the
>>> status array for migration success/failure. Update the test to check
>>> both. This uncovered a bug in the return code handling of
>>> do_pages_move().
>>>
>>> Also disable NUMA balancing as that can lead to unexpected migration
>>> failures.
>>>
>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>
>>> Ryan, this will still cause the test to fail if a migration failed. I
>>> was unable to reproduce a migration failure for any cases on my system
>>> once I disabled NUMA balancing though so I'd be curious if you are
>>> still seeing failures with this patch applied. AFAIK there shouldn't
>>> be anything else that would be causing migration failure so would like
>>> to know what is causing failures. Thanks!
>>
>>
>> Hi Alistair,
>>
>> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>>
>>
>> #  RUN           migration.shared_anon ...
>> Didn't migrate 1 pages
>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
>> # shared_anon: Test terminated by assertion
>> #          FAIL  migration.shared_anon
>> not ok 2 migration.shared_anon
>>
>>
>> I added some instrumentation; it usually fails on the second time
>> through the loop in migrate() but I've also seen it fail the first
>> time. Never seen it get through 2 iterations successfully though.
> 
> Interesting. I guess migration failure is always possible for various
> reasons so I will update the test to report the number of failed
> migrations rather than making it a test failure. 

I find it odd that migration always succeeds for the private_anon and
private_anon_thp cases, but fails for the fork case. I guess there is something
about the page being RO (for CoW) or having higher mapcount/refcount that causes
the migration to fail?

I just modified access_mem() to access the page _after_ the one that is being
migrated (the mmap is allocating 2M). In this case, shared_anon passes. So there
is something about access_mem() reading the page that stops it from being
migrated? - that would explain why we sometimes have a successful first
iteration in migrate() - we are racing the fork'ed child.

But it's not clear why reading a page would cause migration to fail? If the
reader thread wins, then the HW should just allow the read without faulting into
the kernel - so no opportunity to change kernel state. If move_pages() wins,
then access_mem would take a fault and see the migration entry and (presumably)
wait for migration to complete?

> I was mostly just
> curious as to what would be causing the occasional failures for my own
> understanding, but the failures themselves are unimportant.
> 
>> I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.
>>
>> I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.
> 
> Thanks! Unless you're concerned about the failures I am happy to ignore
> them. Pages can fail to migrate for all sorts of reasons although I'm a
> little surprised anonymous migrations are failing so frequently for you.

Something about this doesn't smell right to me. Would be good to hear your
thoughts on the above before we close this and move on...

> 
>> Thanks,
>> Ryan
>>
>>
>>>
>>>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
>>> index 379581567f27..cf079af5799b 100644
>>> --- a/tools/testing/selftests/mm/migration.c
>>> +++ b/tools/testing/selftests/mm/migration.c
>>> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>>>  	ASSERT_NE(self->threads, NULL);
>>>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>>>  	ASSERT_NE(self->pids, NULL);
>>> +
>>> +	/*
>>> +	 * Disable NUMA balancing which can cause migration
>>> +	 * failures.
>>> +	 */
>>> +	numa_set_membind(numa_all_nodes_ptr);
>>>  };
>>>  
>>>  FIXTURE_TEARDOWN(migration)
>>> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>>>  int migrate(uint64_t *ptr, int n1, int n2)
>>>  {
>>>  	int ret, tmp;
>>> -	int status = 0;
>>>  	struct timespec ts1, ts2;
>>>  
>>>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>>>  		return -1;
>>>  
>>>  	while (1) {
>>> +		int status = NUMA_NUM_NODES + 1;
>>> +
>>>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>>>  			return -1;
>>>  
>>> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>>  			return -2;
>>>  		}
>>>  
>>> +		/*
>>> +		 * Note we should never see this because move_pages() should
>>> +		 * have indicated a page couldn't migrate above.
>>> +		 */
>>> +		if (status < 0) {
>>> +			printf("Page didn't migrate, error %d\n", status);
>>> +			return -2;
>>> +		}
>>> +
>>>  		tmp = n2;
>>>  		n2 = n1;
>>>  		n1 = tmp;
> 




* Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
  2023-08-07 13:07     ` Michal Hocko
@ 2023-08-09  4:10       ` Alistair Popple
  2023-08-11  7:37         ` Huang, Ying
  0 siblings, 1 reply; 16+ messages in thread
From: Alistair Popple @ 2023-08-09  4:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts


Michal Hocko <mhocko@suse.com> writes:

> On Mon 07-08-23 22:31:52, Alistair Popple wrote:
>> 
>> Michal Hocko <mhocko@suse.com> writes:
>> 
>> > On Mon 07-08-23 16:39:44, Alistair Popple wrote:
>> >> When a page fails to migrate move_pages() returns the error code in a
>> >> per-page array of status values. The function call itself is also
>> >> supposed to return a summary error code indicating that a failure
>> >> occurred.
>> >> 
>> >> This doesn't always happen. Instead success can be returned even
>> >> though some pages failed to migrate. This is due to incorrectly
>> >> returning the error code from store_status() rather than the code from
>> >> add_page_for_migration. Fix this by only returning an error from
>> >> store_status() if the store actually failed.
>> >
>> > Error reporting by this syscall has been really far from
>> > straightforward. Please read through a49bd4d71637 and the section "On a
>> > side note". 
>> > Is there any specific reason you are trying to address this now or is
>> > this motivated by the code inspection?
>> 
>> Thanks Michal. There was no specific reason to address this now other
>> than I came across this behaviour when updating the migration selftest
>> to inspect the status array and thought it was odd. I was seeing pages
>> had failed to migrate according to the status argument even though
>> move_pages() had returned 0 (ie. success) rather than a number of
>> non-migrated pages.
>
> It is good to mention such a motivation in the changelog to make it
> clear. Also do we have a specific test case which triggers this case?

Not explicitly/reliably, although I could write one.

>> If I'm interpreting the side note correctly the behaviour you were
>> concerned about was the opposite - returning a fail return code from
>> move_pages() but not indicating failure in the status array.
>> 
>> That said I'm happy to leave the behaviour as is, although in that case
>> an update to the man page is in order to clarify a return value of 0
>> from move_pages() doesn't actually mean all pages were successfully
>> migrated.
>
> While I would say that it is better to let old dogs sleep I do not mind
> changing the behavior and see whether anything breaks. I suspect nobody
> except for a couple of test cases hardcoded to the original behavior will
> notice.
>
>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>
> The patch itself looks good. I am not sure the fixes tag is accurate.
> Has the reporting been correct before this change? I didn't have time to
> re-read the original code which was quite different.

I dug deeper into the history and the fixes tag is wrong. The behaviour
was actually introduced way back in commit e78bbfa82624 ("mm: stop
returning -ENOENT from sys_move_pages() if nothing got migrated"). As
you may guess from the title it was intentional, so I suspect it is
better to update the documentation.

> Anyway
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks for looking, but I will drop this and see if I can get the man
page updated.

> Anyway rewriting this function to clarify the error handling would be a
> nice exercise if somebody is interested.

Yeah, every time I look at this function I want to do that but haven't
yet found the time.

>> >> ---
>> >>  mm/migrate.c | 4 +++-
>> >>  1 file changed, 3 insertions(+), 1 deletion(-)
>> >> 
>> >> diff --git a/mm/migrate.c b/mm/migrate.c
>> >> index 24baad2571e3..bb3a37245e13 100644
>> >> --- a/mm/migrate.c
>> >> +++ b/mm/migrate.c
>> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>> >>  		 * If the page is already on the target node (!err), store the
>> >>  		 * node, otherwise, store the err.
>> >>  		 */
>> >> -		err = store_status(status, i, err ? : current_node, 1);
>> >> +		err1 = store_status(status, i, err ? : current_node, 1);
>> >> +		if (err1)
>> >> +			err = err1;
>> >>  		if (err)
>> >>  			goto out_flush;
>> >>  
>> >> -- 
>> >> 2.39.2



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07 13:42       ` Ryan Roberts
@ 2023-08-09  4:21         ` Alistair Popple
  2023-08-09  9:34           ` David Hildenbrand
  0 siblings, 1 reply; 16+ messages in thread
From: Alistair Popple @ 2023-08-09  4:21 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301


Ryan Roberts <ryan.roberts@arm.com> writes:

> On 07/08/2023 13:41, Alistair Popple wrote:
>> 
>> Ryan Roberts <ryan.roberts@arm.com> writes:
>> 
>>> On 07/08/2023 07:39, Alistair Popple wrote:
>>>> The migration selftest was only checking the return code and not the
>>>> status array for migration success/failure. Update the test to check
>>>> both. This uncovered a bug in the return code handling of
>>>> do_pages_move().
>>>>
>>>> Also disable NUMA balancing as that can lead to unexpected migration
>>>> failures.
>>>>
>>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>
>>>> Ryan, this will still cause the test to fail if a migration failed. I
>>>> was unable to reproduce a migration failure for any cases on my system
>>>> once I disabled NUMA balancing though so I'd be curious if you are
>>>> still seeing failures with this patch applied. AFAIK there shouldn't
>>>> be anything else that would be causing migration failure so would like
>>>> to know what is causing failures. Thanks!
>>>
>>>
>>> Hi Alistair,
>>>
>>> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>>>
>>>
>>> #  RUN           migration.shared_anon ...
>>> Didn't migrate 1 pages
>>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
>>> # shared_anon: Test terminated by assertion
>>> #          FAIL  migration.shared_anon
>>> not ok 2 migration.shared_anon
>>>
>>>
>>> I added some instrumentation; it usually fails on the second time
>>> through the loop in migrate() but I've also seen it fail the first
>>> time. Never seen it get through 2 iterations successfully though.
>> 
>> Interesting. I guess migration failure is always possible for various
>> reasons so I will update the test to report the number of failed
>> migrations rather than making it a test failure. 
>
> I find it odd that migration always succeeds for the private_anon and
> private_anon_thp cases, but fails for the fork case. I guess there is something
> about the page being RO (for CoW) or having higher mapcount/refcount that causes
> the migration to fail?

But the fork case uses a shared mapping, so there shouldn't be any
RO/CoW AFAIK. A higher refcount than expected would cause migration
failure, but there's nothing I can think of that would be causing that
now NUMA balancing is disabled in the test (that caused migration
failures for me in the private cases due to the concurrent migration
attempts).

> I just modified access_mem() to access the page _after_ the one that is being
> migrated (the mmap is allocating 2M). In this case, shared_anon passes. So there
> is something about access_mem() reading the page that stops its from being
> migrated? - that would explain why we sometimes have a successful first
> iteration in migrate() - we are racing the fork'ed child.
>
> But its not clear why reading a page would cause migration to fail? If the
> reader thread wins, then the HW should just allow the read without faulting into
> the kernel - so no opportunity to change kernel state. If move_pages() wins,
> then access_mem would take a fault and see the migration entry and (presumably)
> wait for migration to complete?

That matches my understanding also, hence why I was failing the test if
a migration didn't succeed because I don't understand why it would be
failing.

>> I was mostly just
>> curious as to what would be causing the occasional failures for my own
>> understanding, but the failures themselves are unimportant.
>> 
>>> I did also try just this patch without the error handling update in the kernel, but it still fails in the same way.
>>>
>>> I'm running on arm64 in case that wasn't clear. Let me know if there is anything I can do to help debug.
>> 
>> Thanks! Unless you're concerned about the failures I am happy to ignore
>> them. Pages can fail to migrate for all sorts of reasons although I'm a
>> little surprised anonymous migrations are failing so frequently for you.
>
> Something about this doesn't smell right to me. Would be good to hear your
> thoughts on above before we close this and move on...

I agree, something seems odd here so happy to try and close it. I
haven't been able to replicate any failures on my local fake NUMA x86_64
development machine though. Will see if I can replicate on an ARM64
platform. Chances are something is taking an extra reference on the
page, it's just a matter of figuring out what...

>> 
>>> Thanks,
>>> Ryan
>>>
>>>
>>>>
>>>>  tools/testing/selftests/mm/migration.c | 18 +++++++++++++++++-
>>>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c
>>>> index 379581567f27..cf079af5799b 100644
>>>> --- a/tools/testing/selftests/mm/migration.c
>>>> +++ b/tools/testing/selftests/mm/migration.c
>>>> @@ -51,6 +51,12 @@ FIXTURE_SETUP(migration)
>>>>  	ASSERT_NE(self->threads, NULL);
>>>>  	self->pids = malloc(self->nthreads * sizeof(*self->pids));
>>>>  	ASSERT_NE(self->pids, NULL);
>>>> +
>>>> +	/*
>>>> +	 * Disable NUMA balancing which can cause migration
>>>> +	 * failures.
>>>> +	 */
>>>> +	numa_set_membind(numa_all_nodes_ptr);
>>>>  };
>>>>  
>>>>  FIXTURE_TEARDOWN(migration)
>>>> @@ -62,13 +68,14 @@ FIXTURE_TEARDOWN(migration)
>>>>  int migrate(uint64_t *ptr, int n1, int n2)
>>>>  {
>>>>  	int ret, tmp;
>>>> -	int status = 0;
>>>>  	struct timespec ts1, ts2;
>>>>  
>>>>  	if (clock_gettime(CLOCK_MONOTONIC, &ts1))
>>>>  		return -1;
>>>>  
>>>>  	while (1) {
>>>> +		int status = NUMA_NUM_NODES + 1;
>>>> +
>>>>  		if (clock_gettime(CLOCK_MONOTONIC, &ts2))
>>>>  			return -1;
>>>>  
>>>> @@ -85,6 +92,15 @@ int migrate(uint64_t *ptr, int n1, int n2)
>>>>  			return -2;
>>>>  		}
>>>>  
>>>> +		/*
>>>> +		 * Note we should never see this because move_pages() should
>>>> +		 * have indicated a page couldn't migrate above.
>>>> +		 */
>>>> +		if (status < 0) {
>>>> +			printf("Page didn't migrate, error %d\n", status);
>>>> +			return -2;
>>>> +		}
>>>> +
>>>>  		tmp = n2;
>>>>  		n2 = n1;
>>>>  		n1 = tmp;
>> 




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-09  4:21         ` Alistair Popple
@ 2023-08-09  9:34           ` David Hildenbrand
  2023-08-09 13:39             ` Ryan Roberts
  0 siblings, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-09  9:34 UTC (permalink / raw)
  To: Alistair Popple, Ryan Roberts
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301

On 09.08.23 06:21, Alistair Popple wrote:
> 
> Ryan Roberts <ryan.roberts@arm.com> writes:
> 
>> On 07/08/2023 13:41, Alistair Popple wrote:
>>>
>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>
>>>> On 07/08/2023 07:39, Alistair Popple wrote:
>>>>> The migration selftest was only checking the return code and not the
>>>>> status array for migration success/failure. Update the test to check
>>>>> both. This uncovered a bug in the return code handling of
>>>>> do_pages_move().
>>>>>
>>>>> Also disable NUMA balancing as that can lead to unexpected migration
>>>>> failures.
>>>>>
>>>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>>>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>> ---
>>>>>
>>>>> Ryan, this will still cause the test to fail if a migration failed. I
>>>>> was unable to reproduce a migration failure for any cases on my system
>>>>> once I disabled NUMA balancing though so I'd be curious if you are
>>>>> still seeing failures with this patch applied. AFAIK there shouldn't
>>>>> be anything else that would be causing migration failure so would like
>>>>> to know what is causing failures. Thanks!
>>>>
>>>>
>>>> Hi Alistair,
>>>>
>>>> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>>>>
>>>>
>>>> #  RUN           migration.shared_anon ...
>>>> Didn't migrate 1 pages
>>>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2) (-2) == 0 (0)
>>>> # shared_anon: Test terminated by assertion
>>>> #          FAIL  migration.shared_anon
>>>> not ok 2 migration.shared_anon
>>>>
>>>>
>>>> I added some instrumentation; it usually fails on the second time
>>>> through the loop in migrate() but I've also seen it fail the first
>>>> time. Never seen it get through 2 iterations successfully though.
>>>
>>> Interesting. I guess migration failure is always possible for various
>>> reasons so I will update the test to report the number of failed
>>> migrations rather than making it a test failure.
>>
>> I find it odd that migration always succeeds for the private_anon and
>> private_anon_thp cases, but fails for the fork case. I guess there is something
>> about the page being RO (for CoW) or having higher mapcount/refcount that causes
>> the migration to fail?
> 
> But the fork case uses a shared mapping, so there shouldn't be any
> RO/CoW AFAIK. A higher refcount than expected would cause migration
> failure, but there's nothing I can think of that would be causing that
> now NUMA balancing is disabled in the test (that caused migration
> failures for me in the private cases due to the concurrent migration
> attempts).

Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with 
extra-steps.

In add_page_for_migration(), we do have

	if (page_mapcount(page) > 1 && !migrate_all)

but the test seems to always specify MPOL_MF_MOVE_ALL when calling 
move_pages(), so it should be ignored.

It would be helpful to dump the page when migration fails in that case.


Note that in add_page_for_migration(), we perform a follow_page() that 
used to accidentally honor NUMA hinting faults. Fixed by

commit 3c471b04db7604974e3bea45e2d9c7c27a905536
Author: David Hildenbrand <david@redhat.com>
Date:   Thu Aug 3 16:32:02 2023 +0200

     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT

in mm-unstable.


Maybe that fix means you no longer need to disable NUMA hinting for
this test case. Maybe you're lucky and it resolves the shared_anon case
as well, but I somewhat doubt it :)

It doesn't really explain why the shared case would fail, though...

-- 
Cheers,

David / dhildenb




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-07  6:39 ` [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status Alistair Popple
  2023-08-07  9:15   ` Ryan Roberts
@ 2023-08-09  9:35   ` David Hildenbrand
  2023-08-09 10:46     ` Alistair Popple
  1 sibling, 1 reply; 16+ messages in thread
From: David Hildenbrand @ 2023-08-09  9:35 UTC (permalink / raw)
  To: Alistair Popple, Andrew Morton
  Cc: linux-mm, mhocko, jhubbard, ying.huang, osalvador, baolin.wang,
	ziy, shy828301, ryan.roberts

On 07.08.23 08:39, Alistair Popple wrote:
> The migration selftest was only checking the return code and not the
> status array for migration success/failure. Update the test to check
> both. This uncovered a bug in the return code handling of
> do_pages_move().
> 
> Also disable NUMA balancing as that can lead to unexpected migration
> failures.

As raised in a sub-thread, is that still required with the mm-unstable 
patch:

commit 3c471b04db7604974e3bea45e2d9c7c27a905536
Author: David Hildenbrand <david@redhat.com>
Date:   Thu Aug 3 16:32:02 2023 +0200

     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT


-- 
Cheers,

David / dhildenb




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-09  9:35   ` David Hildenbrand
@ 2023-08-09 10:46     ` Alistair Popple
  0 siblings, 0 replies; 16+ messages in thread
From: Alistair Popple @ 2023-08-09 10:46 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts


David Hildenbrand <david@redhat.com> writes:

> On 07.08.23 08:39, Alistair Popple wrote:
>> The migration selftest was only checking the return code and not the
>> status array for migration success/failure. Update the test to check
>> both. This uncovered a bug in the return code handling of
>> do_pages_move().
>> Also disable NUMA balancing as that can lead to unexpected migration
>> failures.
>
> As raised in a sub-thread, is that still required with the mm-unstable
> patch:

Oh I must have missed that sub-thread. I will give it a try. Thanks!

> commit 3c471b04db7604974e3bea45e2d9c7c27a905536
> Author: David Hildenbrand <david@redhat.com>
> Date:   Thu Aug 3 16:32:02 2023 +0200
>
>     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-09  9:34           ` David Hildenbrand
@ 2023-08-09 13:39             ` Ryan Roberts
  2023-08-10  8:23               ` Alistair Popple
  0 siblings, 1 reply; 16+ messages in thread
From: Ryan Roberts @ 2023-08-09 13:39 UTC (permalink / raw)
  To: David Hildenbrand, Alistair Popple
  Cc: Andrew Morton, linux-mm, mhocko, jhubbard, ying.huang, osalvador,
	baolin.wang, ziy, shy828301

On 09/08/2023 10:34, David Hildenbrand wrote:
> On 09.08.23 06:21, Alistair Popple wrote:
>>
>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>
>>> On 07/08/2023 13:41, Alistair Popple wrote:
>>>>
>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>
>>>>> On 07/08/2023 07:39, Alistair Popple wrote:
>>>>>> The migration selftest was only checking the return code and not the
>>>>>> status array for migration success/failure. Update the test to check
>>>>>> both. This uncovered a bug in the return code handling of
>>>>>> do_pages_move().
>>>>>>
>>>>>> Also disable NUMA balancing as that can lead to unexpected migration
>>>>>> failures.
>>>>>>
>>>>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>>>>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>>
>>>>>> Ryan, this will still cause the test to fail if a migration failed. I
>>>>>> was unable to reproduce a migration failure for any cases on my system
>>>>>> once I disabled NUMA balancing though so I'd be curious if you are
>>>>>> still seeing failures with this patch applied. AFAIK there shouldn't
>>>>>> be anything else that would be causing migration failure so would like
>>>>>> to know what is causing failures. Thanks!
>>>>>
>>>>>
>>>>> Hi Alistair,
>>>>>
>>>>> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>>>>>
>>>>>
>>>>> #  RUN           migration.shared_anon ...
>>>>> Didn't migrate 1 pages
>>>>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2)
>>>>> (-2) == 0 (0)
>>>>> # shared_anon: Test terminated by assertion
>>>>> #          FAIL  migration.shared_anon
>>>>> not ok 2 migration.shared_anon
>>>>>
>>>>>
>>>>> I added some instrumentation; it usually fails on the second time
>>>>> through the loop in migrate() but I've also seen it fail the first
>>>>> time. Never seen it get through 2 iterations successfully though.
>>>>
>>>> Interesting. I guess migration failure is always possible for various
>>>> reasons so I will update the test to report the number of failed
>>>> migrations rather than making it a test failure.
>>>
>>> I find it odd that migration always succeeds for the private_anon and
>>> private_anon_thp cases, but fails for the fork case. I guess there is something
>>> about the page being RO (for CoW) or having higher mapcount/refcount that causes
>>> the migration to fail?
>>
>> But the fork case uses a shared mapping, so there shouldn't be any
>> RO/CoW AFAIK. A higher refcount than expected would cause migration
>> failure, but there's nothing I can think of that would be causing that
>> now NUMA balancing is disabled in the test (that caused migration
>> failures for me in the private cases due to the concurrent migration
>> attempts).
> 
> Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with extra-steps.

Yep my bad - not reading the code properly. I assumed it was private because the
other one is.

> 
> In add_page_for_migration(), we do have
> 
>     if (page_mapcount(page) > 1 && !migrate_all)
> 
> but the test seems to always specify MPOL_MF_MOVE_ALL when calling move_pages(),
> so it should be ignored.
> 
> It would be helpful to dump the page when migration fails in that case.
> 
> 
> Note that in add_page_for_migration(), we perform a follow_page() that used to
> accidentally honor NUMA hinting faults. Fixed by
> 
> commit 3c471b04db7604974e3bea45e2d9c7c27a905536
> Author: David Hildenbrand <david@redhat.com>
> Date:   Thu Aug 3 16:32:02 2023 +0200
> 
>     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
> 
> in mm-unstable.

This fix does not change the behaviour of the test; it still fails.

HOWEVER... It turns out the test has started passing in mm-unstable due to a
commit after this one. See bisect:



Note terms are inverted:
good => test *fails*
bad => test *passes*

Initial bounds:
good: 3f943756e8b3 => commit suggested by DavidH (test still failing there)
bad: 84d9f657acae  => mm-unstable head as of a few days ago

All of these commits are from mm-unstable; none are in v6.5-rc3, which is where
I originally reported the issue.

git bisect start
# bad: [84d9f657acaecc43dc52f25d52230db85fd5ffdd] mm: move vma locking out of
vma_prepare and dup_anon_vma
git bisect bad 84d9f657acaecc43dc52f25d52230db85fd5ffdd
# good: [3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0] mm/gup: reintroduce FOLL_NUMA
as FOLL_HONOR_NUMA_FAULT
git bisect good 3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0
# good: [7dfbad370c66f35789d193a758575d31aabd25a4] selftests/mm: optionally pass
duration to transhuge-stress
git bisect good 7dfbad370c66f35789d193a758575d31aabd25a4
# bad: [fc99f767390266b5436575c00445d4445f6c9be6] mips: convert various
functions to use ptdescs
git bisect bad fc99f767390266b5436575c00445d4445f6c9be6
# bad: [af89742c0bf319979e00cfb066ead6b510f3e296] powerpc/book3s64/vmemmap:
switch radix to use a different vmemmap handling function
git bisect bad af89742c0bf319979e00cfb066ead6b510f3e296
# good: [dfebce290a7b44985eda4ddd76378cdc82e3541c] maple_tree: adjust node
allocation on mas_rebalance()
git bisect good dfebce290a7b44985eda4ddd76378cdc82e3541c
# good: [9517da22a74a49102bcd6c8556eeceaca965b0a6] mm: move FAULT_FLAG_VMA_LOCK
check down from do_fault()
git bisect good 9517da22a74a49102bcd6c8556eeceaca965b0a6
# bad: [5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109] mm: change
pudp_huge_get_and_clear_full take vm_area_struct as arg
git bisect bad 5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109
# bad: [bbecc62bae72ec22e4276464a5ef359511923954] mm: handle faults that merely
update the accessed bit under the VMA lock
git bisect bad bbecc62bae72ec22e4276464a5ef359511923954
# bad: [2132f14c5bc1b10ea964ab89bd6291fc05eaccaa] mm: handle swap and NUMA PTE
faults under the VMA lock
git bisect bad 2132f14c5bc1b10ea964ab89bd6291fc05eaccaa
# bad: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the fault-around code
under the VMA lock
git bisect bad 8890e186b3470fc690d3022656e98c0c41e27eca
# first bad commit: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the
fault-around code under the VMA lock

First commit where test passes:

commit 8890e186b3470fc690d3022656e98c0c41e27eca
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jul 24 19:54:08 2023 +0100

    mm: run the fault-around code under the VMA lock

    The map_pages fs method should be safe to run under the VMA lock instead
    of the mmap lock.  This should have a measurable reduction in contention
    on the mmap lock.

    Link: https://lkml.kernel.org/r/20230724185410.1124082-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Suren Baghdasaryan <surenb@google.com>
    Cc: Arjun Roy <arjunroy@google.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Punit Agrawal <punit.agrawal@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

 mm/memory.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)



It's not really clear to me why this change should turn the fail into a pass
though... Perhaps migration tries to get the mmap_lock and if it fails, then it
skips migration?

> 
> 
> Maybe that fix means you no longer need to disable NUMA hinting for this
> test case. Maybe you're lucky and it resolves the shared_anon case as well,
> but I somewhat doubt it :)
> 
> It doesn't really explain why the shared case would fail, though...
> 




* Re: [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status
  2023-08-09 13:39             ` Ryan Roberts
@ 2023-08-10  8:23               ` Alistair Popple
  0 siblings, 0 replies; 16+ messages in thread
From: Alistair Popple @ 2023-08-10  8:23 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: David Hildenbrand, Andrew Morton, linux-mm, mhocko, jhubbard,
	ying.huang, osalvador, baolin.wang, ziy, shy828301


Ryan Roberts <ryan.roberts@arm.com> writes:

> On 09/08/2023 10:34, David Hildenbrand wrote:
>> On 09.08.23 06:21, Alistair Popple wrote:
>>>
>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>
>>>> On 07/08/2023 13:41, Alistair Popple wrote:
>>>>>
>>>>> Ryan Roberts <ryan.roberts@arm.com> writes:
>>>>>
>>>>>> On 07/08/2023 07:39, Alistair Popple wrote:
>>>>>>> The migration selftest was only checking the return code and not the
>>>>>>> status array for migration success/failure. Update the test to check
>>>>>>> both. This uncovered a bug in the return code handling of
>>>>>>> do_pages_move().
>>>>>>>
>>>>>>> Also disable NUMA balancing as that can lead to unexpected migration
>>>>>>> failures.
>>>>>>>
>>>>>>> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>>>>>> Suggested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>>> ---
>>>>>>>
>>>>>>> Ryan, this will still cause the test to fail if a migration failed. I
>>>>>>> was unable to reproduce a migration failure for any cases on my system
>>>>>>> once I disabled NUMA balancing though so I'd be curious if you are
>>>>>>> still seeing failures with this patch applied. AFAIK there shouldn't
>>>>>>> be anything else that would be causing migration failure so would like
>>>>>>> to know what is causing failures. Thanks!
>>>>>>
>>>>>>
>>>>>> Hi Alistair,
>>>>>>
>>>>>> Afraid I'm still seeing unmigrated pages when running with these 2 patches:
>>>>>>
>>>>>>
>>>>>> #  RUN           migration.shared_anon ...
>>>>>> Didn't migrate 1 pages
>>>>>> # migration.c:183:shared_anon:Expected migrate(ptr, self->n1, self->n2)
>>>>>> (-2) == 0 (0)
>>>>>> # shared_anon: Test terminated by assertion
>>>>>> #          FAIL  migration.shared_anon
>>>>>> not ok 2 migration.shared_anon
>>>>>>
>>>>>>
>>>>>> I added some instrumentation; it usually fails on the second time
>>>>>> through the loop in migrate() but I've also seen it fail the first
>>>>>> time. Never seen it get through 2 iterations successfully though.
>>>>>
>>>>> Interesting. I guess migration failure is always possible for various
>>>>> reasons so I will update the test to report the number of failed
>>>>> migrations rather than making it a test failure.
>>>>
>>>> I find it odd that migration always succeeds for the private_anon and
>>>> private_anon_thp cases, but fails for the fork case. I guess there is something
>>>> about the page being RO (for CoW) or having higher mapcount/refcount that causes
>>>> the migration to fail?
>>>
>>> But the fork case uses a shared mapping, so there shouldn't be any
>>> RO/CoW AFAIK. A higher refcount than expected would cause migration
>>> failure, but there's nothing I can think of that would be causing that
>>> now NUMA balancing is disabled in the test (that caused migration
>>> failures for me in the private cases due to the concurrent migration
>>> attempts).
>> 
>> Yes, we do have MAP_SHARED | MAP_ANONYMOUS, which is just shmem with extra-steps.
>
> Yep my bad - not reading the code properly. I assumed it was private because the
> other one is.

Not a problem!

>> 
>> In add_page_for_migration(), we do have
>> 
>>     if (page_mapcount(page) > 1 && !migrate_all)
>> 
>> but the test seems to always specify MPOL_MF_MOVE_ALL when calling move_pages(),
>> so it should be ignored.
>> 
>> It would be helpful to dump the page when migration fails in that case.
>> 
>> 
>> Note that in add_page_for_migration(), we perform a follow_page() that used to
>> accidentally honor NUMA hinting faults. Fixed by
>> 
>> commit 3c471b04db7604974e3bea45e2d9c7c27a905536
>> Author: David Hildenbrand <david@redhat.com>
>> Date:   Thu Aug 3 16:32:02 2023 +0200
>> 
>>     mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
>> 
>> in mm-unstable.
>
> This fix does not change the behaviour of the test; it still fails.

It fixes the failures I was seeing for private_anon with NUMA balancing
enabled[1]. Clearly there is something different between Ryan's setup
and mine because he doesn't see failures for the private cases but I do.

However I am still seeing failures in the private_anon_thp case with
NUMA balancing enabled even with the patch. I haven't had time to dig
deeper - perhaps it is expected. From memory, concurrent migrations can
take extra page references, which could be upsetting things; that is
why I initially figured it needed disabling.

> HOWEVER... It turns out the test has started passing in mm-unstable due to a
> commit after this one. See bisect:

As I said, I've never seen a failure for shared_anon on my setup so
haven't replicated this. Hopefully I will get some time to look more
closely soon.

[1] - Note that this requires changing the test in this patch to still pass
'status' argument to move_pages() but remove the mbind() call that
disables NUMA balancing. Without the status argument the test passes
with NUMA balancing enabled anyway with or without the patch from David.

> Note terms are inverted:
> good => test *fails*
> bad => test *passes*
>
> Initial bounds:
> good: 3f943756e8b3 => commit suggested by DavidH (test still failing there)
> bad: 84d9f657acae  => mm-unstable head as of a few days ago
>
> All of these commits are from mm-unstable; none are in v6.5-rc3, which is where
> I originally reported the issue.
>
> git bisect start
> # bad: [84d9f657acaecc43dc52f25d52230db85fd5ffdd] mm: move vma locking out of
> vma_prepare and dup_anon_vma
> git bisect bad 84d9f657acaecc43dc52f25d52230db85fd5ffdd
> # good: [3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0] mm/gup: reintroduce FOLL_NUMA
> as FOLL_HONOR_NUMA_FAULT
> git bisect good 3f943756e8b3e0b5d0ea1f3087658eb559b0c7b0
> # good: [7dfbad370c66f35789d193a758575d31aabd25a4] selftests/mm: optionally pass
> duration to transhuge-stress
> git bisect good 7dfbad370c66f35789d193a758575d31aabd25a4
> # bad: [fc99f767390266b5436575c00445d4445f6c9be6] mips: convert various
> functions to use ptdescs
> git bisect bad fc99f767390266b5436575c00445d4445f6c9be6
> # bad: [af89742c0bf319979e00cfb066ead6b510f3e296] powerpc/book3s64/vmemmap:
> switch radix to use a different vmemmap handling function
> git bisect bad af89742c0bf319979e00cfb066ead6b510f3e296
> # good: [dfebce290a7b44985eda4ddd76378cdc82e3541c] maple_tree: adjust node
> allocation on mas_rebalance()
> git bisect good dfebce290a7b44985eda4ddd76378cdc82e3541c
> # good: [9517da22a74a49102bcd6c8556eeceaca965b0a6] mm: move FAULT_FLAG_VMA_LOCK
> check down from do_fault()
> git bisect good 9517da22a74a49102bcd6c8556eeceaca965b0a6
> # bad: [5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109] mm: change
> pudp_huge_get_and_clear_full take vm_area_struct as arg
> git bisect bad 5ec6b3644e50d859ebf4cba5cc29cfbda0e6d109
> # bad: [bbecc62bae72ec22e4276464a5ef359511923954] mm: handle faults that merely
> update the accessed bit under the VMA lock
> git bisect bad bbecc62bae72ec22e4276464a5ef359511923954
> # bad: [2132f14c5bc1b10ea964ab89bd6291fc05eaccaa] mm: handle swap and NUMA PTE
> faults under the VMA lock
> git bisect bad 2132f14c5bc1b10ea964ab89bd6291fc05eaccaa
> # bad: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the fault-around code
> under the VMA lock
> git bisect bad 8890e186b3470fc690d3022656e98c0c41e27eca
> # first bad commit: [8890e186b3470fc690d3022656e98c0c41e27eca] mm: run the
> fault-around code under the VMA lock
>
> First commit where test passes:
>
> commit 8890e186b3470fc690d3022656e98c0c41e27eca
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Mon Jul 24 19:54:08 2023 +0100
>
>     mm: run the fault-around code under the VMA lock
>
>     The map_pages fs method should be safe to run under the VMA lock instead
>     of the mmap lock.  This should have a measurable reduction in contention
>     on the mmap lock.
>
>     Link: https://lkml.kernel.org/r/20230724185410.1124082-9-willy@infradead.org
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>     Cc: Arjun Roy <arjunroy@google.com>
>     Cc: Eric Dumazet <edumazet@google.com>
>     Cc: Punit Agrawal <punit.agrawal@bytedance.com>
>     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
>  mm/memory.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
>
>
> It's not really clear to me why this change should turn the fail into a pass
> though... Perhaps migration tries to get the mmap_lock and if it fails, then it
> skips migration?
>
>> 
>> 
>> Maybe that fix means you no longer need to disable NUMA hinting for this
>> test case. Maybe you're lucky and it resolves the shared_anon case as
>> well, but I somewhat doubt it :)
>> 
>> It doesn't really explain why the shared case would fail, though...
>> 




* Re: [PATCH 1/2] mm/migrate.c: Fix return code when migration fails
  2023-08-09  4:10       ` Alistair Popple
@ 2023-08-11  7:37         ` Huang, Ying
  0 siblings, 0 replies; 16+ messages in thread
From: Huang, Ying @ 2023-08-11  7:37 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Michal Hocko, Andrew Morton, linux-mm, jhubbard, osalvador,
	baolin.wang, ziy, shy828301, ryan.roberts

Alistair Popple <apopple@nvidia.com> writes:

> Michal Hocko <mhocko@suse.com> writes:
>
>> On Mon 07-08-23 22:31:52, Alistair Popple wrote:
>>> 
>>> Michal Hocko <mhocko@suse.com> writes:
>>> 
>>> > On Mon 07-08-23 16:39:44, Alistair Popple wrote:
>>> >> When a page fails to migrate move_pages() returns the error code in a
>>> >> per-page array of status values. The function call itself is also
>>> >> supposed to return a summary error code indicating that a failure
>>> >> occurred.
>>> >> 
>>> >> This doesn't always happen. Instead success can be returned even
>>> >> though some pages failed to migrate. This is due to incorrectly
>>> >> returning the error code from store_status() rather than the code from
>>> >> add_page_for_migration. Fix this by only returning an error from
>>> >> store_status() if the store actually failed.
>>> >
>>> > Error reporting by this syscall has been really far from
>>> > straightforward. Please read through a49bd4d71637 and the section "On a
>>> > side note". 
>>> > Is there any specific reason you are trying to address this now or is
>>> > this motivated by the code inspection?
>>> 
>>> Thanks Michal. There was no specific reason to address this now other
>>> than I came across this behaviour when updating the migration selftest
>>> to inspect the status array and thought it was odd. I was seeing pages
>>> had failed to migrate according to the status argument even though
>>> move_pages() had returned 0 (ie. success) rather than a number of
>>> non-migrated pages.
>>
>> It is good to mention such a motivation in the changelog to make it
>> clear. Also, do we have a specific test case which triggers this?
>
> Not explicitly/reliably although I could write one.
>
>>> If I'm interpreting the side note correctly the behaviour you were
>>> concerned about was the opposite - returning a fail return code from
>>> move_pages() but not indicating failure in the status array.

IIUC, we cannot avoid this even if we leave the code untouched.  In
move_pages_and_store_status(), if some pages fail to be migrated, we
will not store their status.  In fact, the kernel does not know which
pages failed to migrate either.

>>> That said I'm happy to leave the behaviour as is, although in that case
>>> an update to the man page is in order to clarify a return value of 0
>>> from move_pages() doesn't actually mean all pages were successfully
>>> migrated.

IMHO, the return value is more important than "status" for the reason I
mentioned above.  At least we can make one thing correct.

>> While I would say that it is better to let old dogs sleep I do not mind
>> changing the behavior and see whether anything breaks. I suspect nobody
>> except for couple of test cases hardcoded to the original behavior will
>> notice.
>>
>>> >> Signed-off-by: Alistair Popple <apopple@nvidia.com>
>>> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
>>
>> The patch itself looks good. I am not sure the fixes tag is accurate.
>> Has the reporting been correct before this change? I didn't have time to
>> re-read the original code which was quite different.
>
> I dug deeper into the history and the fixes tag is wrong. The behaviour
> was actually introduced way back in commit e78bbfa82624 ("mm: stop
> returning -ENOENT from sys_move_pages() if nothing got migrated"). As
> you may guess from the title it was intentional, so I suspect it is
> better to update the documentation.

Can we change the code but keep the -ENOENT behavior with some special
case?

--
Best Regards,
Huang, Ying

>> Anyway
>> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks for looking, but I will drop this and see if I can get the man
> page updated.
>
>> Anyway rewriting this function to clarify the error handling would be a
>> nice exercise if somebody is interested.
>
> Yeah, every time I look at this function I want to do that but haven't
> yet found the time.
>
>>> >> ---
>>> >>  mm/migrate.c | 4 +++-
>>> >>  1 file changed, 3 insertions(+), 1 deletion(-)
>>> >> 
>>> >> diff --git a/mm/migrate.c b/mm/migrate.c
>>> >> index 24baad2571e3..bb3a37245e13 100644
>>> >> --- a/mm/migrate.c
>>> >> +++ b/mm/migrate.c
>>> >> @@ -2222,7 +2222,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>>> >>  		 * If the page is already on the target node (!err), store the
>>> >>  		 * node, otherwise, store the err.
>>> >>  		 */
>>> >> -		err = store_status(status, i, err ? : current_node, 1);
>>> >> +		err1 = store_status(status, i, err ? : current_node, 1);
>>> >> +		if (err1)
>>> >> +			err = err1;
>>> >>  		if (err)
>>> >>  			goto out_flush;
>>> >>  
>>> >> -- 
>>> >> 2.39.2



end of thread, other threads:[~2023-08-11  7:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-07  6:39 [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Alistair Popple
2023-08-07  6:39 ` [PATCH 2/2] selftests/migration: Disable NUMA balancing and check migration status Alistair Popple
2023-08-07  9:15   ` Ryan Roberts
2023-08-07 12:41     ` Alistair Popple
2023-08-07 13:42       ` Ryan Roberts
2023-08-09  4:21         ` Alistair Popple
2023-08-09  9:34           ` David Hildenbrand
2023-08-09 13:39             ` Ryan Roberts
2023-08-10  8:23               ` Alistair Popple
2023-08-09  9:35   ` David Hildenbrand
2023-08-09 10:46     ` Alistair Popple
2023-08-07  8:39 ` [PATCH 1/2] mm/migrate.c: Fix return code when migration fails Michal Hocko
2023-08-07 12:31   ` Alistair Popple
2023-08-07 13:07     ` Michal Hocko
2023-08-09  4:10       ` Alistair Popple
2023-08-11  7:37         ` Huang, Ying
