kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs
@ 2021-09-24 23:30 Dongli Zhang
  2021-09-27 19:22 ` Sean Christopherson
  0 siblings, 1 reply; 5+ messages in thread
From: Dongli Zhang @ 2021-09-24 23:30 UTC (permalink / raw)
  To: kvm, linux-kselftest; +Cc: pbonzini, shuah, seanjc, linux-kernel

The nr_cpus = CPU_COUNT(&possible_mask) is the number of available CPUs in
possible_mask. As a result, the "cpu = i % nr_cpus" may always return CPU
that is not available in possible_mask.

Suppose the server has 8 CPUs. The below Failure is encountered immediately
if the task is bound to CPU 5 and 6.

==== Test Assertion Failure ====
  rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
  pid=10127 tid=10127 errno=4 - Interrupted system call
     1	0x00000000004018e5: main at rseq_test.c:227
     2	0x00007fcc8fc66bf6: ?? ??:0
     3	0x0000000000401959: _start at ??:?
  Only performed 4 KVM_RUNs, task stalled too much?

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
---
 tools/testing/selftests/kvm/rseq_test.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index c5e0dd664a7b..41df5173970c 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -10,6 +10,7 @@
 #include <signal.h>
 #include <syscall.h>
 #include <sys/ioctl.h>
+#include <sys/sysinfo.h>
 #include <asm/barrier.h>
 #include <linux/atomic.h>
 #include <linux/rseq.h>
@@ -43,6 +44,18 @@ static bool done;
 
 static atomic_t seq_cnt;
 
+static int get_max_cpu_idx(void)
+{
+	int nproc = get_nprocs_conf();
+	int i, max = -ENOENT;
+
+	for (i = 0; i < nproc; i++)
+		if (CPU_ISSET(i, &possible_mask))
+			max = i;
+
+	return max;
+}
+
 static void guest_code(void)
 {
 	for (;;)
@@ -61,10 +74,13 @@ static void *migration_worker(void *ign)
 {
 	cpu_set_t allowed_mask;
 	int r, i, nr_cpus, cpu;
+	int max_cpu_idx;
 
 	CPU_ZERO(&allowed_mask);
 
-	nr_cpus = CPU_COUNT(&possible_mask);
+	max_cpu_idx = get_max_cpu_idx();
+	TEST_ASSERT(max_cpu_idx >= 0, "Invalid possible_mask");
+	nr_cpus = max_cpu_idx + 1;
 
 	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
 		cpu = i % nr_cpus;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs
  2021-09-24 23:30 [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs Dongli Zhang
@ 2021-09-27 19:22 ` Sean Christopherson
  2021-09-28 17:37   ` Dongli Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2021-09-27 19:22 UTC (permalink / raw)
  To: Dongli Zhang; +Cc: kvm, linux-kselftest, pbonzini, shuah, linux-kernel

On Fri, Sep 24, 2021, Dongli Zhang wrote:
> The nr_cpus = CPU_COUNT(&possible_mask) is the number of available CPUs in
> possible_mask. As a result, the "cpu = i % nr_cpus" may always return CPU
> that is not available in possible_mask.
> 
> Suppose the server has 8 CPUs. The below Failure is encountered immediately
> if the task is bound to CPU 5 and 6.

/facepalm

> ==== Test Assertion Failure ====
>   rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
>   pid=10127 tid=10127 errno=4 - Interrupted system call
>      1	0x00000000004018e5: main at rseq_test.c:227
>      2	0x00007fcc8fc66bf6: ?? ??:0
>      3	0x0000000000401959: _start at ??:?
>   Only performed 4 KVM_RUNs, task stalled too much?
> 
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>  tools/testing/selftests/kvm/rseq_test.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
> index c5e0dd664a7b..41df5173970c 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -10,6 +10,7 @@
>  #include <signal.h>
>  #include <syscall.h>
>  #include <sys/ioctl.h>
> +#include <sys/sysinfo.h>
>  #include <asm/barrier.h>
>  #include <linux/atomic.h>
>  #include <linux/rseq.h>
> @@ -43,6 +44,18 @@ static bool done;
>  
>  static atomic_t seq_cnt;
>  
> +static int get_max_cpu_idx(void)
> +{
> +	int nproc = get_nprocs_conf();
> +	int i, max = -ENOENT;
> +
> +	for (i = 0; i < nproc; i++)
> +		if (CPU_ISSET(i, &possible_mask))
> +			max = i;
> +
> +	return max;
> +}
> +
>  static void guest_code(void)
>  {
>  	for (;;)
> @@ -61,10 +74,13 @@ static void *migration_worker(void *ign)
>  {
>  	cpu_set_t allowed_mask;
>  	int r, i, nr_cpus, cpu;
> +	int max_cpu_idx;
>  
>  	CPU_ZERO(&allowed_mask);
>  
> -	nr_cpus = CPU_COUNT(&possible_mask);
> +	max_cpu_idx = get_max_cpu_idx();
> +	TEST_ASSERT(max_cpu_idx >= 0, "Invalid possible_mask");

I feel like this should be a KSFT_SKIP condition, not an assert.

> +	nr_cpus = max_cpu_idx + 1;
>  
>  	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
>  		cpu = i % nr_cpus;

This is still flawed, e.g. if the max CPU is 1023, but the task is pinned to _just_
CPU 1023, then the assert at the end will likely still fail because the migration
helper is effectively only running 1/1024 loops.

It probably also makes sense to grab the min CPU to reduce the pain if the task
is affined to a small subset.

As an aside, _which_ CPUs the task is affined to seems to matter, e.g. some
combinations of CPUs on my system don't fail, even with 100x iterations.  Don't
think there's anything the test can do about that, just an interesting data point
that suggests pinning while running tests may be a bad idea.

Anyways, something like this?

---
 tools/testing/selftests/kvm/rseq_test.c | 44 ++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 060538bd405a..befd64c27152 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -10,6 +10,7 @@
 #include <signal.h>
 #include <syscall.h>
 #include <sys/ioctl.h>
+#include <sys/sysinfo.h>
 #include <asm/barrier.h>
 #include <linux/atomic.h>
 #include <linux/rseq.h>
@@ -39,6 +40,7 @@ static __thread volatile struct rseq __rseq = {

 static pthread_t migration_thread;
 static cpu_set_t possible_mask;
+static int min_cpu, max_cpu;
 static bool done;

 static atomic_t seq_cnt;
@@ -60,16 +62,17 @@ static void sys_rseq(int flags)
 static void *migration_worker(void *ign)
 {
 	cpu_set_t allowed_mask;
-	int r, i, nr_cpus, cpu;
+	int r, i, cpu;

 	CPU_ZERO(&allowed_mask);

-	nr_cpus = CPU_COUNT(&possible_mask);
-
-	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
-		cpu = i % nr_cpus;
-		if (!CPU_ISSET(cpu, &possible_mask))
-			continue;
+	for (i = 0, cpu = -1; i < NR_TASK_MIGRATIONS; i++) {
+		do {
+			if (cpu < min_cpu || cpu > max_cpu)
+				cpu = min_cpu;
+			else
+				cpu++;
+		} while (!CPU_ISSET(cpu, &possible_mask));

 		CPU_SET(cpu, &allowed_mask);

@@ -154,6 +157,27 @@ static void *migration_worker(void *ign)
 	return NULL;
 }

+static int calc_min_max_cpu(void)
+{
+	int i, cnt, nproc;
+
+	nproc = get_nprocs_conf();
+	cnt = 0;
+	min_cpu = -1;
+	max_cpu = -1;
+
+	for (i = 0; i < nproc; i++) {
+		if (!CPU_ISSET(i, &possible_mask))
+			continue;
+		if (min_cpu == -1)
+			min_cpu = i;
+		max_cpu = i;
+		cnt++;
+	}
+
+	return (cnt < 2) ? -EINVAL : 0;
+}
+
 int main(int argc, char *argv[])
 {
 	int r, i, snapshot;
@@ -168,7 +192,11 @@ int main(int argc, char *argv[])
 		    strerror(errno));

 	if (CPU_COUNT(&possible_mask) < 2) {
-		print_skip("Only one CPU, task migration not possible\n");
+		print_skip("Only one CPU, task migration not possible");
+		exit(KSFT_SKIP);
+	}
+	if (calc_min_max_cpu()) {
+		print_skip("Only one usable CPU, task migration not possible");
 		exit(KSFT_SKIP);
 	}

--

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs
  2021-09-27 19:22 ` Sean Christopherson
@ 2021-09-28 17:37   ` Dongli Zhang
  2021-09-29 17:54     ` Sean Christopherson
  0 siblings, 1 reply; 5+ messages in thread
From: Dongli Zhang @ 2021-09-28 17:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kselftest, pbonzini, shuah, linux-kernel



On 9/27/21 12:22 PM, Sean Christopherson wrote:
> On Fri, Sep 24, 2021, Dongli Zhang wrote:
>> The nr_cpus = CPU_COUNT(&possible_mask) is the number of available CPUs in
>> possible_mask. As a result, the "cpu = i % nr_cpus" may always return CPU
>> that is not available in possible_mask.
>>
>> Suppose the server has 8 CPUs. The below Failure is encountered immediately
>> if the task is bound to CPU 5 and 6.
> 
> /facepalm
> 
>> ==== Test Assertion Failure ====
>>   rseq_test.c:228: i > (NR_TASK_MIGRATIONS / 2)
>>   pid=10127 tid=10127 errno=4 - Interrupted system call
>>      1	0x00000000004018e5: main at rseq_test.c:227
>>      2	0x00007fcc8fc66bf6: ?? ??:0
>>      3	0x0000000000401959: _start at ??:?
>>   Only performed 4 KVM_RUNs, task stalled too much?
>>
>> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
>> ---
>>  tools/testing/selftests/kvm/rseq_test.c | 18 +++++++++++++++++-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
>> index c5e0dd664a7b..41df5173970c 100644
>> --- a/tools/testing/selftests/kvm/rseq_test.c
>> +++ b/tools/testing/selftests/kvm/rseq_test.c
>> @@ -10,6 +10,7 @@
>>  #include <signal.h>
>>  #include <syscall.h>
>>  #include <sys/ioctl.h>
>> +#include <sys/sysinfo.h>
>>  #include <asm/barrier.h>
>>  #include <linux/atomic.h>
>>  #include <linux/rseq.h>
>> @@ -43,6 +44,18 @@ static bool done;
>>  
>>  static atomic_t seq_cnt;
>>  
>> +static int get_max_cpu_idx(void)
>> +{
>> +	int nproc = get_nprocs_conf();
>> +	int i, max = -ENOENT;
>> +
>> +	for (i = 0; i < nproc; i++)
>> +		if (CPU_ISSET(i, &possible_mask))
>> +			max = i;
>> +
>> +	return max;
>> +}
>> +
>>  static void guest_code(void)
>>  {
>>  	for (;;)
>> @@ -61,10 +74,13 @@ static void *migration_worker(void *ign)
>>  {
>>  	cpu_set_t allowed_mask;
>>  	int r, i, nr_cpus, cpu;
>> +	int max_cpu_idx;
>>  
>>  	CPU_ZERO(&allowed_mask);
>>  
>> -	nr_cpus = CPU_COUNT(&possible_mask);
>> +	max_cpu_idx = get_max_cpu_idx();
>> +	TEST_ASSERT(max_cpu_idx >= 0, "Invalid possible_mask");
> 
> I feel like this should be a KSFT_SKIP condition, not an assert.
> 
>> +	nr_cpus = max_cpu_idx + 1;
>>  
>>  	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
>>  		cpu = i % nr_cpus;
> 
> This is still flawed, e.g. if the max CPU is 1023, but the task is pinned to _just_
> CPU 1023, then the assert at the end will likely still fail because the migration
> helper is effectively only running 1/1024 loops.
> 
> It probably also makes sense to grab the min CPU to reduce the pain if the task
> is affined to a small subset.
> 
> As an aside, _which_ CPUs the task is affined to seems to matter, e.g. some
> combinations of CPUs on my system don't fail, even with 100x iterations.  Don't
> think there's anything the test can do about that, just an interesting data point
> that suggests pinning while running tests may be a bad idea.

I agree sometimes the failure is expected and reasonable. E.g., I am able to
randomly reproduce the failure if I reduce NR_TASK_MIGRATIONS to 1000.

> 
> Anyways, something like this?
> 
> ---
>  tools/testing/selftests/kvm/rseq_test.c | 44 ++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
> index 060538bd405a..befd64c27152 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -10,6 +10,7 @@
>  #include <signal.h>
>  #include <syscall.h>
>  #include <sys/ioctl.h>
> +#include <sys/sysinfo.h>
>  #include <asm/barrier.h>
>  #include <linux/atomic.h>
>  #include <linux/rseq.h>
> @@ -39,6 +40,7 @@ static __thread volatile struct rseq __rseq = {
> 
>  static pthread_t migration_thread;
>  static cpu_set_t possible_mask;
> +static int min_cpu, max_cpu;
>  static bool done;
> 
>  static atomic_t seq_cnt;
> @@ -60,16 +62,17 @@ static void sys_rseq(int flags)
>  static void *migration_worker(void *ign)
>  {
>  	cpu_set_t allowed_mask;
> -	int r, i, nr_cpus, cpu;
> +	int r, i, cpu;
> 
>  	CPU_ZERO(&allowed_mask);
> 
> -	nr_cpus = CPU_COUNT(&possible_mask);
> -
> -	for (i = 0; i < NR_TASK_MIGRATIONS; i++) {
> -		cpu = i % nr_cpus;
> -		if (!CPU_ISSET(cpu, &possible_mask))
> -			continue;
> +	for (i = 0, cpu = -1; i < NR_TASK_MIGRATIONS; i++) {
> +		do {
> +			if (cpu < min_cpu || cpu > max_cpu)
> +				cpu = min_cpu;
> +			else
> +				cpu++;
> +		} while (!CPU_ISSET(cpu, &possible_mask));
> 
>  		CPU_SET(cpu, &allowed_mask);

I see. The idea is to search within between min_cpu and max_cpu for each
NR_TASK_MIGRATION iteration to find the next cpu.

Perhaps a linked list is more suitable to here (when there are 1024 cpus and the
task is bound to both 1 and 1022) ... to pre-save the possible cpus in a list
and to only move to next cpu in the list for each iteration.

However, I think min_cpu/max_cpu is good enough for selttests case.

Would you please let me know if you would like to send above with my
Reported-by, or if you would like me to send with your Suggested-by.

Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs
  2021-09-28 17:37   ` Dongli Zhang
@ 2021-09-29 17:54     ` Sean Christopherson
  2021-09-29 18:54       ` Dongli Zhang
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2021-09-29 17:54 UTC (permalink / raw)
  To: Dongli Zhang; +Cc: kvm, linux-kselftest, pbonzini, shuah, linux-kernel

On Tue, Sep 28, 2021, Dongli Zhang wrote:
> 
> On 9/27/21 12:22 PM, Sean Christopherson wrote:
> Perhaps a linked list is more suitable to here (when there are 1024 cpus and the
> task is bound to both 1 and 1022) ... to pre-save the possible cpus in a list
> and to only move to next cpu in the list for each iteration.
> 
> However, I think min_cpu/max_cpu is good enough for selttests case.

Yeah, it's annoying that there's no CPU_SET_FOR_EACH so that x86 could optimize
it to use BSF :-/

> Would you please let me know if you would like to send above with my
> Reported-by, or if you would like me to send with your Suggested-by.

If you don't mind, I'll send a patch, I want to fiddle with the migration loop to
see if I can make it less magical/ugly.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs
  2021-09-29 17:54     ` Sean Christopherson
@ 2021-09-29 18:54       ` Dongli Zhang
  0 siblings, 0 replies; 5+ messages in thread
From: Dongli Zhang @ 2021-09-29 18:54 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, linux-kselftest, pbonzini, shuah, linux-kernel



On 9/29/21 10:54 AM, Sean Christopherson wrote:
> On Tue, Sep 28, 2021, Dongli Zhang wrote:
>>
>> On 9/27/21 12:22 PM, Sean Christopherson wrote:
>> Perhaps a linked list is more suitable to here (when there are 1024 cpus and the
>> task is bound to both 1 and 1022) ... to pre-save the possible cpus in a list
>> and to only move to next cpu in the list for each iteration.
>>
>> However, I think min_cpu/max_cpu is good enough for selttests case.
> 
> Yeah, it's annoying that there's no CPU_SET_FOR_EACH so that x86 could optimize
> it to use BSF :-/
> 
>> Would you please let me know if you would like to send above with my
>> Reported-by, or if you would like me to send with your Suggested-by.
> 
> If you don't mind, I'll send a patch, I want to fiddle with the migration loop to
> see if I can make it less magical/ugly.
> 

I do not mind. Feel free to send the patch with my "Reported-by: Dongli Zhang
<dongli.zhang@oracle.com>".

Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-29 18:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-24 23:30 [PATCH 1/1] selftests: KVM: set affinity of VM to right CPUs Dongli Zhang
2021-09-27 19:22 ` Sean Christopherson
2021-09-28 17:37   ` Dongli Zhang
2021-09-29 17:54     ` Sean Christopherson
2021-09-29 18:54       ` Dongli Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).