* [RFC 0/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
@ 2022-11-02  2:01 Robert Hoo
  2022-11-02  2:01 ` [RFC 1/1] " Robert Hoo
  0 siblings, 1 reply; 10+ messages in thread
From: Robert Hoo @ 2022-11-02  2:01 UTC (permalink / raw)
  To: pbonzini, seanjc, gshan; +Cc: kvm, Robert Hoo

Recently, our QA team has often hit the following test assertion failure in the
KVM selftest rseq_test:
==== Test Assertion Failure ====
  rseq_test.c:273: i > (NR_TASK_MIGRATIONS / 2)
  pid=391366 tid=391366 errno=4 - Interrupted system call
     1	0x00000000004027dd: main at rseq_test.c:272
     2	0x00007f7fc383ad84: ?? ??:0
     3	0x000000000040286d: _start at ??:?
  Only performed 32083 KVM_RUNs, task stalled too much?

Though this is not a bug [1], passing this assertion means the race condition
is hit more often, which is the original purpose of this test case's design.

[1] https://lore.kernel.org/kvm/YvwYxeE4vc%2FSrbil@google.com/

Robert Hoo (1):
  KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()

 tools/testing/selftests/kvm/rseq_test.c | 32 ++++++++++++++++++-------
 1 file changed, 24 insertions(+), 8 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-02  2:01 [RFC 0/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall() Robert Hoo
@ 2022-11-02  2:01 ` Robert Hoo
  2022-11-02  4:24   ` Gavin Shan
  2022-11-03  0:46   ` Sean Christopherson
  0 siblings, 2 replies; 10+ messages in thread
From: Robert Hoo @ 2022-11-02  2:01 UTC (permalink / raw)
  To: pbonzini, seanjc, gshan; +Cc: kvm, Robert Hoo

vDSO getcpu() has been in the kernel since 2.6.19, so we can assume it is
generally available.
Use vDSO getcpu() to reduce the overhead, so that the vcpu thread stalls
less and therefore has better odds of hitting the race condition.

Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
---
 tools/testing/selftests/kvm/rseq_test.c | 32 ++++++++++++++++++-------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 6f88da7e60be..0b68a6b19b31 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -42,15 +42,29 @@ static void guest_code(void)
 }
 
 /*
- * We have to perform direct system call for getcpu() because it's
- * not available until glibc 2.29.
+ * getcpu() was added in kernel 2.6.19, but glibc support wasn't
+ * added until glibc 2.29.
+ * Call it directly from the vDSO to ease the glibc dependency.
+ *
+ * The vDSO manipulation code is borrowed from selftests/x86/test_vsyscall.c.
  */
-static void sys_getcpu(unsigned *cpu)
-{
-	int r;
+typedef long (*getcpu_t)(unsigned *, unsigned *, void *);
+static getcpu_t vdso_getcpu;
 
-	r = syscall(__NR_getcpu, cpu, NULL, NULL);
-	TEST_ASSERT(!r, "getcpu failed, errno = %d (%s)", errno, strerror(errno));
+static void init_vdso(void)
+{
+	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL |
+			    RTLD_NOLOAD);
+	if (!vdso)
+		vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL |
+			      RTLD_NOLOAD);
+	if (!vdso)
+		TEST_FAIL("failed to find vDSO");
+
+	vdso_getcpu = (getcpu_t)dlsym(vdso, "__vdso_getcpu");
+
+	if (!vdso_getcpu)
+		TEST_FAIL("failed to find __vdso_getcpu in vDSO");
 }
 
 static int next_cpu(int cpu)
@@ -205,6 +219,8 @@ int main(int argc, char *argv[])
 	struct kvm_vcpu *vcpu;
 	u32 cpu, rseq_cpu;
 
+	init_vdso();
+
 	/* Tell stdout not to buffer its content */
 	setbuf(stdout, NULL);
 
@@ -253,7 +269,7 @@ int main(int argc, char *argv[])
 			 * across the seq_cnt reads.
 			 */
 			smp_rmb();
-			sys_getcpu(&cpu);
+			vdso_getcpu(&cpu, NULL, NULL);
 			rseq_cpu = rseq_current_cpu_raw();
 			smp_rmb();
 		} while (snapshot != atomic_read(&seq_cnt));
-- 
2.31.1



* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-02  2:01 ` [RFC 1/1] " Robert Hoo
@ 2022-11-02  4:24   ` Gavin Shan
  2022-11-02 12:46     ` Robert Hoo
  2022-11-03  0:46   ` Sean Christopherson
  1 sibling, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-02  4:24 UTC (permalink / raw)
  To: Robert Hoo, pbonzini, seanjc; +Cc: kvm

Hi Robert,

On 11/2/22 10:01 AM, Robert Hoo wrote:
> vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> generally available.
> Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls less
> therefore can have more odds to hit the race condition.
> 

It would be nice to provide more context to explain how the race
condition is caused.

> Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---
>   tools/testing/selftests/kvm/rseq_test.c | 32 ++++++++++++++++++-------
>   1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
> index 6f88da7e60be..0b68a6b19b31 100644
> --- a/tools/testing/selftests/kvm/rseq_test.c
> +++ b/tools/testing/selftests/kvm/rseq_test.c
> @@ -42,15 +42,29 @@ static void guest_code(void)
>   }
>   
>   /*
> - * We have to perform direct system call for getcpu() because it's
> - * not available until glic 2.29.
> + * getcpu() was added in kernel 2.6.19. glibc support wasn't there
> + * until glibc 2.29.
> + * We can direct call it from vdso to ease gblic dependency.
> + *
> + * vdso manipulation code refers from selftests/x86/test_vsyscall.c
>    */
> -static void sys_getcpu(unsigned *cpu)
> -{
> -	int r;
> +typedef long (*getcpu_t)(unsigned *, unsigned *, void *);
> +static getcpu_t vdso_getcpu;
>   
> -	r = syscall(__NR_getcpu, cpu, NULL, NULL);
> -	TEST_ASSERT(!r, "getcpu failed, errno = %d (%s)", errno, strerror(errno));
> +static void init_vdso(void)
> +{
> +	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL |
> +			    RTLD_NOLOAD);
> +	if (!vdso)
> +		vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL |
> +			      RTLD_NOLOAD);
> +	if (!vdso)
> +		TEST_ASSERT(!vdso, "failed to find vDSO\n");
> +
> +	vdso_getcpu = (getcpu_t)dlsym(vdso, "__vdso_getcpu");
> +	if (!vdso_getcpu)
> +		TEST_ASSERT(!vdso_getcpu,
> +			    "failed to find __vdso_getcpu in vDSO\n");
>   }
>   

As the comments say, vdso manipulation code comes from selftests/x86/test_vsyscall.c.
I would guess 'linux-vdso.so.1' and 'linux-gate.so.1' are x86 specific. If I'm correct,
the test case will fail on other architectures, including ARM64.

>   static int next_cpu(int cpu)
> @@ -205,6 +219,8 @@ int main(int argc, char *argv[])
>   	struct kvm_vcpu *vcpu;
>   	u32 cpu, rseq_cpu;
>   
> +	init_vdso();
> +
>   	/* Tell stdout not to buffer its content */
>   	setbuf(stdout, NULL);
>   
> @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
>   			 * across the seq_cnt reads.
>   			 */
>   			smp_rmb();
> -			sys_getcpu(&cpu);
> +			vdso_getcpu(&cpu, NULL, NULL);
>   			rseq_cpu = rseq_current_cpu_raw();
>   			smp_rmb();
>   		} while (snapshot != atomic_read(&seq_cnt));
> 

Thanks,
Gavin



* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-02  4:24   ` Gavin Shan
@ 2022-11-02 12:46     ` Robert Hoo
  0 siblings, 0 replies; 10+ messages in thread
From: Robert Hoo @ 2022-11-02 12:46 UTC (permalink / raw)
  To: Gavin Shan, pbonzini, seanjc; +Cc: kvm

On Wed, 2022-11-02 at 12:24 +0800, Gavin Shan wrote:
> Hi Robert,
> 
> On 11/2/22 10:01 AM, Robert Hoo wrote:
> > vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> > generally available.
> > Use vDSO getcpu() to reduce the overhead, so that vcpu thread
> > stalls less
> > therefore can have more odds to hit the race condition.
> > 
> 
> It would be nice to provide more context to explain how the race
> condition is caused.

OK. How about this?
... hit the race condition where vcpu_run() internally needs to handle the
pCPU migration triggered by sched_setaffinity() in the migration thread.
> 
> > Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of
> > sched_getcpu() in rseq_test")
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> >   tools/testing/selftests/kvm/rseq_test.c | 32 ++++++++++++++++++
> > -------
> >   1 file changed, 24 insertions(+), 8 deletions(-)
> > 
> > diff --git a/tools/testing/selftests/kvm/rseq_test.c
> > b/tools/testing/selftests/kvm/rseq_test.c
> > index 6f88da7e60be..0b68a6b19b31 100644
> > --- a/tools/testing/selftests/kvm/rseq_test.c
> > +++ b/tools/testing/selftests/kvm/rseq_test.c
> > @@ -42,15 +42,29 @@ static void guest_code(void)
> >   }
> >   
> >   /*
> > - * We have to perform direct system call for getcpu() because it's
> > - * not available until glic 2.29.
> > + * getcpu() was added in kernel 2.6.19. glibc support wasn't there
> > + * until glibc 2.29.
> > + * We can direct call it from vdso to ease gblic dependency.
> > + *
> > + * vdso manipulation code refers from
> > selftests/x86/test_vsyscall.c
> >    */
> > -static void sys_getcpu(unsigned *cpu)
> > -{
> > -	int r;
> > +typedef long (*getcpu_t)(unsigned *, unsigned *, void *);
> > +static getcpu_t vdso_getcpu;
> >   
> > -	r = syscall(__NR_getcpu, cpu, NULL, NULL);
> > -	TEST_ASSERT(!r, "getcpu failed, errno = %d (%s)", errno,
> > strerror(errno));
> > +static void init_vdso(void)
> > +{
> > +	void *vdso = dlopen("linux-vdso.so.1", RTLD_LAZY | RTLD_LOCAL |
> > +			    RTLD_NOLOAD);
> > +	if (!vdso)
> > +		vdso = dlopen("linux-gate.so.1", RTLD_LAZY | RTLD_LOCAL
> > |
> > +			      RTLD_NOLOAD);
> > +	if (!vdso)
> > +		TEST_ASSERT(!vdso, "failed to find vDSO\n");
> > +
> > +	vdso_getcpu = (getcpu_t)dlsym(vdso, "__vdso_getcpu");
> > +	if (!vdso_getcpu)
> > +		TEST_ASSERT(!vdso_getcpu,
> > +			    "failed to find __vdso_getcpu in vDSO\n");
> >   }
> >   
> 
> As the comments say, vdso manipulation code comes from
> selftests/x86/test_vsyscall.c.
> I would guess 'linux-vdso.so.1' and 'linux-gate.so.1' are x86
> specific. If I'm correct,
> the test case will fail on other architectures, including ARM64.
> 
Ah, right, thanks.
Fortunately ARM and x86 share the same vDSO name, and we can define macros
for the variations.

       user ABI   vDSO name
       -----------------------------
       aarch64    linux-vdso.so.1
       arm        linux-vdso.so.1
       ia64       linux-gate.so.1
       mips       linux-vdso.so.1
       ppc/32     linux-vdso32.so.1
       ppc/64     linux-vdso64.so.1
       s390       linux-vdso32.so.1
       s390x      linux-vdso64.so.1
       sh         linux-gate.so.1
       i386       linux-gate.so.1
       x86-64     linux-vdso.so.1
       x86/x32    linux-vdso.so.1

Unfortunately though, it looks like the ARM vDSO doesn't have getcpu(). In
that case, we might fall back to syscall(__NR_getcpu)?

aarch64 functions
       The table below lists the symbols exported by the vDSO.

       symbol                   version
       --------------------------------------
       __kernel_rt_sigreturn    LINUX_2.6.39
       __kernel_gettimeofday    LINUX_2.6.39
       __kernel_clock_gettime   LINUX_2.6.39
       __kernel_clock_getres    LINUX_2.6.39

https://man7.org/linux/man-pages/man7/vdso.7.html

> >   static int next_cpu(int cpu)
> > @@ -205,6 +219,8 @@ int main(int argc, char *argv[])
> >   	struct kvm_vcpu *vcpu;
> >   	u32 cpu, rseq_cpu;
> >   
> > +	init_vdso();
> > +
> >   	/* Tell stdout not to buffer its content */
> >   	setbuf(stdout, NULL);
> >   
> > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> >   			 * across the seq_cnt reads.
> >   			 */
> >   			smp_rmb();
> > -			sys_getcpu(&cpu);
> > +			vdso_getcpu(&cpu, NULL, NULL);
> >   			rseq_cpu = rseq_current_cpu_raw();
> >   			smp_rmb();
> >   		} while (snapshot != atomic_read(&seq_cnt));
> > 
> 
> Thanks,
> Gavin
> 



* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-02  2:01 ` [RFC 1/1] " Robert Hoo
  2022-11-02  4:24   ` Gavin Shan
@ 2022-11-03  0:46   ` Sean Christopherson
  2022-11-03  1:16     ` Gavin Shan
  2022-11-03  2:59     ` Robert Hoo
  1 sibling, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-03  0:46 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, gshan, kvm

On Wed, Nov 02, 2022, Robert Hoo wrote:
> vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> generally available.
> Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls less
> therefore can have more odds to hit the race condition.
> 
> Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> ---

...

> @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
>  			 * across the seq_cnt reads.
>  			 */
>  			smp_rmb();
> -			sys_getcpu(&cpu);
> +			vdso_getcpu(&cpu, NULL, NULL);
>  			rseq_cpu = rseq_current_cpu_raw();
>  			smp_rmb();
>  		} while (snapshot != atomic_read(&seq_cnt));

Something seems off here.  Half of the iterations in the migration thread have a
delay of 5+us, which should be more than enough time to complete a few getcpu()
syscalls to stabilize the CPU.

Has anyone tried to figure out why the vCPU thread is apparently running slow?
E.g. is KVM_RUN itself taking a long time, is the task not getting scheduled in,
etc...  I can see how using vDSO would make the vCPU more efficient, but I'm
curious as to why that's a problem in the first place.

Anyways, assuming there's no underlying problem that can be solved, the easier
solution is to just bump the delay in the migration thread.  As per its gigantic
comment, the original bug reproduced with up to 500us delays, so bumping the min
delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU meets its
quota, then something else is definitely going on.


* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-03  0:46   ` Sean Christopherson
@ 2022-11-03  1:16     ` Gavin Shan
  2022-11-04  2:05       ` Sean Christopherson
  2022-11-03  2:59     ` Robert Hoo
  1 sibling, 1 reply; 10+ messages in thread
From: Gavin Shan @ 2022-11-03  1:16 UTC (permalink / raw)
  To: Sean Christopherson, Robert Hoo; +Cc: pbonzini, kvm

On 11/3/22 8:46 AM, Sean Christopherson wrote:
> On Wed, Nov 02, 2022, Robert Hoo wrote:
>> vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
>> generally available.
>> Use vDSO getcpu() to reduce the overhead, so that vcpu thread stalls less
>> therefore can have more odds to hit the race condition.
>>
>> Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test")
>> Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
>> ---
> 
> ...
> 
>> @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
>>   			 * across the seq_cnt reads.
>>   			 */
>>   			smp_rmb();
>> -			sys_getcpu(&cpu);
>> +			vdso_getcpu(&cpu, NULL, NULL);
>>   			rseq_cpu = rseq_current_cpu_raw();
>>   			smp_rmb();
>>   		} while (snapshot != atomic_read(&seq_cnt));
> 
> Something seems off here.  Half of the iterations in the migration thread have a
> delay of 5+us, which should be more than enough time to complete a few getcpu()
> syscalls to stabilize the CPU.
> 
> Has anyone tried to figure out why the vCPU thread is apparently running slow?
> E.g. is KVM_RUN itself taking a long time, is the task not getting scheduled in,
> etc...  I can see how using vDSO would make the vCPU more efficient, but I'm
> curious as to why that's a problem in the first place.
> 
> Anyways, assuming there's no underlying problem that can be solved, the easier
> solution is to just bump the delay in the migration thread.  As per its gigantic
> comment, the original bug reproduced with up to 500us delays, so bumping the min
> delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU meets its
> quota, then something else is definitely going on.
> 

I doubt it's still caused by a busy system, as mentioned previously [1]. At
least, I failed to reproduce the issue on my ARM64 system until some workloads
were enforced to hog CPUs. Looking at the implementation of
syscall(__NR_getcpu), it simply copies the per-cpu data from kernel to
userspace, so I don't see why it should consume a lot of time. As system calls
are handled via interrupts/exceptions, the time consumed by the
interrupt/exception handler is architecture dependent. Besides, the time
needed by ioctl(KVM_RUN) also differs across architectures.

[1] https://lore.kernel.org/kvm/d8290cbe-5d87-137a-0633-0ff5c69d57b0@redhat.com/

I think Sean's suggestion to bump the delay to 5us would be the quick fix, if
it helps. However, more time will be needed to complete the test. Sean, do you
mind also reducing NR_TASK_MIGRATIONS from 100000 to 20000?

Thanks,
Gavin



* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-03  0:46   ` Sean Christopherson
  2022-11-03  1:16     ` Gavin Shan
@ 2022-11-03  2:59     ` Robert Hoo
  2022-11-04  2:07       ` Sean Christopherson
  1 sibling, 1 reply; 10+ messages in thread
From: Robert Hoo @ 2022-11-03  2:59 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, gshan, kvm

On Thu, 2022-11-03 at 00:46 +0000, Sean Christopherson wrote:
> On Wed, Nov 02, 2022, Robert Hoo wrote:
> > vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> > generally available.
> > Use vDSO getcpu() to reduce the overhead, so that vcpu thread
> > stalls less
> > therefore can have more odds to hit the race condition.
> > 
> > Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of
> > sched_getcpu() in rseq_test")
> > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > ---
> 
> ...
> 
> > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> >  			 * across the seq_cnt reads.
> >  			 */
> >  			smp_rmb();
> > -			sys_getcpu(&cpu);
> > +			vdso_getcpu(&cpu, NULL, NULL);
> >  			rseq_cpu = rseq_current_cpu_raw();
> >  			smp_rmb();
> >  		} while (snapshot != atomic_read(&seq_cnt));
> 
> Something seems off here.  Half of the iterations in the migration
> thread have a
> delay of 5+us, which should be more than enough time to complete a
> few getcpu()
> syscalls to stabilize the CPU.
> 
The migration thread's delay covers the whole vcpu thread loop, not just
vcpu_run(), I think.

for (i = 0; !done; i++) {
		vcpu_run(vcpu);
		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
			    "Guest failed?");
...
		do {
			...
			vdso_getcpu(&cpu, NULL, NULL);
			rseq_cpu = rseq_current_cpu_raw();
			...
		} while (snapshot != atomic_read(&seq_cnt));

...
	}

> Has anyone tried to figure out why the vCPU thread is apparently
> running slow?
> E.g. is KVM_RUN itself taking a long time, is the task not getting
> scheduled in,
> etc...  I can see how using vDSO would make the vCPU more efficient,
> but I'm
> curious as to why that's a problem in the first place.

Yes, that should be the first-order problem.
But fundamentally, it's the whole for(){} loop taking more time than before,
and that increase can be attributed to the key sub-calls, e.g.
vcpu_run(), get_ucall(), getcpu(), rseq_current_cpu_raw().

Though vcpu_run() deserves first attention, reducing the others' time
also helps.

BTW, I find that x86's get_ucall() makes one more vcpu ioctl
(vcpu_regs_get()) than aarch64's, which perhaps partly explains why the
for(){} loop is heavier than on aarch64.

uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc)
@@ -43,12 +95,14 @@
 	if (uc)
 		memset(uc, 0, sizeof(*uc));
 
-	if (run->exit_reason == KVM_EXIT_IO && run->io.port == UCALL_PIO_PORT) {
-		struct kvm_regs regs;
-
-		vcpu_regs_get(vcpu, &regs);
-		memcpy(&ucall, addr_gva2hva(vcpu->vm, (vm_vaddr_t)regs.rdi),
-		       sizeof(ucall));
+	if (run->exit_reason == KVM_EXIT_MMIO &&
+	    run->mmio.phys_addr == (uint64_t)ucall_exit_mmio_addr) {
+		vm_vaddr_t gva;
+
+		TEST_ASSERT(run->mmio.is_write && run->mmio.len == 8,
+			    "Unexpected ucall exit mmio address access");
+		memcpy(&gva, run->mmio.data, sizeof(gva));
+		memcpy(&ucall, addr_gva2hva(vcpu->vm, gva), sizeof(ucall));

> 
> Anyways, assuming there's no underlying problem that can be solved,
> the easier
> solution is to just bump the delay in the migration thread.  As per
> its gigantic
> comment, the original bug reproduced with up to 500us delays, so
> bumping the min
> delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU
> meets its
> quota, then something else is definitely going on.



* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-03  1:16     ` Gavin Shan
@ 2022-11-04  2:05       ` Sean Christopherson
  2022-11-04 20:27         ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-11-04  2:05 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Robert Hoo, pbonzini, kvm

On Thu, Nov 03, 2022, Gavin Shan wrote:
> On 11/3/22 8:46 AM, Sean Christopherson wrote:
> > On Wed, Nov 02, 2022, Robert Hoo wrote:
> > > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> > >   			 * across the seq_cnt reads.
> > >   			 */
> > >   			smp_rmb();
> > > -			sys_getcpu(&cpu);
> > > +			vdso_getcpu(&cpu, NULL, NULL);
> > >   			rseq_cpu = rseq_current_cpu_raw();
> > >   			smp_rmb();
> > >   		} while (snapshot != atomic_read(&seq_cnt));
> > 
> > Something seems off here.  Half of the iterations in the migration thread have a
> > delay of 5+us, which should be more than enough time to complete a few getcpu()
> > syscalls to stabilize the CPU.
> > 
> > Has anyone tried to figure out why the vCPU thread is apparently running slow?
> > E.g. is KVM_RUN itself taking a long time, is the task not getting scheduled in,
> > etc...  I can see how using vDSO would make the vCPU more efficient, but I'm
> > curious as to why that's a problem in the first place.
> > 
> > Anyways, assuming there's no underlying problem that can be solved, the easier
> > solution is to just bump the delay in the migration thread.  As per its gigantic
> > comment, the original bug reproduced with up to 500us delays, so bumping the min
> > delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU meets its
> > quota, then something else is definitely going on.
> > 
> 
> I doubt if it's still caused by busy system as mentioned previously [1]. At least,
> I failed to reproduce the issue on my ARM64 system until some workloads are enforced
> to hog CPUs.

Yeah, I suspect something else as well.  My best guess at this point is mitigations;
I'll test that tomorrow to see if it makes any difference.

> Looking at the implementation syscall(NR_getcpu), it's simply to copy
> the per-cpu data from kernel to userspace. So I don't see it should consume lots
> of time. As system call is handled by interrupt/exception, the time consumed by
> the interrupt/exception handler should be architecture dependent. Besides, the time
> needed by ioctl(KVM_RUN) also differs on architectures.

Yes, but Robert is seeing problems on x86-64 that I have been unable to reproduce,
i.e. this isn't an architectural difference problem.

> [1] https://lore.kernel.org/kvm/d8290cbe-5d87-137a-0633-0ff5c69d57b0@redhat.com/
> 
> I think Sean's suggestion to bump the delay to 5us would be the quick fix if it helps.
> However, more time will be needed to complete the test. Sean, do you mind to reduce
> NR_TASK_MIGRATIONS from 100000 to 20000 either?

I don't think the number of migrations needs to be cut by 5x; the +5us bump only
changes the average from ~5us to ~7.5us.

But before we start mucking with the delay, I want to at least understand _why_
a lower bound of 1us is insufficient.


* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-03  2:59     ` Robert Hoo
@ 2022-11-04  2:07       ` Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-04  2:07 UTC (permalink / raw)
  To: Robert Hoo; +Cc: pbonzini, gshan, kvm

On Thu, Nov 03, 2022, Robert Hoo wrote:
> On Thu, 2022-11-03 at 00:46 +0000, Sean Christopherson wrote:
> > On Wed, Nov 02, 2022, Robert Hoo wrote:
> > > vDSO getcpu() has been in Kernel since 2.6.19, which we can assume
> > > generally available.
> > > Use vDSO getcpu() to reduce the overhead, so that vcpu thread
> > > stalls less
> > > therefore can have more odds to hit the race condition.
> > > 
> > > Fixes: 0fcc102923de ("KVM: selftests: Use getcpu() instead of
> > > sched_getcpu() in rseq_test")
> > > Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
> > > ---
> > 
> > ...
> > 
> > > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> > >  			 * across the seq_cnt reads.
> > >  			 */
> > >  			smp_rmb();
> > > -			sys_getcpu(&cpu);
> > > +			vdso_getcpu(&cpu, NULL, NULL);
> > >  			rseq_cpu = rseq_current_cpu_raw();
> > >  			smp_rmb();
> > >  		} while (snapshot != atomic_read(&seq_cnt));
> > 
> > Something seems off here.  Half of the iterations in the migration
> > thread have a
> > delay of 5+us, which should be more than enough time to complete a
> > few getcpu()
> > syscalls to stabilize the CPU.
> > 
> The migration thread delay time is for the whole vcpu thread loop, not
> just vcpu_run(), I think.

Yes, but if switching to vdso_getcpu() makes the issues go away, that suggests
that the task migration is causing the tight do-while loop to get stuck.

> for (i = 0; !done; i++) {
> 		vcpu_run(vcpu);
> 		TEST_ASSERT(get_ucall(vcpu, NULL) == UCALL_SYNC,
> 			    "Guest failed?");
> ...
> 		do {
> 			...
> 			vdso_getcpu(&cpu, NULL, NULL);
> 			rseq_cpu = rseq_current_cpu_raw();
> 			...
> 		} while (snapshot != atomic_read(&seq_cnt));
> 
> ...
> 	}
> 
> > Has anyone tried to figure out why the vCPU thread is apparently running
> > slow?  E.g. is KVM_RUN itself taking a long time, is the task not getting
> > scheduled in, etc...  I can see how using vDSO would make the vCPU more
> > efficient, but I'm curious as to why that's a problem in the first place.
> 
> Yes, that should be the first-order problem.
> But fundamentally, it's the whole for(){} loop taking more time than before,

Do you have actual performance numbers?  If so, can you share them?


* Re: [RFC 1/1] KVM: selftests: rseq_test: use vdso_getcpu() instead of syscall()
  2022-11-04  2:05       ` Sean Christopherson
@ 2022-11-04 20:27         ` Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-11-04 20:27 UTC (permalink / raw)
  To: Gavin Shan; +Cc: Robert Hoo, pbonzini, kvm

On Fri, Nov 04, 2022, Sean Christopherson wrote:
> On Thu, Nov 03, 2022, Gavin Shan wrote:
> > On 11/3/22 8:46 AM, Sean Christopherson wrote:
> > > On Wed, Nov 02, 2022, Robert Hoo wrote:
> > > > @@ -253,7 +269,7 @@ int main(int argc, char *argv[])
> > > >   			 * across the seq_cnt reads.
> > > >   			 */
> > > >   			smp_rmb();
> > > > -			sys_getcpu(&cpu);
> > > > +			vdso_getcpu(&cpu, NULL, NULL);
> > > >   			rseq_cpu = rseq_current_cpu_raw();
> > > >   			smp_rmb();
> > > >   		} while (snapshot != atomic_read(&seq_cnt));
> > > 
> > > Something seems off here.  Half of the iterations in the migration thread have a
> > > delay of 5+us, which should be more than enough time to complete a few getcpu()
> > > syscalls to stabilize the CPU.
> > > 
> > > Has anyone tried to figure out why the vCPU thread is apparently running slow?
> > > E.g. is KVM_RUN itself taking a long time, is the task not getting scheduled in,
> > > etc...  I can see how using vDSO would make the vCPU more efficient, but I'm
> > > curious as to why that's a problem in the first place.
> > > 
> > > Anyways, assuming there's no underlying problem that can be solved, the easier
> > > solution is to just bump the delay in the migration thread.  As per its gigantic
> > > comment, the original bug reproduced with up to 500us delays, so bumping the min
> > > delay to e.g. 5us is acceptable.  If that doesn't guarantee the vCPU meets its
> > > quota, then something else is definitely going on.
> > > 
> > 
> > I doubt if it's still caused by busy system as mentioned previously [1]. At least,
> > I failed to reproduce the issue on my ARM64 system until some workloads are enforced
> > to hog CPUs.
> 
> Yeah, I suspect something else as well.  My best guess at this point is mitigations;
> I'll test that tomorrow to see if it makes any difference.

So much for the mitigations theory, the migration thread gets slowed down more than
the vCPU thread.

