* [PATCH] KVM: VMX: optimize pi_wakeup_handler
@ 2022-04-02 4:01 Li RongQing
2022-04-02 8:14 ` Paolo Bonzini
0 siblings, 1 reply; 5+ messages in thread
From: Li RongQing @ 2022-04-02 4:01 UTC (permalink / raw)
To: kvm, pbonzini, seanjc, vkuznets
pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
list_for_each_entry is used in it, and whose input is other function
per_cpu(), That cause that per_cpu() be invoked at least twice when
there is one sleep vCPU
so optimize pi_wakeup_handler it by reading once which is safe in
spinlock protection
and same to per CPU spinlock
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
arch/x86/kvm/vmx/posted_intr.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5fdabf3..0dae431 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -214,17 +214,21 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
*/
void pi_wakeup_handler(void)
{
+ struct list_head *wakeup_list;
int cpu = smp_processor_id();
+ raw_spinlock_t *spinlock;
struct vcpu_vmx *vmx;
- raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
- list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
- pi_wakeup_list) {
+ spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
+
+ raw_spin_lock(spinlock);
+ wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
+ list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
if (pi_test_on(&vmx->pi_desc))
kvm_vcpu_wake_up(&vmx->vcpu);
}
- raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
+ raw_spin_unlock(spinlock);
}
void __init pi_init_cpu(int cpu)
--
2.9.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] KVM: VMX: optimize pi_wakeup_handler
2022-04-02 4:01 [PATCH] KVM: VMX: optimize pi_wakeup_handler Li RongQing
@ 2022-04-02 8:14 ` Paolo Bonzini
2022-04-02 8:32 ` 答复: " Li,Rongqing
0 siblings, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2022-04-02 8:14 UTC (permalink / raw)
To: Li RongQing, kvm, seanjc, vkuznets
On 4/2/22 06:01, Li RongQing wrote:
> pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> list_for_each_entry is used in it, and whose input is other function
> per_cpu(), That cause that per_cpu() be invoked at least twice when
> there is one sleep vCPU
>
> so optimize pi_wakeup_handler it by reading once which is safe in
> spinlock protection
>
> and same to per CPU spinlock
What's the difference in the generated code?
Paolo
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
> arch/x86/kvm/vmx/posted_intr.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index 5fdabf3..0dae431 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -214,17 +214,21 @@ void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
> */
> void pi_wakeup_handler(void)
> {
> + struct list_head *wakeup_list;
> int cpu = smp_processor_id();
> + raw_spinlock_t *spinlock;
> struct vcpu_vmx *vmx;
>
> - raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> - list_for_each_entry(vmx, &per_cpu(wakeup_vcpus_on_cpu, cpu),
> - pi_wakeup_list) {
> + spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
> +
> + raw_spin_lock(spinlock);
> + wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
> + list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
>
> if (pi_test_on(&vmx->pi_desc))
> kvm_vcpu_wake_up(&vmx->vcpu);
> }
> - raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, cpu));
> + raw_spin_unlock(spinlock);
> }
>
> void __init pi_init_cpu(int cpu)
^ permalink raw reply [flat|nested] 5+ messages in thread
* 答复: [PATCH] KVM: VMX: optimize pi_wakeup_handler
2022-04-02 8:14 ` Paolo Bonzini
@ 2022-04-02 8:32 ` Li,Rongqing
2022-04-04 15:15 ` Sean Christopherson
0 siblings, 1 reply; 5+ messages in thread
From: Li,Rongqing @ 2022-04-02 8:32 UTC (permalink / raw)
To: Paolo Bonzini, kvm, seanjc, vkuznets
> -----邮件原件-----
> 发件人: Paolo Bonzini <paolo.bonzini@gmail.com> 代表 Paolo Bonzini
> 发送时间: 2022年4月2日 16:14
> 收件人: Li,Rongqing <lirongqing@baidu.com>; kvm@vger.kernel.org;
> seanjc@google.com; vkuznets@redhat.com
> 主题: Re: [PATCH] KVM: VMX: optimize pi_wakeup_handler
>
> On 4/2/22 06:01, Li RongQing wrote:
> > pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> > list_for_each_entry is used in it, and whose input is other function
> > per_cpu(), That cause that per_cpu() be invoked at least twice when
> > there is one sleep vCPU
> >
> > so optimize pi_wakeup_handler it by reading once which is safe in
> > spinlock protection
> >
> > and same to per CPU spinlock
>
> What's the difference in the generated code?
>
This reduces one fifth asm codes
Without this patch:
0000000000000400 <pi_wakeup_handler>:
400: e8 00 00 00 00 callq 405 <pi_wakeup_handler+0x5>
405: 55 push %rbp
406: 48 89 e5 mov %rsp,%rbp
409: 41 57 push %r15
40b: 41 56 push %r14
40d: 41 55 push %r13
40f: 41 54 push %r12
411: 49 c7 c4 00 00 00 00 mov $0x0,%r12
418: 65 44 8b 2d 00 00 00 mov %gs:0x0(%rip),%r13d # 420 <pi_wakeup_handler+0x20>
41f: 00
420: 4d 63 ed movslq %r13d,%r13
423: 4c 89 e7 mov %r12,%rdi
426: 53 push %rbx
427: 4a 03 3c ed 00 00 00 add 0x0(,%r13,8),%rdi
42e: 00
42f: 49 c7 c6 00 00 00 00 mov $0x0,%r14
436: e8 00 00 00 00 callq 43b <pi_wakeup_handler+0x3b>
43b: 4a 8b 0c ed 00 00 00 mov 0x0(,%r13,8),%rcx
442: 00
443: 4c 89 f0 mov %r14,%rax
446: 48 8b 14 01 mov (%rcx,%rax,1),%rdx
44a: 48 01 c8 add %rcx,%rax
44d: 48 39 c2 cmp %rax,%rdx
450: 74 3e je 490 <pi_wakeup_handler+0x90>
452: 48 8d 9a 40 d9 ff ff lea -0x26c0(%rdx),%rbx
459: 49 c7 c7 00 00 00 00 mov $0x0,%r15
460: 48 8b 83 a0 26 00 00 mov 0x26a0(%rbx),%rax
467: a8 01 test $0x1,%al
469: 74 08 je 473 <pi_wakeup_handler+0x73>
46b: 48 89 df mov %rbx,%rdi
46e: e8 00 00 00 00 callq 473 <pi_wakeup_handler+0x73>
473: 4b 8b 0c ef mov (%r15,%r13,8),%rcx
477: 48 8b 93 c0 26 00 00 mov 0x26c0(%rbx),%rdx
47e: 4c 89 f0 mov %r14,%rax
481: 48 01 c8 add %rcx,%rax
484: 48 8d 9a 40 d9 ff ff lea -0x26c0(%rdx),%rbx
48b: 48 39 c2 cmp %rax,%rdx
48e: 75 d0 jne 460 <pi_wakeup_handler+0x60>
490: 49 01 cc add %rcx,%r12
493: 41 c6 04 24 00 movb $0x0,(%r12)
498: 5b pop %rbx
499: 41 5c pop %r12
49b: 41 5d pop %r13
49d: 41 5e pop %r14
49f: 41 5f pop %r15
4a1: 5d pop %rbp
4a2: c3 retq
4a3: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4aa: 00 00
4ac: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4b3: 00 00 00
4b6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4bd: 00 00 00
With this patch
400: e8 00 00 00 00 callq 405 <pi_wakeup_handler+0x5>
405: 55 push %rbp
406: 48 89 e5 mov %rsp,%rbp
409: 41 55 push %r13
40b: 41 54 push %r12
40d: 53 push %rbx
40e: 49 c7 c5 00 00 00 00 mov $0x0,%r13
415: 49 c7 c4 00 00 00 00 mov $0x0,%r12
41c: 65 8b 1d 00 00 00 00 mov %gs:0x0(%rip),%ebx # 423 <pi_wakeup_handler+0x23>
423: 48 63 db movslq %ebx,%rbx
426: 4c 03 2c dd 00 00 00 add 0x0(,%rbx,8),%r13
42d: 00
42e: 4c 89 ef mov %r13,%rdi
431: e8 00 00 00 00 callq 436 <pi_wakeup_handler+0x36>
436: 4c 03 24 dd 00 00 00 add 0x0(,%rbx,8),%r12
43d: 00
43e: 49 8b 04 24 mov (%r12),%rax
442: 49 39 c4 cmp %rax,%r12
445: 74 2d je 474 <pi_wakeup_handler+0x74>
447: 48 8d 98 40 d9 ff ff lea -0x26c0(%rax),%rbx
44e: 48 8b 83 a0 26 00 00 mov 0x26a0(%rbx),%rax
455: a8 01 test $0x1,%al
457: 74 08 je 461 <pi_wakeup_handler+0x61>
459: 48 89 df mov %rbx,%rdi
45c: e8 00 00 00 00 callq 461 <pi_wakeup_handler+0x61>
461: 48 8b 83 c0 26 00 00 mov 0x26c0(%rbx),%rax
468: 49 39 c4 cmp %rax,%r12
46b: 48 8d 98 40 d9 ff ff lea -0x26c0(%rax),%rbx
472: 75 da jne 44e <pi_wakeup_handler+0x4e>
474: 41 c6 45 00 00 movb $0x0,0x0(%r13)
479: 5b pop %rbx
47a: 41 5c pop %r12
47c: 41 5d pop %r13
47e: 5d pop %rbp
47f: c3 retq
these is a similar patch 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
-Li
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: 答复: [PATCH] KVM: VMX: optimize pi_wakeup_handler
2022-04-02 8:32 ` 答复: " Li,Rongqing
@ 2022-04-04 15:15 ` Sean Christopherson
2022-04-05 8:26 ` 答复: " Li,Rongqing
0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2022-04-04 15:15 UTC (permalink / raw)
To: Li,Rongqing; +Cc: Paolo Bonzini, kvm, vkuznets
On Sat, Apr 02, 2022, Li,Rongqing wrote:
> > 发件人: Paolo Bonzini <paolo.bonzini@gmail.com> 代表 Paolo Bonzini
> > On 4/2/22 06:01, Li RongQing wrote:
> > > pi_wakeup_handler is used to wakeup the sleep vCPUs by posted irq
> > > list_for_each_entry is used in it, and whose input is other function
> > > per_cpu(), That cause that per_cpu() be invoked at least twice when
> > > there is one sleep vCPU
> > >
> > > so optimize pi_wakeup_handler it by reading once which is safe in
> > > spinlock protection
There's no need to protect reading the per-cpu variable with the spinlock, only
walking the list needs to be protected. E.g. the code can be compacted to:
int cpu = smp_processor_id();
raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, cpu);
struct list_head *wakeup_list = &per_cpu(wakeup_vcpus_on_cpu, cpu);
struct vcpu_vmx *vmx;
raw_spin_lock(spinlock);
list_for_each_entry(vmx, wakeup_list, pi_wakeup_list) {
if (pi_test_on(&vmx->pi_desc))
kvm_vcpu_wake_up(&vmx->vcpu);
}
raw_spin_unlock(spinlock);
> > >
> > > and same to per CPU spinlock
> >
> > What's the difference in the generated code?
> >
>
> This reduces one fifth asm codes
...
> these is a similar patch 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
Is there a measurable performance improvement though? I don't dislike the patch,
but it probably falls into the "technically an optimization but no one will ever
notice" category.
^ permalink raw reply [flat|nested] 5+ messages in thread
* 答复: 答复: [PATCH] KVM: VMX: optimize pi_wakeup_handler
2022-04-04 15:15 ` Sean Christopherson
@ 2022-04-05 8:26 ` Li,Rongqing
0 siblings, 0 replies; 5+ messages in thread
From: Li,Rongqing @ 2022-04-05 8:26 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, vkuznets
> There's no need to protect reading the per-cpu variable with the spinlock, only
> walking the list needs to be protected. E.g. the code can be compacted to:
Thanks
> > >
> > > What's the difference in the generated code?
> > >
> >
> > This reduces one fifth asm codes
>
> ...
>
> > these is a similar patch 031e3bd8986fffe31e1ddbf5264cccfe30c9abd7
>
> Is there a measurable performance improvement though? I don't dislike the
> patch, but it probably falls into the "technically an optimization but no one will
> ever notice" category.
There is small performance improvement when "perf bench sched pipe" is tested on IPI virtualization supported cpu and halt/mwait donot passthrough to vm
(https://patchwork.kernel.org/project/kvm/patch/20220304080725.18135-9-guang.zeng@intel.com/ are included)
Thanks
-Li
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-04-05 8:38 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-02 4:01 [PATCH] KVM: VMX: optimize pi_wakeup_handler Li RongQing
2022-04-02 8:14 ` Paolo Bonzini
2022-04-02 8:32 ` 答复: " Li,Rongqing
2022-04-04 15:15 ` Sean Christopherson
2022-04-05 8:26 ` 答复: " Li,Rongqing
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.