LKML Archive on lore.kernel.org
* [PATCH 1/2] KVM: X86: Single target IPI fastpath
@ 2019-11-09  7:05 Wanpeng Li
  2019-11-09  7:05 ` [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery Wanpeng Li
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-09  7:05 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

This patch tries to optimize the x2apic physical destination mode, fixed
delivery mode, single target IPI path by delivering the IPI to the receiver
immediately on the sender's ICR write vmexit, avoiding various checks when
possible.

Testing on Xeon Skylake server:

The virtual IPI latency, from the sender sending to the receiver receiving,
is reduced by more than 330 cpu cycles.

Running hackbench (reschedule IPI) in the guest, the average handling time of
MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:

Before patch:

  VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us

After patch:

  VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/vmx/vmx.c   | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/kvm_host.h |  1 +
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d21a4a..5c67061 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5924,7 +5924,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
 		}
 	}
 
-	if (exit_reason < kvm_vmx_max_exit_handlers
+	if (vcpu->fast_vmexit)
+		return 1;
+	else if (exit_reason < kvm_vmx_max_exit_handlers
 	    && kvm_vmx_exit_handlers[exit_reason])
 		return kvm_vmx_exit_handlers[exit_reason](vcpu);
 	else {
@@ -6474,6 +6476,34 @@ void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
 
 bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs, bool launched);
 
+static int handle_ipi_fastpath(struct kvm_vcpu *vcpu)
+{
+	u32 index;
+	u64 data;
+	int ret = 0;
+
+	if (lapic_in_kernel(vcpu) && apic_x2apic_mode(vcpu->arch.apic)) {
+		/*
+		 * fastpath to IPI target
+		 */
+		index = kvm_rcx_read(vcpu);
+		data = kvm_read_edx_eax(vcpu);
+
+		if (((index - APIC_BASE_MSR) << 4 == APIC_ICR) &&
+			((data & KVM_APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
+			((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {
+
+			kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
+			ret = kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
+
+			if (ret == 0)
+				ret = kvm_skip_emulated_instruction(vcpu);
+		};
+	};
+
+	return ret;
+}
+
 static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -6615,6 +6645,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 				  | (1 << VCPU_EXREG_CR3));
 	vcpu->arch.regs_dirty = 0;
 
+	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+	vcpu->fast_vmexit = false;
+	if (!is_guest_mode(vcpu) &&
+		vmx->exit_reason == EXIT_REASON_MSR_WRITE)
+		vcpu->fast_vmexit = handle_ipi_fastpath(vcpu);
+
 	pt_guest_exit(vmx);
 
 	/*
@@ -6634,7 +6670,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx->nested.nested_run_pending = 0;
 	vmx->idt_vectoring_info = 0;
 
-	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
 	if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
 		kvm_machine_check();
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 719fc3e..7a7358b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -319,6 +319,7 @@ struct kvm_vcpu {
 #endif
 	bool preempted;
 	bool ready;
+	bool fast_vmexit;
 	struct kvm_vcpu_arch arch;
 	struct dentry *debugfs_dentry;
 };
-- 
2.7.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
@ 2019-11-09  7:05 ` Wanpeng Li
  2019-11-11 21:59   ` Paolo Bonzini
  2019-11-10  1:59 ` [PATCH 1/2] KVM: X86: Single target IPI fastpath kbuild test robot
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2019-11-09  7:05 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

After disabling mwait/halt/pause vmexits, IPIs such as RESCHEDULE_VECTOR and
CALL_FUNCTION_SINGLE_VECTOR are among the main remaining causes of vmexits
observed in production environments that can't be optimized by PV IPIs. This
patch is a follow-up to commit 0e6d242eccdb (KVM: LAPIC: Micro optimize IPI
latency), removing redundant logic before a fixed mode IPI is delivered on
the fast path.

- Broadcast handling needs to take the slow path, so the delivery mode repair
  can be delayed until just before the slow path.
- Self-IPI is no longer intercepted by the hypervisor once APICv is in use,
  and boxes supporting APICv are common now. In addition,
  kvm_apic_map_get_dest_lapic() can handle self-IPI, so there is no need for
  a shortcut in the non-APICv case.

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/irq_comm.c | 6 +++---
 arch/x86/kvm/lapic.c    | 5 -----
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8ecd48d..aa88156 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -52,15 +52,15 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 	unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
 	unsigned int dest_vcpus = 0;
 
+	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
+		return r;
+
 	if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
 			kvm_lowest_prio_delivery(irq)) {
 		printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
 		irq->delivery_mode = APIC_DM_FIXED;
 	}
 
-	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
-		return r;
-
 	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b29d00b..ea936fa 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -951,11 +951,6 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
 
 	*r = -1;
 
-	if (irq->shorthand == APIC_DEST_SELF) {
-		*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
-		return true;
-	}
-
 	rcu_read_lock();
 	map = rcu_dereference(kvm->arch.apic_map);
 
-- 
2.7.4



* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
  2019-11-09  7:05 ` [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery Wanpeng Li
@ 2019-11-10  1:59 ` kbuild test robot
  2019-11-10  1:59 ` [PATCH] KVM: X86: fix semicolon.cocci warnings kbuild test robot
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2019-11-10  1:59 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kbuild-all, linux-kernel, kvm, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

Hi Wanpeng,

I love your patch! Perhaps something to improve:

[auto build test WARNING on kvm/linux-next]
[also build test WARNING on v5.4-rc6 next-20191108]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Wanpeng-Li/KVM-X86-Single-target-IPI-fastpath/20191110-081303
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


coccinelle warnings: (new ones prefixed by >>)

>> arch/x86/kvm/vmx/vmx.c:6509:3-4: Unneeded semicolon
   arch/x86/kvm/vmx/vmx.c:6510:2-3: Unneeded semicolon

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation


* [PATCH] KVM: X86: fix semicolon.cocci warnings
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
  2019-11-09  7:05 ` [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery Wanpeng Li
  2019-11-10  1:59 ` [PATCH 1/2] KVM: X86: Single target IPI fastpath kbuild test robot
@ 2019-11-10  1:59 ` kbuild test robot
  2019-11-11 13:06 ` [PATCH 1/2] KVM: X86: Single target IPI fastpath Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2019-11-10  1:59 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kbuild-all, linux-kernel, kvm, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

From: kbuild test robot <lkp@intel.com>

arch/x86/kvm/vmx/vmx.c:6509:3-4: Unneeded semicolon
arch/x86/kvm/vmx/vmx.c:6510:2-3: Unneeded semicolon


 Remove unneeded semicolons.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: fbb78ac80ca1 ("KVM: X86: Single target IPI fastpath")
CC: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
---

url:    https://github.com/0day-ci/linux/commits/Wanpeng-Li/KVM-X86-Single-target-IPI-fastpath/20191110-081303
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next

 vmx.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6506,8 +6506,8 @@ static int handle_ipi_fastpath(struct kv
 
 			if (ret == 0)
 				ret = kvm_skip_emulated_instruction(vcpu);
-		};
-	};
+		}
+	}
 
 	return ret;
 }


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-11-10  1:59 ` [PATCH] KVM: X86: fix semicolon.cocci warnings kbuild test robot
@ 2019-11-11 13:06 ` Vitaly Kuznetsov
  2019-11-12  1:18   ` Wanpeng Li
  2019-11-11 21:59 ` Paolo Bonzini
  2019-11-14 11:58 ` Paolo Bonzini
  5 siblings, 1 reply; 17+ messages in thread
From: Vitaly Kuznetsov @ 2019-11-11 13:06 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel,
	linux-kernel, kvm

Wanpeng Li <kernellwp@gmail.com> writes:

> +
>  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -6615,6 +6645,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  				  | (1 << VCPU_EXREG_CR3));
>  	vcpu->arch.regs_dirty = 0;
>  
> +	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
> +	vcpu->fast_vmexit = false;
> +	if (!is_guest_mode(vcpu) &&
> +		vmx->exit_reason == EXIT_REASON_MSR_WRITE)
> +		vcpu->fast_vmexit = handle_ipi_fastpath(vcpu);

I have to admit this looks too much to me :-( Yes, I see the benefits of
taking a shortcut (by actually penalizing all other MSR writes) but the
question I have is: where do we stop?

Also, this 'shortcut' creates an imbalance in tracing: you don't go down
to kvm_emulate_wrmsr() so handle_ipi_fastpath() should probably gain a
tracepoint.

Looking at 'fast_vmexit' name makes me think this is something
generic. Is this so? Maybe we can create some sort of an infrastructure
for fast vmexit handling and make it easy to hook things up to it?

(Maybe it's just me who thinks the codebase is already too complex,
let's see what Paolo and other reviewers have to say).

-- 
Vitaly



* Re: [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery
  2019-11-09  7:05 ` [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery Wanpeng Li
@ 2019-11-11 21:59   ` Paolo Bonzini
  2019-11-12  1:34     ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2019-11-11 21:59 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 09/11/19 08:05, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> After disabling mwait/halt/pause vmexits, IPIs such as RESCHEDULE_VECTOR and
> CALL_FUNCTION_SINGLE_VECTOR are among the main remaining causes of vmexits
> observed in production environments that can't be optimized by PV IPIs. This
> patch is a follow-up to commit 0e6d242eccdb (KVM: LAPIC: Micro optimize IPI
> latency), removing redundant logic before a fixed mode IPI is delivered on
> the fast path.
> 
> - Broadcast handling needs to take the slow path, so the delivery mode repair
>   can be delayed until just before the slow path.

I agree with this part, but is the cost of the irq->shorthand check
really measurable?

Paolo

> - Self-IPI is no longer intercepted by the hypervisor once APICv is in use,
>   and boxes supporting APICv are common now. In addition,
>   kvm_apic_map_get_dest_lapic() can handle self-IPI, so there is no need for
>   a shortcut in the non-APICv case.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/irq_comm.c | 6 +++---
>  arch/x86/kvm/lapic.c    | 5 -----
>  2 files changed, 3 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 8ecd48d..aa88156 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -52,15 +52,15 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
>  	unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
>  	unsigned int dest_vcpus = 0;
>  
> +	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
> +		return r;
> +
>  	if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
>  			kvm_lowest_prio_delivery(irq)) {
>  		printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
>  		irq->delivery_mode = APIC_DM_FIXED;
>  	}
>  
> -	if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
> -		return r;
> -
>  	memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
>  
>  	kvm_for_each_vcpu(i, vcpu, kvm) {
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b29d00b..ea936fa 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -951,11 +951,6 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>  
>  	*r = -1;
>  
> -	if (irq->shorthand == APIC_DEST_SELF) {
> -		*r = kvm_apic_set_irq(src->vcpu, irq, dest_map);
> -		return true;
> -	}
> -
>  	rcu_read_lock();
>  	map = rcu_dereference(kvm->arch.apic_map);
>  
> 



* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
                   ` (3 preceding siblings ...)
  2019-11-11 13:06 ` [PATCH 1/2] KVM: X86: Single target IPI fastpath Vitaly Kuznetsov
@ 2019-11-11 21:59 ` Paolo Bonzini
  2019-11-12  1:33   ` Wanpeng Li
  2019-11-14 11:58 ` Paolo Bonzini
  5 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2019-11-11 21:59 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 09/11/19 08:05, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> This patch tries to optimize the x2apic physical destination mode, fixed
> delivery mode, single target IPI path by delivering the IPI to the receiver
> immediately on the sender's ICR write vmexit, avoiding various checks when
> possible.
>
> Testing on Xeon Skylake server:
>
> The virtual IPI latency, from the sender sending to the receiver receiving,
> is reduced by more than 330 cpu cycles.
>
> Running hackbench (reschedule IPI) in the guest, the average handling time of
> MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:
> 
> Before patch:
> 
>   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us
> 
> After patch:
> 
>   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us

Do you have retpolines enabled?  The bulk of the speedup might come just
from the indirect jump.

Paolo



* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-11 13:06 ` [PATCH 1/2] KVM: X86: Single target IPI fastpath Vitaly Kuznetsov
@ 2019-11-12  1:18   ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-12  1:18 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Wanpeng Li, Jim Mattson, Joerg Roedel, LKML,
	kvm

On Mon, 11 Nov 2019 at 21:06, Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Wanpeng Li <kernellwp@gmail.com> writes:
>
> > +
> >  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> >       struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -6615,6 +6645,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >                                 | (1 << VCPU_EXREG_CR3));
> >       vcpu->arch.regs_dirty = 0;
> >
> > +     vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
> > +     vcpu->fast_vmexit = false;
> > +     if (!is_guest_mode(vcpu) &&
> > +             vmx->exit_reason == EXIT_REASON_MSR_WRITE)
> > +             vcpu->fast_vmexit = handle_ipi_fastpath(vcpu);
>
> I have to admit this looks too much to me :-( Yes, I see the benefits of
> taking a shortcut (by actualy penalizing all other MSR writes) but the
> question I have is: where do we stop?

From observation in our IaaS environment, ICR and TSCDEADLINE are the main
MSR write vmexits.

Before patch:
tscdeadline_immed 3900
tscdeadline 5413

After patch:
tscdeadline_immed 3912
tscdeadline 5427

So the penalty can be tolerated.

>
> Also, this 'shortcut' creates an imbalance in tracing: you don't go down
> to kvm_emulate_wrmsr() so handle_ipi_fastpath() should probably gain a
> tracepoint.

Agreed.

>
> Looking at 'fast_vmexit' name makes me think this is something
> generic. Is this so? Maybe we can create some sort of an infrastructure
> for fast vmexit handling and make it easy to hook things up to it?

Maybe an indirect jump? But I can give it a try.

    Wanpeng


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-11 21:59 ` Paolo Bonzini
@ 2019-11-12  1:33   ` Wanpeng Li
  2019-11-13  6:05     ` Wanpeng Li
  2019-11-14  3:12     ` Wanpeng Li
  0 siblings, 2 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-12  1:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/11/19 08:05, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > This patch tries to optimize the x2apic physical destination mode, fixed
> > delivery mode, single target IPI path by delivering the IPI to the receiver
> > immediately on the sender's ICR write vmexit, avoiding various checks when
> > possible.
> >
> > Testing on Xeon Skylake server:
> >
> > The virtual IPI latency, from the sender sending to the receiver receiving,
> > is reduced by more than 330 cpu cycles.
> >
> > Running hackbench (reschedule IPI) in the guest, the average handling time of
> > MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:
> >
> > Before patch:
> >
> >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us
> >
> > After patch:
> >
> >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us
>
> Do you have retpolines enabled?  The bulk of the speedup might come just
> from the indirect jump.

Adding 'mitigations=off' to the host grub parameter:

Before patch:

    VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    2681713    92.98%    77.52%      0.38us     18.54us    0.73us ( +-   0.02% )

After patch:

    VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
MSR_WRITE    2953447    92.48%    62.47%      0.30us     59.09us    0.40us ( +-   0.02% )

Actually, this is not the first attempt to add a shortcut for
performance-sensitive MSR writes; the other effort is the tscdeadline timer
work from Isaku Yamahata, https://patchwork.kernel.org/cover/10541035/ .
ICR and TSCDEADLINE MSR writes cause the main MSR write vmexits in our
production observation, and multicast IPIs are not as common as unicast IPIs
like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc. As far as I know,
something similar to this patch has already been deployed in some cloud
companies' private KVM forks.

    Wanpeng


* Re: [PATCH 2/2] KVM: LAPIC: micro-optimize fixed mode ipi delivery
  2019-11-11 21:59   ` Paolo Bonzini
@ 2019-11-12  1:34     ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-12  1:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/11/19 08:05, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > After disabling mwait/halt/pause vmexits, IPIs such as RESCHEDULE_VECTOR and
> > CALL_FUNCTION_SINGLE_VECTOR are among the main remaining causes of vmexits
> > observed in production environments that can't be optimized by PV IPIs. This
> > patch is a follow-up to commit 0e6d242eccdb (KVM: LAPIC: Micro optimize IPI
> > latency), removing redundant logic before a fixed mode IPI is delivered on
> > the fast path.
> >
> > - Broadcast handling needs to take the slow path, so the delivery mode repair
> >   can be delayed until just before the slow path.
>
> I agree with this part, but is the cost of the irq->shorthand check
> really measurable?

I can drop the second part for v2.

    Wanpeng


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-12  1:33   ` Wanpeng Li
@ 2019-11-13  6:05     ` Wanpeng Li
  2019-11-14  3:12     ` Wanpeng Li
  1 sibling, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-13  6:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On Tue, 12 Nov 2019 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 09/11/19 08:05, Wanpeng Li wrote:
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > This patch tries to optimize the x2apic physical destination mode, fixed
> > > delivery mode, single target IPI path by delivering the IPI to the receiver
> > > immediately on the sender's ICR write vmexit, avoiding various checks when
> > > possible.
> > >
> > > Testing on Xeon Skylake server:
> > >
> > > The virtual IPI latency, from the sender sending to the receiver receiving,
> > > is reduced by more than 330 cpu cycles.
> > >
> > > Running hackbench (reschedule IPI) in the guest, the average handling time of
> > > MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:
> > >
> > > Before patch:
> > >
> > >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > > MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us
> > >
> > > After patch:
> > >
> > >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > > MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us
> >
> > Do you have retpolines enabled?  The bulk of the speedup might come just
> > from the indirect jump.
>
> Adding 'mitigations=off' to the host grub parameter:
>
> Before patch:
>
>     VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    2681713    92.98%    77.52%      0.38us     18.54us    0.73us ( +-   0.02% )
>
> After patch:
>
>     VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    2953447    92.48%    62.47%      0.30us     59.09us    0.40us ( +-   0.02% )
>
> Actually, this is not the first attempt to add a shortcut for
> performance-sensitive MSR writes; the other effort is the tscdeadline timer
> work from Isaku Yamahata, https://patchwork.kernel.org/cover/10541035/ .
> ICR and TSCDEADLINE MSR writes cause the main MSR write vmexits in our
> production observation, and multicast IPIs are not as common as unicast IPIs
> like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc. As far as I know,
> something similar to this patch has already been deployed in some cloud
> companies' private KVM forks.

Hi Paolo,

Do you think I should continue for this?

    Wanpeng


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-12  1:33   ` Wanpeng Li
  2019-11-13  6:05     ` Wanpeng Li
@ 2019-11-14  3:12     ` Wanpeng Li
  1 sibling, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-14  3:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: LKML, kvm, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On Tue, 12 Nov 2019 at 09:33, Wanpeng Li <kernellwp@gmail.com> wrote:
>
> On Tue, 12 Nov 2019 at 05:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 09/11/19 08:05, Wanpeng Li wrote:
> > > From: Wanpeng Li <wanpengli@tencent.com>
> > >
> > > This patch tries to optimize the x2apic physical destination mode, fixed
> > > delivery mode, single target IPI path by delivering the IPI to the receiver
> > > immediately on the sender's ICR write vmexit, avoiding various checks when
> > > possible.
> > >
> > > Testing on Xeon Skylake server:
> > >
> > > The virtual IPI latency, from the sender sending to the receiver receiving,
> > > is reduced by more than 330 cpu cycles.
> > >
> > > Running hackbench (reschedule IPI) in the guest, the average handling time of
> > > MSR_WRITE-caused vmexits is reduced by more than 1000 cpu cycles:
> > >
> > > Before patch:
> > >
> > >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > > MSR_WRITE    5417390    90.01%    16.31%      0.69us    159.60us    1.08us
> > >
> > > After patch:
> > >
> > >   VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> > > MSR_WRITE    6726109    90.73%    62.18%      0.48us    191.27us    0.58us
> >
> > Do you have retpolines enabled?  The bulk of the speedup might come just
> > from the indirect jump.
>
> Adding 'mitigations=off' to the host grub parameter:
>
> Before patch:
>
>     VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    2681713    92.98%    77.52%      0.38us     18.54us    0.73us ( +-   0.02% )
>
> After patch:
>
>     VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time   Avg time
> MSR_WRITE    2953447    92.48%    62.47%      0.30us     59.09us    0.40us ( +-   0.02% )

Hmm, the smaller vmexit handle time on the sender side is because the
kvm_exit tracepoint is still in vmx_handle_exit while the ICR wrmsr handling
is moved ahead of it, which is why the time between the kvm_exit and
kvm_entry tracepoints shrinks. But the virtual IPI latency is still reduced
by 330+ cycles.

    Wanpeng


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-09  7:05 [PATCH 1/2] KVM: X86: Single target IPI fastpath Wanpeng Li
                   ` (4 preceding siblings ...)
  2019-11-11 21:59 ` Paolo Bonzini
@ 2019-11-14 11:58 ` Paolo Bonzini
  2019-11-14 15:22   ` Sean Christopherson
  5 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2019-11-14 11:58 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

Ok, it's not _so_ ugly after all.

> ---
>  arch/x86/kvm/vmx/vmx.c   | 39 +++++++++++++++++++++++++++++++++++++--
>  include/linux/kvm_host.h |  1 +
>  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5d21a4a..5c67061 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -5924,7 +5924,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  		}
>  	}
>  
> -	if (exit_reason < kvm_vmx_max_exit_handlers
> +	if (vcpu->fast_vmexit)
> +		return 1;
> +	else if (exit_reason < kvm_vmx_max_exit_handlers

Instead of a separate vcpu->fast_vmexit, perhaps you can set exit_reason
to vmx->exit_reason to -1 if the fast path succeeds.

> +			if (ret == 0)
> +				ret = kvm_skip_emulated_instruction(vcpu);

Please move the "kvm_skip_emulated_instruction(vcpu)" to
vmx_handle_exit, so that this basically is

#define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1

	if (ret == 0)
		vcpu->exit_reason = EXIT_REASON_NEED_SKIP_EMULATED_INSN;

and handle_ipi_fastpath can return void.

Thanks,

Paolo

> +		};
> +	};
> +
> +	return ret;
> +}
> +
>  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -6615,6 +6645,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  				  | (1 << VCPU_EXREG_CR3));
>  	vcpu->arch.regs_dirty = 0;
>  
> +	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
> +	vcpu->fast_vmexit = false;
> +	if (!is_guest_mode(vcpu) &&
> +		vmx->exit_reason == EXIT_REASON_MSR_WRITE)
> +		vcpu->fast_vmexit = handle_ipi_fastpath(vcpu);

This should be done later, at least after kvm_put_guest_xcr0, because
running with partially-loaded guest state is harder to audit.  The best
place to put it actually is right after the existing vmx->exit_reason
assignment, where we already handle EXIT_REASON_MCE_DURING_VMENTRY.

>  	pt_guest_exit(vmx);
>  
>  	/*
> @@ -6634,7 +6670,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	vmx->nested.nested_run_pending = 0;
>  	vmx->idt_vectoring_info = 0;
>  
> -	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
>  	if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
>  		kvm_machine_check();
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 719fc3e..7a7358b 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -319,6 +319,7 @@ struct kvm_vcpu {
>  #endif
>  	bool preempted;
>  	bool ready;
> +	bool fast_vmexit;
>  	struct kvm_vcpu_arch arch;
>  	struct dentry *debugfs_dentry;
>  };
> 



* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-14 11:58 ` Paolo Bonzini
@ 2019-11-14 15:22   ` Sean Christopherson
  2019-11-14 15:44     ` Paolo Bonzini
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2019-11-14 15:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On Thu, Nov 14, 2019 at 12:58:56PM +0100, Paolo Bonzini wrote:
> Ok, it's not _so_ ugly after all.
> 
> > ---
> >  arch/x86/kvm/vmx/vmx.c   | 39 +++++++++++++++++++++++++++++++++++++--
> >  include/linux/kvm_host.h |  1 +
> >  2 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 5d21a4a..5c67061 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -5924,7 +5924,9 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
> >  		}
> >  	}
> >  
> > -	if (exit_reason < kvm_vmx_max_exit_handlers
> > +	if (vcpu->fast_vmexit)
> > +		return 1;
> > +	else if (exit_reason < kvm_vmx_max_exit_handlers
> 
> Instead of a separate vcpu->fast_vmexit, perhaps you can set exit_reason
> to vmx->exit_reason to -1 if the fast path succeeds.

Actually, rather than make this super special case, what about moving the
handling into vmx_handle_exit_irqoff()?  Practically speaking that would
only add ~50 cycles (two VMREADs) relative to the code being run right
after kvm_put_guest_xcr0().  It has the advantage of restoring the host's
hardware breakpoints, preserving a semi-accurate last_guest_tsc, and
running with vcpu->mode set back to OUTSIDE_GUEST_MODE.  Hopefully it'd
also be more intuitive for people unfamiliar with the code.

> 

> > +			if (ret == 0)
> > +				ret = kvm_skip_emulated_instruction(vcpu);
> 
> Please move the "kvm_skip_emulated_instruction(vcpu)" to
> vmx_handle_exit, so that this basically is
> 
> #define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1
> 
> 	if (ret == 0)
> 		vcpu->exit_reason = EXIT_REASON_NEED_SKIP_EMULATED_INSN;
> 
> and handle_ipi_fastpath can return void.

I'd rather we add a dedicated variable to say the exit has already been
handled.  Overloading exit_reason is bound to cause confusion, and that's
probably a best case scenario.

> > +		};
> > +	};
> > +
> > +	return ret;
> > +}
> > +
> >  static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> > @@ -6615,6 +6645,12 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  				  | (1 << VCPU_EXREG_CR3));
> >  	vcpu->arch.regs_dirty = 0;
> >  
> > +	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
> > +	vcpu->fast_vmexit = false;
> > +	if (!is_guest_mode(vcpu) &&
> > +		vmx->exit_reason == EXIT_REASON_MSR_WRITE)
> > +		vcpu->fast_vmexit = handle_ipi_fastpath(vcpu);
> 
> This should be done later, at least after kvm_put_guest_xcr0, because
> running with partially-loaded guest state is harder to audit.  The best
> place to put it actually is right after the existing vmx->exit_reason
> assignment, where we already handle EXIT_REASON_MCE_DURING_VMENTRY.
> 
> >  	pt_guest_exit(vmx);
> >  
> >  	/*
> > @@ -6634,7 +6670,6 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
> >  	vmx->nested.nested_run_pending = 0;
> >  	vmx->idt_vectoring_info = 0;
> >  
> > -	vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
> >  	if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
> >  		kvm_machine_check();
> >  
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 719fc3e..7a7358b 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -319,6 +319,7 @@ struct kvm_vcpu {
> >  #endif
> >  	bool preempted;
> >  	bool ready;
> > +	bool fast_vmexit;
> >  	struct kvm_vcpu_arch arch;
> >  	struct dentry *debugfs_dentry;
> >  };
> > 
> 


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-14 15:22   ` Sean Christopherson
@ 2019-11-14 15:44     ` Paolo Bonzini
  2019-11-14 16:34       ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Bonzini @ 2019-11-14 15:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Wanpeng Li, linux-kernel, kvm, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On 14/11/19 16:22, Sean Christopherson wrote:
>> Instead of a separate vcpu->fast_vmexit, perhaps you can set exit_reason
>> to vmx->exit_reason to -1 if the fast path succeeds.
> 
> Actually, rather than make this super special case, what about moving the
> handling into vmx_handle_exit_irqoff()?  Practically speaking that would
> only add ~50 cycles (two VMREADs) relative to the code being run right
> after kvm_put_guest_xcr0().  It has the advantage of restoring the host's
> hardware breakpoints, preserving a semi-accurate last_guest_tsc, and
> running with vcpu->mode set back to OUTSIDE_GUEST_MODE.  Hopefully it'd
> also be more intuitive for people unfamiliar with the code.

Yes, that's a good idea.  The expensive bit between handle_exit_irqoff
and handle_exit is srcu_read_lock, which has two memory barriers in it.


>>> +			if (ret == 0)
>>> +				ret = kvm_skip_emulated_instruction(vcpu);
>> Please move the "kvm_skip_emulated_instruction(vcpu)" to
>> vmx_handle_exit, so that this basically is
>>
>> #define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1
>>
>> 	if (ret == 0)
>> 		vcpu->exit_reason = EXIT_REASON_NEED_SKIP_EMULATED_INSN;
>>
>> and handle_ipi_fastpath can return void.
>
> I'd rather we add a dedicated variable to say the exit has already been
> handled.  Overloading exit_reason is bound to cause confusion, and that's
> probably a best case scenario.

I proposed the fake exit reason to avoid a ternary return code from
handle_ipi_fastpath (return to guest, return to userspace, call
kvm_x86_ops->handle_exit), which Wanpeng's patch was mishandling.

To ensure confusion does not become the best case scenario, perhaps it
is worth trying to push exit_reason into vcpu_enter_guest's stack.
vcpu_enter_guest can pass a pointer to it, and then it can be passed
back into kvm_x86_ops->handle_exit{,_irqoff}.  It could be a struct too,
instead of just a bare u32.

This would ensure at compile-time that exit_reason is not accessed
outside the short path from vmexit to kvm_x86_ops->handle_exit.
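
As a rough standalone sketch of that shape (the struct name and fields
below are purely illustrative, not a proposed kernel API):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative per-exit state that would live on vcpu_enter_guest()'s
 * stack instead of in struct vcpu_vmx, handed down by pointer.
 */
struct kvm_exit_info {
	uint32_t exit_reason;
	bool	 skip_emulated_insn;
};

/* Models ->handle_exit_irqoff(): the fastpath may consume the exit. */
static void handle_exit_irqoff(struct kvm_exit_info *info)
{
	if (info->exit_reason == 32 /* EXIT_REASON_MSR_WRITE */)
		info->skip_emulated_insn = true;
}

/*
 * Models ->handle_exit(): it sees the exit only via the passed-in
 * state, so nothing outside this short path can touch exit_reason.
 */
static int handle_exit(struct kvm_exit_info *info)
{
	if (info->skip_emulated_insn)
		return 1;	/* skip the instruction, reenter the guest */
	return 0;		/* fall through to the regular handlers */
}
```

The compiler then enforces the lifetime for free: once vcpu_enter_guest
returns, the exit information is gone with the stack frame.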

Paolo



* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-14 15:44     ` Paolo Bonzini
@ 2019-11-14 16:34       ` Sean Christopherson
  2019-11-15  5:56         ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2019-11-14 16:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, linux-kernel, kvm, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On Thu, Nov 14, 2019 at 04:44:33PM +0100, Paolo Bonzini wrote:
> On 14/11/19 16:22, Sean Christopherson wrote:
> >> Instead of a separate vcpu->fast_vmexit, perhaps you can set exit_reason
> >> to vmx->exit_reason to -1 if the fast path succeeds.
> > 
> > Actually, rather than make this super special case, what about moving the
> > handling into vmx_handle_exit_irqoff()?  Practically speaking that would
> > only add ~50 cycles (two VMREADs) relative to the code being run right
> > after kvm_put_guest_xcr0().  It has the advantage of restoring the host's
> > hardware breakpoints, preserving a semi-accurate last_guest_tsc, and
> > running with vcpu->mode set back to OUTSIDE_GUEST_MODE.  Hopefully it'd
> > also be more intuitive for people unfamiliar with the code.
> 
> Yes, that's a good idea.  The expensive bit between handle_exit_irqoff
> and handle_exit is srcu_read_lock, which has two memory barriers in it.

Preaching to the choir at this point, but it'd also eliminate latency
spikes due to interrupts.

> >>> +			if (ret == 0)
> >>> +				ret = kvm_skip_emulated_instruction(vcpu);
> >> Please move the "kvm_skip_emulated_instruction(vcpu)" to
> >> vmx_handle_exit, so that this basically is
> >>
> >> #define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1
> >>
> >> 	if (ret == 0)
> >> 		vcpu->exit_reason = EXIT_REASON_NEED_SKIP_EMULATED_INSN;
> >>
> >> and handle_ipi_fastpath can return void.
> >
> > I'd rather we add a dedicated variable to say the exit has already been
> > handled.  Overloading exit_reason is bound to cause confusion, and that's
> > probably a best case scenario.
> 
> I proposed the fake exit reason to avoid a ternary return code from
> handle_ipi_fastpath (return to guest, return to userspace, call
> kvm_x86_ops->handle_exit), which Wanpeng's patch was mishandling.

For this case, I think we can get away with a WARN if kvm_lapic_reg_write()
fails since it (currently) can't fail for ICR.  That would allow us to keep
a void return for ->handle_exit_irqoff() and avoid an overloaded return
value.

And, if the fastpath is used for all ICR writes, not just FIXED+PHYSICAL,
and is implemented for SVM, then we don't even need a flag, e.g.
kvm_x2apic_msr_write() can simply ignore ICR writes, similar to how
handle_exception() ignores #MC and NMIs.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 87b0fcc23ef8..d7b79f7faac1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2615,12 +2615,11 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
        if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(apic))
                return 1;

-       if (reg == APIC_ICR2)
+
+       /* ICR2 writes are ignored and ICR writes are handled early. */
+       if (reg == APIC_ICR2 || reg == APIC_ICR)
                return 1;

-       /* if this is ICR write vector before command */
-       if (reg == APIC_ICR)
-               kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
        return kvm_lapic_reg_write(apic, reg, (u32)data);
 }

Another bonus to this approach is that the refactoring for the
exit_reason can be done in a separate series.

> To ensure confusion does not become the best case scenario, perhaps it
> is worth trying to push exit_reason into vcpu_enter_guest's stack.
> vcpu_enter_guest can pass a pointer to it, and then it can be passed
> back into kvm_x86_ops->handle_exit{,_irqoff}.  It could be a struct too,
> instead of just a bare u32.

On the other hand, if it's a bare u32 then kvm_x86_ops->run can simply
return the value instead of doing out parameter shenanigans.

> This would ensure at compile-time that exit_reason is not accessed
> outside the short path from vmexit to kvm_x86_ops->handle_exit.

That would be nice.  Surprisingly, the code actually appears to be fairly
clean, I thought for sure the nested stuff would be using exit_reason all
over the place.  The only one that needs to be fixed is handle_vmfunc(),
which passes vmx->exit_reason when forwarding the VM-Exit instead of simply
hardcoding EXIT_REASON_VMFUNC.


* Re: [PATCH 1/2] KVM: X86: Single target IPI fastpath
  2019-11-14 16:34       ` Sean Christopherson
@ 2019-11-15  5:56         ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2019-11-15  5:56 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: LKML, kvm, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel

On Fri, 15 Nov 2019 at 00:34, Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Thu, Nov 14, 2019 at 04:44:33PM +0100, Paolo Bonzini wrote:
> > On 14/11/19 16:22, Sean Christopherson wrote:
> > >> Instead of a separate vcpu->fast_vmexit, perhaps you can set exit_reason
> > >> to vmx->exit_reason to -1 if the fast path succeeds.
> > >
> > > Actually, rather than make this super special case, what about moving the
> > > handling into vmx_handle_exit_irqoff()?  Practically speaking that would
> > > only add ~50 cycles (two VMREADs) relative to the code being run right
> > > after kvm_put_guest_xcr0().  It has the advantage of restoring the host's
> > > hardware breakpoints, preserving a semi-accurate last_guest_tsc, and
> > > running with vcpu->mode set back to OUTSIDE_GUEST_MODE.  Hopefully it'd
> > > also be more intuitive for people unfamiliar with the code.
> >
> > Yes, that's a good idea.  The expensive bit between handle_exit_irqoff
> > and handle_exit is srcu_read_lock, which has two memory barriers in it.

Moving the handling into vmx_handle_exit_irqoff() is ~100 cycles worse
than handling it right after kvm_put_guest_xcr0() in my testing. So,
which one do you prefer?

For moving the handling into vmx_handle_exit_irqoff(), how about the
patch below:

---8<---

From 1b20b0a80c6da18b721f125cc40f8be5ad31f4b1 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli@tencent.com>
Date: Fri, 15 Nov 2019 10:18:09 +0800
Subject: [PATCH v2] KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/include/uapi/asm/vmx.h |  1 +
 arch/x86/kvm/svm.c              |  4 ++--
 arch/x86/kvm/vmx/vmx.c          | 40 +++++++++++++++++++++++++++++++++++++---
 arch/x86/kvm/x86.c              |  5 +++--
 5 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 24d6598..3604f3a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1067,7 +1067,7 @@ struct kvm_x86_ops {
     void (*tlb_flush_gva)(struct kvm_vcpu *vcpu, gva_t addr);

     void (*run)(struct kvm_vcpu *vcpu);
-    int (*handle_exit)(struct kvm_vcpu *vcpu);
+    int (*handle_exit)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
     int (*skip_emulated_instruction)(struct kvm_vcpu *vcpu);
     void (*set_interrupt_shadow)(struct kvm_vcpu *vcpu, int mask);
     u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
@@ -1117,7 +1117,7 @@ struct kvm_x86_ops {
     int (*check_intercept)(struct kvm_vcpu *vcpu,
                    struct x86_instruction_info *info,
                    enum x86_intercept_stage stage);
-    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);
+    void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason);
     bool (*mpx_supported)(void);
     bool (*xsaves_supported)(void);
     bool (*umip_emulated)(void);
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index 3eb8411..b33c6e1 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -88,6 +88,7 @@
 #define EXIT_REASON_XRSTORS             64
 #define EXIT_REASON_UMWAIT              67
 #define EXIT_REASON_TPAUSE              68
+#define EXIT_REASON_NEED_SKIP_EMULATED_INSN -1

 #define VMX_EXIT_REASONS \
     { EXIT_REASON_EXCEPTION_NMI,         "EXCEPTION_NMI" }, \
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c5673bd..3bb0661 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4927,7 +4927,7 @@ static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
     *info2 = control->exit_info_2;
 }

-static int handle_exit(struct kvm_vcpu *vcpu)
+static int handle_exit(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason)
 {
     struct vcpu_svm *svm = to_svm(vcpu);
     struct kvm_run *kvm_run = vcpu->run;
@@ -6171,7 +6171,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
     return ret;
 }

-static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu)
-static void svm_handle_exit_irqoff(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason)
 {

 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d21a4a..073fe1f 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5838,7 +5838,7 @@ void dump_vmcs(void)
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
  */
-static int vmx_handle_exit(struct kvm_vcpu *vcpu)
+static int vmx_handle_exit(struct kvm_vcpu *vcpu, u32 *vcpu_exit_reason)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);
     u32 exit_reason = vmx->exit_reason;
@@ -5924,7 +5924,10 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
         }
     }

-    if (exit_reason < kvm_vmx_max_exit_handlers
+    if (*vcpu_exit_reason == EXIT_REASON_NEED_SKIP_EMULATED_INSN) {
+        kvm_skip_emulated_instruction(vcpu);
+        return 1;
+    } else if (exit_reason < kvm_vmx_max_exit_handlers
         && kvm_vmx_exit_handlers[exit_reason])
         return kvm_vmx_exit_handlers[exit_reason](vcpu);
     else {
@@ -6255,7 +6258,36 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu)
 }
 STACK_FRAME_NON_STANDARD(handle_external_interrupt_irqoff);

-static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
+static u32 handle_ipi_fastpath(struct kvm_vcpu *vcpu)
+{
+    u32 index;
+    u64 data;
+    int ret = 0;
+
+    if (lapic_in_kernel(vcpu) && apic_x2apic_mode(vcpu->arch.apic)) {
+        /*
+         * fastpath to IPI target, FIXED+PHYSICAL which is popular
+         */
+        index = kvm_rcx_read(vcpu);
+        data = kvm_read_edx_eax(vcpu);
+
+        if (((index - APIC_BASE_MSR) << 4 == APIC_ICR) &&
+            ((data & KVM_APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
+            ((data & APIC_MODE_MASK) == APIC_DM_FIXED)) {
+
+            trace_kvm_msr_write(index, data);
+            kvm_lapic_set_reg(vcpu->arch.apic, APIC_ICR2, (u32)(data >> 32));
+            ret = kvm_lapic_reg_write(vcpu->arch.apic, APIC_ICR, (u32)data);
+
+            if (ret == 0)
+                return EXIT_REASON_NEED_SKIP_EMULATED_INSN;
+        }
+    }
+
+    return ret;
+}
+
+static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu, u32 *exit_reason)
 {
     struct vcpu_vmx *vmx = to_vmx(vcpu);

@@ -6263,6 +6295,8 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
         handle_external_interrupt_irqoff(vcpu);
     else if (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI)
         handle_exception_nmi_irqoff(vmx);
+    else if (vmx->exit_reason == EXIT_REASON_MSR_WRITE)
+        *exit_reason = handle_ipi_fastpath(vcpu);
 }

 static bool vmx_has_emulated_msr(int index)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff395f8..130576b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7931,6 +7931,7 @@ EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
 static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 {
     int r;
+    u32 exit_reason = 0;
     bool req_int_win =
         dm_request_for_irq_injection(vcpu) &&
         kvm_cpu_accept_dm_intr(vcpu);
@@ -8180,7 +8181,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
     vcpu->mode = OUTSIDE_GUEST_MODE;
     smp_wmb();

-    kvm_x86_ops->handle_exit_irqoff(vcpu);
+    kvm_x86_ops->handle_exit_irqoff(vcpu, &exit_reason);

     /*
      * Consume any pending interrupts, including the possible source of
@@ -8224,7 +8225,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
         kvm_lapic_sync_from_vapic(vcpu);

     vcpu->arch.gpa_available = false;
-    r = kvm_x86_ops->handle_exit(vcpu);
+    r = kvm_x86_ops->handle_exit(vcpu, &exit_reason);
     return r;

 cancel_injection:
--
2.7.4
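
As a sanity check on the decode in handle_ipi_fastpath() above, the
predicate can be exercised in isolation. The sketch below inlines the
constant values as defined in the x86 APIC headers (APIC_BASE_MSR 0x800,
APIC_ICR 0x300, KVM_APIC_DEST_MASK 0x800, APIC_MODE_MASK 0x700); the
helper name itself is made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Constant values as in arch/x86 APIC headers. */
#define APIC_BASE_MSR		0x800	/* x2APIC MSR range base */
#define APIC_ICR		0x300	/* ICR register offset, MSR 0x830 */
#define APIC_DEST_PHYSICAL	0x00000
#define APIC_DM_FIXED		0x00000
#define APIC_MODE_MASK		0x00700	/* delivery mode bits */
#define KVM_APIC_DEST_MASK	0x00800	/* physical vs. logical bit */

/*
 * Illustrative stand-in for the fastpath predicate: true iff the WRMSR
 * targets the x2APIC ICR (MSR 0x830) and requests a fixed-delivery,
 * physical-destination IPI.
 */
static bool is_single_ipi_fastpath(uint32_t index, uint64_t data)
{
	return (((index - APIC_BASE_MSR) << 4) == APIC_ICR) &&
	       ((data & KVM_APIC_DEST_MASK) == APIC_DEST_PHYSICAL) &&
	       ((data & APIC_MODE_MASK) == APIC_DM_FIXED);
}
```

In x2APIC mode the destination APIC ID sits in the high 32 bits of the
ICR value, so only the low word is consulted by the mode checks.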

