linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis
@ 2020-11-28 14:18 Shenming Lu
  2020-11-28 14:18 ` [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit Shenming Lu
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Shenming Lu @ 2020-11-28 14:18 UTC
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper, linux-kernel,
	linux-arm-kernel, kvmarm, kvm, James Morse, Julien Thierry,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Eric Auger,
	Christoffer Dall
  Cc: wanghaibin.wang, yuzenghui, lushenming

Right after a vPE is made resident, the code polls the
GICR_VPENDBASER.Dirty bit until it reads 0, with delay_us set to 10.
In our measurements, however, parsing the VPT takes only hundreds of
nanoseconds, or 1~2 microseconds, in most cases. What's more, we found
that the MMIO delay on a GICv4.1 system (HiSilicon) is about 8 times
higher than on a GICv4.0 system in kvm-unit-tests (the specific data
is as follows).

                        |   GICv4.1 emulator   |  GICv4.0 emulator
mmio_read_user (ns)     |        12811         |        1598

Our analysis shows that this is mainly caused by the delay_us of 10,
so it can really hurt performance.

To avoid this, we can set delay_us to 1, which is more appropriate in
this situation and should work universally. In addition, we can defer
the polling, giving the GIC a chance to work in parallel with the CPU
on the entry path.

Shenming Lu (2):
  irqchip/gic-v4.1: Reduce the delay time of the poll on the
    GICR_VPENDBASER.Dirty bit
  KVM: arm64: Delay the execution of the polling on the
    GICR_VPENDBASER.Dirty bit

 arch/arm64/kvm/vgic/vgic-v4.c      | 16 ++++++++++++++++
 arch/arm64/kvm/vgic/vgic.c         |  3 +++
 drivers/irqchip/irq-gic-v3-its.c   | 18 +++++++++++++-----
 drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
 include/kvm/arm_vgic.h             |  3 +++
 include/linux/irqchip/arm-gic-v4.h |  4 ++++
 6 files changed, 50 insertions(+), 5 deletions(-)

-- 
2.23.0



* [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit
  2020-11-28 14:18 [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Shenming Lu
@ 2020-11-28 14:18 ` Shenming Lu
  2020-12-11 14:58   ` [irqchip: irq/irqchip-next] irqchip/gic-v4.1: Reduce the delay when polling GICR_VPENDBASER.Dirty irqchip-bot for Shenming Lu
  2020-11-28 14:18 ` [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit Shenming Lu
  2020-12-11 15:01 ` [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Marc Zyngier
  2 siblings, 1 reply; 8+ messages in thread
From: Shenming Lu @ 2020-11-28 14:18 UTC
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper, linux-kernel,
	linux-arm-kernel, kvmarm, kvm, James Morse, Julien Thierry,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Eric Auger,
	Christoffer Dall
  Cc: wanghaibin.wang, yuzenghui, lushenming

The delay_us of 10 used when polling the GICR_VPENDBASER.Dirty bit is
too high; in our measurements it can greatly increase the total
scheduling latency of a vCPU. Reduce it to 1 to lessen the impact.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 drivers/irqchip/irq-gic-v3-its.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 0fec31931e11..22f427135c6b 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3809,7 +3809,7 @@ static void its_wait_vpt_parse_complete(void)
 	WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
 						       val,
 						       !(val & GICR_VPENDBASER_Dirty),
-						       10, 500));
+						       1, 500));
 }
 
 static void its_vpe_schedule(struct its_vpe *vpe)
-- 
2.23.0



* [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit
  2020-11-28 14:18 [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Shenming Lu
  2020-11-28 14:18 ` [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit Shenming Lu
@ 2020-11-28 14:18 ` Shenming Lu
  2020-11-30 11:22   ` Marc Zyngier
  2020-12-11 15:01 ` [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Marc Zyngier
  2 siblings, 1 reply; 8+ messages in thread
From: Shenming Lu @ 2020-11-28 14:18 UTC
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper, linux-kernel,
	linux-arm-kernel, kvmarm, kvm, James Morse, Julien Thierry,
	Suzuki K Poulose, Catalin Marinas, Will Deacon, Eric Auger,
	Christoffer Dall
  Cc: wanghaibin.wang, yuzenghui, lushenming

To further reduce the impact of waiting for the VPT analysis to
complete, defer the polling of the GICR_VPENDBASER.Dirty bit: call it
from kvm_vgic_flush_hwstate() instead of right when the vPE is made
resident, so that the GIC and the CPU can work in parallel on the
entry path.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 arch/arm64/kvm/vgic/vgic-v4.c      | 16 ++++++++++++++++
 arch/arm64/kvm/vgic/vgic.c         |  3 +++
 drivers/irqchip/irq-gic-v3-its.c   | 16 ++++++++++++----
 drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
 include/kvm/arm_vgic.h             |  3 +++
 include/linux/irqchip/arm-gic-v4.h |  4 ++++
 6 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index b5fa73c9fd35..b0da74809187 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -353,6 +353,22 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
 	return err;
 }
 
+void vgic_v4_commit(struct kvm_vcpu *vcpu)
+{
+	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+
+	/*
+	 * No need to wait for the vPE to be ready across a shallow guest
+	 * exit, as only a vcpu_put will invalidate it.
+	 */
+	if (vpe->vpe_ready)
+		return;
+
+	its_commit_vpe(vpe);
+
+	vpe->vpe_ready = true;
+}
+
 static struct vgic_its *vgic_get_its(struct kvm *kvm,
 				     struct kvm_kernel_irq_routing_entry *irq_entry)
 {
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index c3643b7f101b..1c597c9885fa 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -915,6 +915,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 
 	if (can_access_vgic_from_kernel())
 		vgic_restore_state(vcpu);
+
+	if (vgic_supports_direct_msis(vcpu->kvm))
+		vgic_v4_commit(vcpu);
 }
 
 void kvm_vgic_load(struct kvm_vcpu *vcpu)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 22f427135c6b..f30aba14933e 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3842,8 +3842,6 @@ static void its_vpe_schedule(struct its_vpe *vpe)
 	val |= vpe->idai ? GICR_VPENDBASER_IDAI : 0;
 	val |= GICR_VPENDBASER_Valid;
 	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
-
-	its_wait_vpt_parse_complete();
 }
 
 static void its_vpe_deschedule(struct its_vpe *vpe)
@@ -3855,6 +3853,8 @@ static void its_vpe_deschedule(struct its_vpe *vpe)
 
 	vpe->idai = !!(val & GICR_VPENDBASER_IDAI);
 	vpe->pending_last = !!(val & GICR_VPENDBASER_PendingLast);
+
+	vpe->vpe_ready = false;
 }
 
 static void its_vpe_invall(struct its_vpe *vpe)
@@ -3891,6 +3891,10 @@ static int its_vpe_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
 		its_vpe_deschedule(vpe);
 		return 0;
 
+	case COMMIT_VPE:
+		its_wait_vpt_parse_complete();
+		return 0;
+
 	case INVALL_VPE:
 		its_vpe_invall(vpe);
 		return 0;
@@ -4052,8 +4056,6 @@ static void its_vpe_4_1_schedule(struct its_vpe *vpe,
 	val |= FIELD_PREP(GICR_VPENDBASER_4_1_VPEID, vpe->vpe_id);
 
 	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
-
-	its_wait_vpt_parse_complete();
 }
 
 static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
@@ -4091,6 +4093,8 @@ static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
 					    GICR_VPENDBASER_PendingLast);
 		vpe->pending_last = true;
 	}
+
+	vpe->vpe_ready = false;
 }
 
 static void its_vpe_4_1_invall(struct its_vpe *vpe)
@@ -4128,6 +4132,10 @@ static int its_vpe_4_1_set_vcpu_affinity(struct irq_data *d, void *vcpu_info)
 		its_vpe_4_1_deschedule(vpe, info);
 		return 0;
 
+	case COMMIT_VPE:
+		its_wait_vpt_parse_complete();
+		return 0;
+
 	case INVALL_VPE:
 		its_vpe_4_1_invall(vpe);
 		return 0;
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index 0c18714ae13e..6cea71a4e68b 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -258,6 +258,17 @@ int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en)
 	return ret;
 }
 
+int its_commit_vpe(struct its_vpe *vpe)
+{
+	struct its_cmd_info info = {
+		.cmd_type = COMMIT_VPE,
+	};
+
+	WARN_ON(preemptible());
+
+	return its_send_vpe_cmd(vpe, &info);
+}
+
 int its_invall_vpe(struct its_vpe *vpe)
 {
 	struct its_cmd_info info = {
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index a8d8fdcd3723..f2170df6cf7c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -401,7 +401,10 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
 int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
 				 struct kvm_kernel_irq_routing_entry *irq_entry);
 
+void vgic_v4_commit(struct kvm_vcpu *vcpu);
+
 int vgic_v4_load(struct kvm_vcpu *vcpu);
+
 int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
 
 #endif /* __KVM_ARM_VGIC_H */
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 6976b8331b60..936d88e482a9 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -75,6 +75,8 @@ struct its_vpe {
 	u16			vpe_id;
 	/* Pending VLPIs on schedule out? */
 	bool			pending_last;
+	/* VPT parse complete */
+	bool			vpe_ready;
 };
 
 /*
@@ -104,6 +106,7 @@ enum its_vcpu_info_cmd_type {
 	PROP_UPDATE_AND_INV_VLPI,
 	SCHEDULE_VPE,
 	DESCHEDULE_VPE,
+	COMMIT_VPE,
 	INVALL_VPE,
 	PROP_UPDATE_VSGI,
 };
@@ -129,6 +132,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm);
 void its_free_vcpu_irqs(struct its_vm *vm);
 int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
 int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
+int its_commit_vpe(struct its_vpe *vpe);
 int its_invall_vpe(struct its_vpe *vpe);
 int its_map_vlpi(int irq, struct its_vlpi_map *map);
 int its_get_vlpi(int irq, struct its_vlpi_map *map);
-- 
2.23.0



* Re: [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit
  2020-11-28 14:18 ` [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit Shenming Lu
@ 2020-11-30 11:22   ` Marc Zyngier
  2020-11-30 12:12     ` Shenming Lu
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2020-11-30 11:22 UTC
  To: Shenming Lu
  Cc: Thomas Gleixner, linux-kernel, linux-arm-kernel, kvmarm, kvm,
	James Morse, Julien Thierry, Suzuki K Poulose, Catalin Marinas,
	Will Deacon, Eric Auger, Christoffer Dall, wanghaibin.wang,
	yuzenghui

On 2020-11-28 14:18, Shenming Lu wrote:
> In order to further reduce the impact of the wait delay of the
> VPT analysis, we can delay the execution of the polling on the
> GICR_VPENDBASER.Dirty bit (call it from kvm_vgic_flush_hwstate()
> corresponding to vPE resident), let the GIC and the CPU work in
> parallel on the entry path.
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>  arch/arm64/kvm/vgic/vgic-v4.c      | 16 ++++++++++++++++
>  arch/arm64/kvm/vgic/vgic.c         |  3 +++
>  drivers/irqchip/irq-gic-v3-its.c   | 16 ++++++++++++----
>  drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
>  include/kvm/arm_vgic.h             |  3 +++
>  include/linux/irqchip/arm-gic-v4.h |  4 ++++
>  6 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-v4.c 
> b/arch/arm64/kvm/vgic/vgic-v4.c
> index b5fa73c9fd35..b0da74809187 100644
> --- a/arch/arm64/kvm/vgic/vgic-v4.c
> +++ b/arch/arm64/kvm/vgic/vgic-v4.c
> @@ -353,6 +353,22 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
>  	return err;
>  }
> 
> +void vgic_v4_commit(struct kvm_vcpu *vcpu)
> +{
> +	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
> +
> +	/*
> +	 * No need to wait for the vPE to be ready across a shallow guest
> +	 * exit, as only a vcpu_put will invalidate it.
> +	 */
> +	if (vpe->vpe_ready)
> +		return;
> +
> +	its_commit_vpe(vpe);
> +
> +	vpe->vpe_ready = true;

This should be written as:

if (!ready)
      commit();

with ready being driven by the commit() call itself.

> +}
> +
>  static struct vgic_its *vgic_get_its(struct kvm *kvm,
>  				     struct kvm_kernel_irq_routing_entry *irq_entry)
>  {
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index c3643b7f101b..1c597c9885fa 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -915,6 +915,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> 
>  	if (can_access_vgic_from_kernel())
>  		vgic_restore_state(vcpu);
> +
> +	if (vgic_supports_direct_msis(vcpu->kvm))
> +		vgic_v4_commit(vcpu);
>  }
> 
>  void kvm_vgic_load(struct kvm_vcpu *vcpu)
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index 22f427135c6b..f30aba14933e 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -3842,8 +3842,6 @@ static void its_vpe_schedule(struct its_vpe *vpe)
>  	val |= vpe->idai ? GICR_VPENDBASER_IDAI : 0;
>  	val |= GICR_VPENDBASER_Valid;
>  	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
> -
> -	its_wait_vpt_parse_complete();
>  }
> 
>  static void its_vpe_deschedule(struct its_vpe *vpe)
> @@ -3855,6 +3853,8 @@ static void its_vpe_deschedule(struct its_vpe 
> *vpe)
> 
>  	vpe->idai = !!(val & GICR_VPENDBASER_IDAI);
>  	vpe->pending_last = !!(val & GICR_VPENDBASER_PendingLast);
> +
> +	vpe->vpe_ready = false;

This should be set from the its_make_vpe_non_resident() call.

>  }
> 
>  static void its_vpe_invall(struct its_vpe *vpe)
> @@ -3891,6 +3891,10 @@ static int its_vpe_set_vcpu_affinity(struct
> irq_data *d, void *vcpu_info)
>  		its_vpe_deschedule(vpe);
>  		return 0;
> 
> +	case COMMIT_VPE:
> +		its_wait_vpt_parse_complete();
> +		return 0;
> +
>  	case INVALL_VPE:
>  		its_vpe_invall(vpe);
>  		return 0;
> @@ -4052,8 +4056,6 @@ static void its_vpe_4_1_schedule(struct its_vpe 
> *vpe,
>  	val |= FIELD_PREP(GICR_VPENDBASER_4_1_VPEID, vpe->vpe_id);
> 
>  	gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
> -
> -	its_wait_vpt_parse_complete();
>  }
> 
>  static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
> @@ -4091,6 +4093,8 @@ static void its_vpe_4_1_deschedule(struct its_vpe 
> *vpe,
>  					    GICR_VPENDBASER_PendingLast);
>  		vpe->pending_last = true;
>  	}
> +
> +	vpe->vpe_ready = false;
>  }
> 
>  static void its_vpe_4_1_invall(struct its_vpe *vpe)
> @@ -4128,6 +4132,10 @@ static int its_vpe_4_1_set_vcpu_affinity(struct
> irq_data *d, void *vcpu_info)
>  		its_vpe_4_1_deschedule(vpe, info);
>  		return 0;
> 
> +	case COMMIT_VPE:
> +		its_wait_vpt_parse_complete();
> +		return 0;
> +
>  	case INVALL_VPE:
>  		its_vpe_4_1_invall(vpe);
>  		return 0;
> diff --git a/drivers/irqchip/irq-gic-v4.c 
> b/drivers/irqchip/irq-gic-v4.c
> index 0c18714ae13e..6cea71a4e68b 100644
> --- a/drivers/irqchip/irq-gic-v4.c
> +++ b/drivers/irqchip/irq-gic-v4.c
> @@ -258,6 +258,17 @@ int its_make_vpe_resident(struct its_vpe *vpe,
> bool g0en, bool g1en)
>  	return ret;
>  }
> 
> +int its_commit_vpe(struct its_vpe *vpe)
> +{
> +	struct its_cmd_info info = {
> +		.cmd_type = COMMIT_VPE,
> +	};
> +
> +	WARN_ON(preemptible());
> +
> +	return its_send_vpe_cmd(vpe, &info);
> +}
> +
>  int its_invall_vpe(struct its_vpe *vpe)
>  {
>  	struct its_cmd_info info = {
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index a8d8fdcd3723..f2170df6cf7c 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -401,7 +401,10 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, 
> int irq,
>  int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
>  				 struct kvm_kernel_irq_routing_entry *irq_entry);
> 
> +void vgic_v4_commit(struct kvm_vcpu *vcpu);
> +
>  int vgic_v4_load(struct kvm_vcpu *vcpu);
> +

Spurious new lines.

>  int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
> 
>  #endif /* __KVM_ARM_VGIC_H */
> diff --git a/include/linux/irqchip/arm-gic-v4.h
> b/include/linux/irqchip/arm-gic-v4.h
> index 6976b8331b60..936d88e482a9 100644
> --- a/include/linux/irqchip/arm-gic-v4.h
> +++ b/include/linux/irqchip/arm-gic-v4.h
> @@ -75,6 +75,8 @@ struct its_vpe {
>  	u16			vpe_id;
>  	/* Pending VLPIs on schedule out? */
>  	bool			pending_last;
> +	/* VPT parse complete */
> +	bool			vpe_ready;
>  };
> 
>  /*
> @@ -104,6 +106,7 @@ enum its_vcpu_info_cmd_type {
>  	PROP_UPDATE_AND_INV_VLPI,
>  	SCHEDULE_VPE,
>  	DESCHEDULE_VPE,
> +	COMMIT_VPE,
>  	INVALL_VPE,
>  	PROP_UPDATE_VSGI,
>  };
> @@ -129,6 +132,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm);
>  void its_free_vcpu_irqs(struct its_vm *vm);
>  int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
>  int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
> +int its_commit_vpe(struct its_vpe *vpe);
>  int its_invall_vpe(struct its_vpe *vpe);
>  int its_map_vlpi(int irq, struct its_vlpi_map *map);
>  int its_get_vlpi(int irq, struct its_vlpi_map *map);

In order to speed up the respin round-trip, I've taken the liberty
of refactoring this patch myself. Please have a look at [1] and let
me know if you're OK with it.

Thanks,

         M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/misc-5.11&id=57e3cebd022fbc035dcf190ac789fd2ffc747f5b
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit
  2020-11-30 11:22   ` Marc Zyngier
@ 2020-11-30 12:12     ` Shenming Lu
  2020-11-30 12:28       ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Shenming Lu @ 2020-11-30 12:12 UTC
  To: Marc Zyngier
  Cc: Thomas Gleixner, linux-kernel, linux-arm-kernel, kvmarm, kvm,
	James Morse, Julien Thierry, Suzuki K Poulose, Catalin Marinas,
	Will Deacon, Eric Auger, Christoffer Dall, wanghaibin.wang,
	yuzenghui

On 2020/11/30 19:22, Marc Zyngier wrote:
> [...]
> 
> In order to speed up the respin round-trip, I've taken the liberty
> to refactor this patch myself. Please have a look at [1] and let
> me know if you're OK with it.

I have looked at it and am OK with it.

By the way, will the first patch (setting delay_us to 1) be picked up
along with it?

Thanks,
Shenming

> 
> Thanks,
> 
>         M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/misc-5.11&id=57e3cebd022fbc035dcf190ac789fd2ffc747f5b


* Re: [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit
  2020-11-30 12:12     ` Shenming Lu
@ 2020-11-30 12:28       ` Marc Zyngier
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2020-11-30 12:28 UTC
  To: Shenming Lu
  Cc: Thomas Gleixner, linux-kernel, linux-arm-kernel, kvmarm, kvm,
	James Morse, Julien Thierry, Suzuki K Poulose, Catalin Marinas,
	Will Deacon, Eric Auger, Christoffer Dall, wanghaibin.wang,
	yuzenghui

On 2020-11-30 12:12, Shenming Lu wrote:
> On 2020/11/30 19:22, Marc Zyngier wrote:
>> On 2020-11-28 14:18, Shenming Lu wrote:
>>> In order to further reduce the impact of the wait delay of the
>>> VPT analysis, we can delay the execution of the polling on the
>>> GICR_VPENDBASER.Dirty bit (call it from kvm_vgic_flush_hwstate()
>>> corresponding to vPE resident), let the GIC and the CPU work in
>>> parallel on the entry path.
>>> 
>>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>>> ---
>>>  arch/arm64/kvm/vgic/vgic-v4.c      | 16 ++++++++++++++++
>>>  arch/arm64/kvm/vgic/vgic.c         |  3 +++
>>>  drivers/irqchip/irq-gic-v3-its.c   | 16 ++++++++++++----
>>>  drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
>>>  include/kvm/arm_vgic.h             |  3 +++
>>>  include/linux/irqchip/arm-gic-v4.h |  4 ++++
>>>  6 files changed, 49 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/arch/arm64/kvm/vgic/vgic-v4.c 
>>> b/arch/arm64/kvm/vgic/vgic-v4.c
>>> index b5fa73c9fd35..b0da74809187 100644
>>> --- a/arch/arm64/kvm/vgic/vgic-v4.c
>>> +++ b/arch/arm64/kvm/vgic/vgic-v4.c
>>> @@ -353,6 +353,22 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
>>>      return err;
>>>  }
>>> 
>>> +void vgic_v4_commit(struct kvm_vcpu *vcpu)
>>> +{
>>> +    struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
>>> +
>>> +    /*
>>> +     * No need to wait for the vPE to be ready across a shallow 
>>> guest
>>> +     * exit, as only a vcpu_put will invalidate it.
>>> +     */
>>> +    if (vpe->vpe_ready)
>>> +        return;
>>> +
>>> +    its_commit_vpe(vpe);
>>> +
>>> +    vpe->vpe_ready = true;
>> 
>> This should be written as:
>> 
>> if (!ready)
>>      commit();
>> 
>> and ready being driven by the commit() call itself.
>> 
>>> +}
>>> +
>>>  static struct vgic_its *vgic_get_its(struct kvm *kvm,
>>>                       struct kvm_kernel_irq_routing_entry *irq_entry)
>>>  {
>>> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
>>> index c3643b7f101b..1c597c9885fa 100644
>>> --- a/arch/arm64/kvm/vgic/vgic.c
>>> +++ b/arch/arm64/kvm/vgic/vgic.c
>>> @@ -915,6 +915,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu 
>>> *vcpu)
>>> 
>>>      if (can_access_vgic_from_kernel())
>>>          vgic_restore_state(vcpu);
>>> +
>>> +    if (vgic_supports_direct_msis(vcpu->kvm))
>>> +        vgic_v4_commit(vcpu);
>>>  }
>>> 
>>>  void kvm_vgic_load(struct kvm_vcpu *vcpu)
>>> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
>>> b/drivers/irqchip/irq-gic-v3-its.c
>>> index 22f427135c6b..f30aba14933e 100644
>>> --- a/drivers/irqchip/irq-gic-v3-its.c
>>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>>> @@ -3842,8 +3842,6 @@ static void its_vpe_schedule(struct its_vpe 
>>> *vpe)
>>>      val |= vpe->idai ? GICR_VPENDBASER_IDAI : 0;
>>>      val |= GICR_VPENDBASER_Valid;
>>>      gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
>>> -
>>> -    its_wait_vpt_parse_complete();
>>>  }
>>> 
>>>  static void its_vpe_deschedule(struct its_vpe *vpe)
>>> @@ -3855,6 +3853,8 @@ static void its_vpe_deschedule(struct its_vpe 
>>> *vpe)
>>> 
>>>      vpe->idai = !!(val & GICR_VPENDBASER_IDAI);
>>>      vpe->pending_last = !!(val & GICR_VPENDBASER_PendingLast);
>>> +
>>> +    vpe->vpe_ready = false;
>> 
>> This should be set from the its_make_vpe_non_resident() call.
>> 
>>>  }
>>> 
>>>  static void its_vpe_invall(struct its_vpe *vpe)
>>> @@ -3891,6 +3891,10 @@ static int its_vpe_set_vcpu_affinity(struct
>>> irq_data *d, void *vcpu_info)
>>>          its_vpe_deschedule(vpe);
>>>          return 0;
>>> 
>>> +    case COMMIT_VPE:
>>> +        its_wait_vpt_parse_complete();
>>> +        return 0;
>>> +
>>>      case INVALL_VPE:
>>>          its_vpe_invall(vpe);
>>>          return 0;
>>> @@ -4052,8 +4056,6 @@ static void its_vpe_4_1_schedule(struct its_vpe 
>>> *vpe,
>>>      val |= FIELD_PREP(GICR_VPENDBASER_4_1_VPEID, vpe->vpe_id);
>>> 
>>>      gicr_write_vpendbaser(val, vlpi_base + GICR_VPENDBASER);
>>> -
>>> -    its_wait_vpt_parse_complete();
>>>  }
>>> 
>>>  static void its_vpe_4_1_deschedule(struct its_vpe *vpe,
>>> @@ -4091,6 +4093,8 @@ static void its_vpe_4_1_deschedule(struct 
>>> its_vpe *vpe,
>>>                          GICR_VPENDBASER_PendingLast);
>>>          vpe->pending_last = true;
>>>      }
>>> +
>>> +    vpe->vpe_ready = false;
>>>  }
>>> 
>>>  static void its_vpe_4_1_invall(struct its_vpe *vpe)
>>> @@ -4128,6 +4132,10 @@ static int 
>>> its_vpe_4_1_set_vcpu_affinity(struct
>>> irq_data *d, void *vcpu_info)
>>>          its_vpe_4_1_deschedule(vpe, info);
>>>          return 0;
>>> 
>>> +    case COMMIT_VPE:
>>> +        its_wait_vpt_parse_complete();
>>> +        return 0;
>>> +
>>>      case INVALL_VPE:
>>>          its_vpe_4_1_invall(vpe);
>>>          return 0;
>>> diff --git a/drivers/irqchip/irq-gic-v4.c 
>>> b/drivers/irqchip/irq-gic-v4.c
>>> index 0c18714ae13e..6cea71a4e68b 100644
>>> --- a/drivers/irqchip/irq-gic-v4.c
>>> +++ b/drivers/irqchip/irq-gic-v4.c
>>> @@ -258,6 +258,17 @@ int its_make_vpe_resident(struct its_vpe *vpe,
>>> bool g0en, bool g1en)
>>>      return ret;
>>>  }
>>> 
>>> +int its_commit_vpe(struct its_vpe *vpe)
>>> +{
>>> +    struct its_cmd_info info = {
>>> +        .cmd_type = COMMIT_VPE,
>>> +    };
>>> +
>>> +    WARN_ON(preemptible());
>>> +
>>> +    return its_send_vpe_cmd(vpe, &info);
>>> +}
>>> +
>>>  int its_invall_vpe(struct its_vpe *vpe)
>>>  {
>>>      struct its_cmd_info info = {
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index a8d8fdcd3723..f2170df6cf7c 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -401,7 +401,10 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, 
>>> int irq,
>>>  int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
>>>                   struct kvm_kernel_irq_routing_entry *irq_entry);
>>> 
>>> +void vgic_v4_commit(struct kvm_vcpu *vcpu);
>>> +
>>>  int vgic_v4_load(struct kvm_vcpu *vcpu);
>>> +
>> 
>> Spurious new lines.
>> 
>>>  int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
>>> 
>>>  #endif /* __KVM_ARM_VGIC_H */
>>> diff --git a/include/linux/irqchip/arm-gic-v4.h
>>> b/include/linux/irqchip/arm-gic-v4.h
>>> index 6976b8331b60..936d88e482a9 100644
>>> --- a/include/linux/irqchip/arm-gic-v4.h
>>> +++ b/include/linux/irqchip/arm-gic-v4.h
>>> @@ -75,6 +75,8 @@ struct its_vpe {
>>>      u16            vpe_id;
>>>      /* Pending VLPIs on schedule out? */
>>>      bool            pending_last;
>>> +    /* VPT parse complete */
>>> +    bool            vpe_ready;
>>>  };
>>> 
>>>  /*
>>> @@ -104,6 +106,7 @@ enum its_vcpu_info_cmd_type {
>>>      PROP_UPDATE_AND_INV_VLPI,
>>>      SCHEDULE_VPE,
>>>      DESCHEDULE_VPE,
>>> +    COMMIT_VPE,
>>>      INVALL_VPE,
>>>      PROP_UPDATE_VSGI,
>>>  };
>>> @@ -129,6 +132,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm);
>>>  void its_free_vcpu_irqs(struct its_vm *vm);
>>>  int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool 
>>> g1en);
>>>  int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
>>> +int its_commit_vpe(struct its_vpe *vpe);
>>>  int its_invall_vpe(struct its_vpe *vpe);
>>>  int its_map_vlpi(int irq, struct its_vlpi_map *map);
>>>  int its_get_vlpi(int irq, struct its_vlpi_map *map);
>> 
>> In order to speed up the respin round-trip, I've taken the liberty
>> to refactor this patch myself. Please have a look at [1] and let
>> me know if you're OK with it.
> 
> I have looked at it and I am OK with it.
> 
> By the way, will the first patch (setting delay_us to 1) be picked up
> together with it?

I'll route it via the irqchip tree.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
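
For readers following the patch quoted in the message above: a minimal
user-space sketch of the idea, under stated assumptions. Making a vPE
resident only starts the VPT parse; the wait for Dirty to clear is moved
to a separate commit step, so the GIC can parse while the CPU continues
down the entry path. All names here (fake_vpe, fake_schedule,
fake_entry_work, fake_commit, fake_deschedule) are hypothetical, the
hardware is modeled by a step counter, and setting vpe_ready at the end
of the commit is an assumption, since the quoted hunks only show it
being cleared. This is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct its_vpe; only what the sketch needs. */
struct fake_vpe {
	bool resident;
	bool vpe_ready;		/* VPT parse known to be complete */
	int parse_steps_left;	/* models the GIC: steps until Dirty clears */
	int polls;		/* how many times the Dirty bit was read */
};

/* Models one read of GICR_VPENDBASER.Dirty; each read lets one step pass. */
static bool fake_read_dirty(struct fake_vpe *vpe)
{
	vpe->polls++;
	if (vpe->parse_steps_left > 0) {
		vpe->parse_steps_left--;
		return true;
	}
	return false;
}

/* Make the vPE resident: start the parse, but do not wait for it. */
static void fake_schedule(struct fake_vpe *vpe, int parse_steps)
{
	vpe->resident = true;
	vpe->parse_steps_left = parse_steps;
}

/* Other work on the entry path; the modeled GIC keeps parsing meanwhile. */
static void fake_entry_work(struct fake_vpe *vpe, int steps)
{
	vpe->parse_steps_left -= steps;
	if (vpe->parse_steps_left < 0)
		vpe->parse_steps_left = 0;
}

/* Commit: wait for the parse only if it is not already known to be done. */
static void fake_commit(struct fake_vpe *vpe)
{
	assert(vpe->resident);
	if (vpe->vpe_ready)
		return;
	while (fake_read_dirty(vpe))
		;
	vpe->vpe_ready = true;	/* assumption: not shown in the quoted hunks */
}

/* Make the vPE non-resident: the next commit has to wait again. */
static void fake_deschedule(struct fake_vpe *vpe)
{
	vpe->resident = false;
	vpe->vpe_ready = false;
}
```

In this model, committing right after fake_schedule() spends every parse
step polling, while committing after enough fake_entry_work() needs a
single read of the Dirty bit. A second commit without an intervening
deschedule does not poll at all.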

* [irqchip: irq/irqchip-next] irqchip/gic-v4.1: Reduce the delay when polling GICR_VPENDBASER.Dirty
  2020-11-28 14:18 ` [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit Shenming Lu
@ 2020-12-11 14:58   ` irqchip-bot for Shenming Lu
  0 siblings, 0 replies; 8+ messages in thread
From: irqchip-bot for Shenming Lu @ 2020-12-11 14:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Shenming Lu, Marc Zyngier, tglx

The following commit has been merged into the irq/irqchip-next branch of irqchip:

Commit-ID:     0b39498230ae53e6af981141be99f4c7d5144de6
Gitweb:        https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms/0b39498230ae53e6af981141be99f4c7d5144de6
Author:        Shenming Lu <lushenming@huawei.com>
AuthorDate:    Sat, 28 Nov 2020 22:18:56 +08:00
Committer:     Marc Zyngier <maz@kernel.org>
CommitterDate: Fri, 11 Dec 2020 14:47:10 

irqchip/gic-v4.1: Reduce the delay when polling GICR_VPENDBASER.Dirty

The 10us delay between polls of the GICR_VPENDBASER.Dirty bit is too
high: in our measurements it can greatly increase the total scheduling
latency of a vCPU. Reduce it to 1us to lessen the impact.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201128141857.983-2-lushenming@huawei.com
---
 drivers/irqchip/irq-gic-v3-its.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 4069c21..d74ef41 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -3808,7 +3808,7 @@ static void its_wait_vpt_parse_complete(void)
 	WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER,
 						       val,
 						       !(val & GICR_VPENDBASER_Dirty),
-						       10, 500));
+						       1, 500));
 }
 
 static void its_vpe_schedule(struct its_vpe *vpe)
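
To see why the poll interval matters, here is a small user-space model
of the loop, offered as a sketch rather than the kernel's
implementation. It assumes that the condition is checked before each
delay (as I understand readq_relaxed_poll_timeout_atomic() to do) and
that the register read itself costs no time, which the cover letter's
MMIO numbers show is not true on real hardware. The function name
sim_poll_latency_ns and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simulated poll loop. The Dirty bit clears at clear_at_ns; the loop
 * checks it, then busy-waits delay_us before checking again.
 * Returns the simulated time (ns) at which the cleared bit is observed,
 * or -1 if timeout_us elapses first.
 */
static int64_t sim_poll_latency_ns(int64_t clear_at_ns, int64_t delay_us,
				   int64_t timeout_us)
{
	int64_t now_ns = 0;

	assert(delay_us > 0 && timeout_us > 0);

	for (;;) {
		if (now_ns >= clear_at_ns)	/* read sees Dirty == 0 */
			return now_ns;
		if (now_ns > timeout_us * 1000)	/* gave up waiting */
			return -1;
		now_ns += delay_us * 1000;	/* models udelay(delay_us) */
	}
}
```

With the bit clearing after 1500ns (within the 1~2 microseconds quoted
in the cover letter), this model observes it at 10000ns with
delay_us = 10 and at 2000ns with delay_us = 1; the 500us timeout is
unchanged in both cases.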

* Re: [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis
  2020-11-28 14:18 [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Shenming Lu
  2020-11-28 14:18 ` [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit Shenming Lu
  2020-11-28 14:18 ` [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit Shenming Lu
@ 2020-12-11 15:01 ` Marc Zyngier
  2 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2020-12-11 15:01 UTC (permalink / raw)
  To: shawnguo, linux, leoyang.li, mark.rutland, zhiqiang.hou,
	Biwen Li, robh+dt, tglx, kvmarm, Will Deacon, kvm, Eric Auger,
	Christoffer Dall, linux-arm-kernel, Suzuki K Poulose,
	Shenming Lu, James Morse, Catalin Marinas, linux-kernel,
	Julien Thierry
  Cc: xiaobo.xie, Hou Zhiqiang, devicetree, Biwen Li, jiafei.pan,
	yuzenghui, wanghaibin.wang

On Sat, 28 Nov 2020 22:18:55 +0800, Shenming Lu wrote:
> Right after a vPE is made resident, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0, where the delay_us
> is set to 10. But in our measurement, it takes only hundreds of
> nanoseconds, or 1~2 microseconds, to finish parsing the VPT in most
> cases. What's more, we found that the MMIO delay on GICv4.1 system
> (HiSilicon) is about 10 times higher than that on GICv4.0 system in
> kvm-unit-tests (the specific data is as follows).
> 
> [...]

Applied to irq/irqchip-next, thanks!

[1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit
      commit: 0b39498230ae53e6af981141be99f4c7d5144de6

Patch 2 will be routed via the KVM/arm64 tree.

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.



Thread overview: 8+ messages
2020-11-28 14:18 [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Shenming Lu
2020-11-28 14:18 ` [PATCH v2 1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit Shenming Lu
2020-12-11 14:58   ` [irqchip: irq/irqchip-next] irqchip/gic-v4.1: Reduce the delay when polling GICR_VPENDBASER.Dirty irqchip-bot for Shenming Lu
2020-11-28 14:18 ` [PATCH v2 2/2] KVM: arm64: Delay the execution of the polling on the GICR_VPENDBASER.Dirty bit Shenming Lu
2020-11-30 11:22   ` Marc Zyngier
2020-11-30 12:12     ` Shenming Lu
2020-11-30 12:28       ` Marc Zyngier
2020-12-11 15:01 ` [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis Marc Zyngier
