* [PATCH v3 00/20] KVM: arm/arm64: Optimize arch timer register handling
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

We currently spend a measurable amount of time on each entry/exit to the
guest dealing with arch timer registers, even when the timer is not
pending and not doing anything (on certain architectures).

We can do much better by moving the arch timer save/restore to the
vcpu_load and vcpu_put functions, but this means that if we don't read
back the timer state on every exit from the guest, then we have to be
able to start taking timer interrupts for the virtual timer in KVM and
handle that properly.
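
As a rough sketch of the resulting shape (the timer helpers here are
illustrative names, not necessarily the exact functions in this series):

	void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
	{
		/* ... existing load work ... */
		kvm_timer_vcpu_load(vcpu);	/* restore guest timer state */
	}

	void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
	{
		kvm_timer_vcpu_put(vcpu);	/* save guest timer state */
		/* ... existing put work ... */
	}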

That has a number of entertaining consequences, such as having to make
sure we don't deadlock between any of the vgic code and interrupt
injection happening from an ISR.  On the plus side, being able to inject
virtual interrupts corresponding to a physical interrupt directly from
an ISR is probably a good system design change overall.
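
For illustration, the ISR-side injection amounts to something like the
following (handler shape only; inject_vtimer_irq() is a placeholder for
the vgic injection path, not a real function name):

	static irqreturn_t vtimer_host_handler(int irq, void *dev_id)
	{
		struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;

		/*
		 * The host takes the virtual timer PPI while the guest's
		 * timer state is still loaded; forward it to the vgic as a
		 * virtual interrupt instead of waiting for the next exit.
		 */
		if (vcpu)
			inject_vtimer_irq(vcpu);

		return IRQ_HANDLED;
	}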

We also have to change the use of the physical vs. virtual counter in
the arm64 kernel to avoid having to save/restore the CNTVOFF_EL2
register on every return to the hypervisor.  The only reason I could
find for using the virtual counter for the kernel on systems with access
to the physical counter is to detect if firmware did not properly clear
CNTVOFF_EL2, and this change has to be weighed against the existing check
(assuming I got this right).
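
(For reference, the architecture defines CNTVCT_EL0 = CNTPCT_EL0 -
CNTVOFF_EL2, which is why zeroing CNTVOFF_EL2 whenever we are not
running a guest makes the virtual and physical counters read the same
value.)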

On a non-VHE system (AMD Seattle) I have measured this to improve the
world-switch time by roughly 100 cycles, but on an EL2 kernel (emulating
VHE behavior on the same hardware) this gives us around 250 cycles of
improvement, and on Thunder-X we seem to get around 650 cycles of
improvement, because we can avoid the extra configuration of trapping
accesses to the physical timer from EL1 on every switch.

These patches require that the GICv2 hardware (on such systems) is
properly reported by firmware to have the extra CPU interface page for
the deactivate register.
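
Where the device tree under-reports the CPU interface region, the
workaround added in the first patch can be enabled from the kernel
command line as a stopgap, e.g.:

	irqchip.gicv2_force_probe=1

(see that patch for the exact semantics and the associated warnings).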

Code is also available here:
git://git.kernel.org/pub/scm/linux/kernel/git/cdall/linux.git timer-optimize-v3

Based on v4.14-rc1

Some recent numbers I ran on Thunder-X with v4.14-rc1 with/without these
patches (averaged over a few hundred thousand executions) for a base
hypercall cost:

Without this series, avg. cycles: 12,476
Without this series, min. cycles: 12,052

With this series, avg. cycles: 11,782
With this series, min. cycles: 11,435

Improvement ~650 cycles (over 5%)
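
(That is, the average cost drops by 12,476 - 11,782 = 694 cycles and
the minimum by 12,052 - 11,435 = 617 cycles, i.e. roughly 5-6% of the
baseline hypercall cost.)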

Changes since v2:
 - Removed RFC tag
 - Included Marc's patch to support EOI/deactivate on broken firmware
   systems
 - Simplified patch 6 (was patch 5 in RFC v2)
 - Clarified percpu_devid interrupts in patch 12 (was patch 11)

Thanks,
  Christoffer

Christoffer Dall (19):
  arm64: Use physical counter for in-kernel reads
  arm64: Use the physical counter when available for read_cycles
  KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized
  KVM: arm/arm64: Support calling vgic_update_irq_pending from irq
    context
  KVM: arm/arm64: Check that system supports split eoi/deactivate
  KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
  KVM: arm/arm64: Rename soft timer to bg_timer
  KVM: arm/arm64: Use separate timer for phys timer emulation
  KVM: arm/arm64: Move timer/vgic flush/sync under disabled irq
  KVM: arm/arm64: Move timer save/restore out of the hyp code
  genirq: Document vcpu_info usage for percpu_devid interrupts
  KVM: arm/arm64: Set VCPU affinity for virt timer irq
  KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
  KVM: arm/arm64: Support EL1 phys timer register access in set/get reg
  KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
  KVM: arm/arm64: Move phys_timer_emulate function
  KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
  KVM: arm/arm64: Rework kvm_timer_should_fire

Marc Zyngier (1):
  irqchip/gic: Deal with broken firmware exposing only 4kB of GICv2 CPU
    interface

 Documentation/admin-guide/kernel-parameters.txt |   7 +
 arch/arm/include/asm/kvm_asm.h                  |   2 +
 arch/arm/include/asm/kvm_hyp.h                  |   4 +-
 arch/arm/include/uapi/asm/kvm.h                 |   6 +
 arch/arm/kvm/hyp/switch.c                       |   7 +-
 arch/arm64/include/asm/arch_timer.h             |  18 +-
 arch/arm64/include/asm/kvm_asm.h                |   2 +
 arch/arm64/include/asm/kvm_hyp.h                |   4 +-
 arch/arm64/include/asm/timex.h                  |   2 +-
 arch/arm64/include/uapi/asm/kvm.h               |   6 +
 arch/arm64/kvm/hyp/switch.c                     |   6 +-
 arch/arm64/kvm/sys_regs.c                       |  41 +--
 drivers/clocksource/arm_arch_timer.c            |  33 +-
 drivers/irqchip/irq-gic.c                       |  74 +++-
 include/kvm/arm_arch_timer.h                    |  19 +-
 kernel/irq/manage.c                             |   3 +-
 virt/kvm/arm/arch_timer.c                       | 446 ++++++++++++++++--------
 virt/kvm/arm/arm.c                              |  45 ++-
 virt/kvm/arm/hyp/timer-sr.c                     |  74 ++--
 virt/kvm/arm/vgic/vgic-its.c                    |  17 +-
 virt/kvm/arm/vgic/vgic-mmio-v2.c                |  22 +-
 virt/kvm/arm/vgic/vgic-mmio-v3.c                |  17 +-
 virt/kvm/arm/vgic/vgic-mmio.c                   |  44 ++-
 virt/kvm/arm/vgic/vgic-v2.c                     |   5 +-
 virt/kvm/arm/vgic/vgic-v3.c                     |  12 +-
 virt/kvm/arm/vgic/vgic.c                        |  63 ++--
 virt/kvm/arm/vgic/vgic.h                        |   3 +-
 27 files changed, 648 insertions(+), 334 deletions(-)

-- 
2.9.0

* [PATCH v3 01/20] irqchip/gic: Deal with broken firmware exposing only 4kB of GICv2 CPU interface
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

From: Marc Zyngier <marc.zyngier@arm.com>

There is a lot of broken firmware out there that doesn't really
expose the information the kernel requires when it comes to dealing
with GICv2:

(1) Firmware that only describes the first 4kB of GICv2
(2) Firmware that describes 128kB of CPU interface, while
    the usable portion of the address space is between
    60 and 68kB

So far, we only deal with (2). But we have platforms exhibiting
behaviour (1), resulting in two sub-cases:
(a) The GIC is occupying 8kB, as required by the GICv2 architecture
(b) It is actually spread over 128kB, and this is likely to be a version
    of (2)

This patch tries to work around both (a) and (b) by poking outside of
the described memory region and trying to work out what is actually
there. This is of course unsafe, and should
only be enabled if there is no way to otherwise fix the DT provided
by the firmware (we provide a "irqchip.gicv2_force_probe" option
to that effect).

Note that for the time being, we restrict ourselves to GICv2
implementations provided by ARM, since I have no knowledge of any
alternative implementations. This could be relaxed if such
an implementation comes to light on a broken platform.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +++
 drivers/irqchip/irq-gic.c                       | 71 +++++++++++++++++++++----
 2 files changed, 69 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0549662..3daa0a5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1713,6 +1713,13 @@
 	irqaffinity=	[SMP] Set the default irq affinity mask
 			The argument is a cpu list, as described above.
 
+	irqchip.gicv2_force_probe=
+			[ARM, ARM64]
+			Format: <bool>
+			Force the kernel to look for the second 4kB page
+			of a GICv2 controller even if the memory range
+			exposed by the device tree is too small.
+
 	irqfixup	[HW]
 			When an interrupt is not handled search all handlers
 			for it. Intended to get systems with badly broken
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 651d726..f641e8e 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1256,6 +1256,19 @@ static void gic_teardown(struct gic_chip_data *gic)
 
 #ifdef CONFIG_OF
 static int gic_cnt __initdata;
+static bool gicv2_force_probe;
+
+static int __init gicv2_force_probe_cfg(char *buf)
+{
+	return strtobool(buf, &gicv2_force_probe);
+}
+early_param("irqchip.gicv2_force_probe", gicv2_force_probe_cfg);
+
+static bool gic_check_gicv2(void __iomem *base)
+{
+	u32 val = readl_relaxed(base + GIC_CPU_IDENT);
+	return (val & 0xff0fff) == 0x02043B;
+}
 
 static bool gic_check_eoimode(struct device_node *node, void __iomem **base)
 {
@@ -1265,20 +1278,60 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base)
 
 	if (!is_hyp_mode_available())
 		return false;
-	if (resource_size(&cpuif_res) < SZ_8K)
-		return false;
-	if (resource_size(&cpuif_res) == SZ_128K) {
-		u32 val_low, val_high;
+	if (resource_size(&cpuif_res) < SZ_8K) {
+		void __iomem *alt;
+		/*
+		 * Check for a stupid firmware that only exposes the
+		 * first page of a GICv2.
+		 */
+		if (!gic_check_gicv2(*base))
+			return false;
 
+		if (!gicv2_force_probe) {
+			pr_warn("GIC: GICv2 detected, but range too small and irqchip.gicv2_force_probe not set\n");
+			return false;
+		}
+
+		alt = ioremap(cpuif_res.start, SZ_8K);
+		if (!alt)
+			return false;
+		if (!gic_check_gicv2(alt + SZ_4K)) {
+			/*
+			 * The first page was that of a GICv2, and
+			 * the second was *something*. Let's trust it
+			 * to be a GICv2, and update the mapping.
+			 */
+			pr_warn("GIC: GICv2 at %pa, but range is too small (broken DT?), assuming 8kB\n",
+				&cpuif_res.start);
+			iounmap(*base);
+			*base = alt;
+			return true;
+		}
+
+		/*
+		 * We detected *two* initial GICv2 pages in a
+		 * row. Could be a GICv2 aliased over two 64kB
+		 * pages. Update the resource, map the iospace, and
+		 * pray.
+		 */
+		iounmap(alt);
+		alt = ioremap(cpuif_res.start, SZ_128K);
+		if (!alt)
+			return false;
+		pr_warn("GIC: Aliased GICv2 at %pa, trying to find the canonical range over 128kB\n",
+			&cpuif_res.start);
+		cpuif_res.end = cpuif_res.start + SZ_128K -1;
+		iounmap(*base);
+		*base = alt;
+	}
+	if (resource_size(&cpuif_res) == SZ_128K) {
 		/*
-		 * Verify that we have the first 4kB of a GIC400
+		 * Verify that we have the first 4kB of a GICv2
 		 * aliased over the first 64kB by checking the
 		 * GICC_IIDR register on both ends.
 		 */
-		val_low = readl_relaxed(*base + GIC_CPU_IDENT);
-		val_high = readl_relaxed(*base + GIC_CPU_IDENT + 0xf000);
-		if ((val_low & 0xffff0fff) != 0x0202043B ||
-		    val_low != val_high)
+		if (!gic_check_gicv2(*base) ||
+		    !gic_check_gicv2(*base + 0xf000))
 			return false;
 
 		/*
-- 
2.9.0

* [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

Using the physical counter allows KVM to retain the offset between the
virtual and physical counter as long as it is actively running a VCPU.

As soon as the VCPU is released, another thread is scheduled, or we
start running userspace applications, we reset the offset to 0, so that
userspace accessing the virtual timer can still read the virtual counter
and get the same view of time as the kernel.

This opens up potential improvements for KVM performance.

VHE kernels or kernels continuing to use the virtual timer are
unaffected.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm64/include/asm/arch_timer.h  | 9 ++++-----
 drivers/clocksource/arm_arch_timer.c | 3 +--
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index a652ce0..1859a1c 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -148,11 +148,10 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
 
 static inline u64 arch_counter_get_cntpct(void)
 {
-	/*
-	 * AArch64 kernel and user space mandate the use of CNTVCT.
-	 */
-	BUG();
-	return 0;
+	u64 cval;
+	isb();
+	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
+	return cval;
 }
 
 static inline u64 arch_counter_get_cntvct(void)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index fd4b7f6..9b3322a 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -890,8 +890,7 @@ static void __init arch_counter_register(unsigned type)
 
 	/* Register the CP15 based counter if we have one */
 	if (type & ARCH_TIMER_TYPE_CP15) {
-		if (IS_ENABLED(CONFIG_ARM64) ||
-		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
+		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
 			arch_timer_read_counter = arch_counter_get_cntvct;
 		else
 			arch_timer_read_counter = arch_counter_get_cntpct;
-- 
2.9.0

* [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas,
	Christoffer Dall, Mark Rutland

Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
arm64, but as we move to using the physical timer for the in-kernel
time-keeping, we need to make that more flexible.

First, we need to make sure the physical counter can be read on equal
terms to the virtual counter, which includes adding physical counter
read functions for timer implementations that require errata workarounds.

Second, we need to choose between reading the physical and the virtual
counter, depending on which timer the kernel otherwise uses for
timekeeping.  We can do this using a static key to avoid a performance
penalty during runtime when reading the counter.
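
As a generic illustration of the mechanism (names here are illustrative
and differ slightly from the hunks below), a static key turns the choice
into a patched branch instead of a load-and-compare on every read:

	DEFINE_STATIC_KEY_FALSE(use_phys_counter);

	static inline u64 read_cycles(void)
	{
		/* Fast path: compiled to a patchable jump/nop, no data load */
		if (static_branch_unlikely(&use_phys_counter))
			return arch_counter_get_cntpct();
		return arch_counter_get_cntvct();
	}

	/* Flipped once at init time, when we know which timer we use: */
	static_branch_enable(&use_phys_counter);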

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm64/include/asm/arch_timer.h  | 15 ++++++++++++---
 arch/arm64/include/asm/timex.h       |  2 +-
 drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 1859a1c..c56d8cd 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -30,6 +30,8 @@
 
 #include <clocksource/arm_arch_timer.h>
 
+extern struct static_key_false arch_timer_phys_counter_available;
+
 #if IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND)
 extern struct static_key_false arch_timer_read_ool_enabled;
 #define needs_unstable_timer_counter_workaround() \
@@ -52,6 +54,7 @@ struct arch_timer_erratum_workaround {
 	const char *desc;
 	u32 (*read_cntp_tval_el0)(void);
 	u32 (*read_cntv_tval_el0)(void);
+	u64 (*read_cntpct_el0)(void);
 	u64 (*read_cntvct_el0)(void);
 	int (*set_next_event_phys)(unsigned long, struct clock_event_device *);
 	int (*set_next_event_virt)(unsigned long, struct clock_event_device *);
@@ -148,10 +151,8 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
 
 static inline u64 arch_counter_get_cntpct(void)
 {
-	u64 cval;
 	isb();
-	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
-	return cval;
+	return arch_timer_reg_read_stable(cntpct_el0);
 }
 
 static inline u64 arch_counter_get_cntvct(void)
@@ -160,6 +161,14 @@ static inline u64 arch_counter_get_cntvct(void)
 	return arch_timer_reg_read_stable(cntvct_el0);
 }
 
+static inline u64 arch_counter_get_cycles(void)
+{
+	if (static_branch_unlikely(&arch_timer_phys_counter_available))
+	    return arch_counter_get_cntpct();
+	else
+	    return arch_counter_get_cntvct();
+}
+
 static inline int arch_timer_arch_init(void)
 {
 	return 0;
diff --git a/arch/arm64/include/asm/timex.h b/arch/arm64/include/asm/timex.h
index 81a076e..c0d214c 100644
--- a/arch/arm64/include/asm/timex.h
+++ b/arch/arm64/include/asm/timex.h
@@ -22,7 +22,7 @@
  * Use the current timer as a cycle counter since this is what we use for
  * the delay loop.
  */
-#define get_cycles()	arch_counter_get_cntvct()
+#define get_cycles()	arch_counter_get_cycles()
 
 #include <asm-generic/timex.h>
 
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9b3322a..f35da20 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -77,6 +77,9 @@ static bool arch_timer_mem_use_virtual;
 static bool arch_counter_suspend_stop;
 static bool vdso_default = true;
 
+DEFINE_STATIC_KEY_FALSE(arch_timer_phys_counter_available);
+EXPORT_SYMBOL_GPL(arch_timer_phys_counter_available);
+
 static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
 
 static int __init early_evtstrm_cfg(char *buf)
@@ -217,6 +220,11 @@ static u32 notrace fsl_a008585_read_cntv_tval_el0(void)
 	return __fsl_a008585_read_reg(cntv_tval_el0);
 }
 
+static u64 notrace fsl_a008585_read_cntpct_el0(void)
+{
+	return __fsl_a008585_read_reg(cntpct_el0);
+}
+
 static u64 notrace fsl_a008585_read_cntvct_el0(void)
 {
 	return __fsl_a008585_read_reg(cntvct_el0);
@@ -258,6 +266,11 @@ static u32 notrace hisi_161010101_read_cntv_tval_el0(void)
 	return __hisi_161010101_read_reg(cntv_tval_el0);
 }
 
+static u64 notrace hisi_161010101_read_cntpct_el0(void)
+{
+	return __hisi_161010101_read_reg(cntpct_el0);
+}
+
 static u64 notrace hisi_161010101_read_cntvct_el0(void)
 {
 	return __hisi_161010101_read_reg(cntvct_el0);
@@ -288,6 +301,15 @@ static struct ate_acpi_oem_info hisi_161010101_oem_info[] = {
 #endif
 
 #ifdef CONFIG_ARM64_ERRATUM_858921
+static u64 notrace arm64_858921_read_cntpct_el0(void)
+{
+	u64 old, new;
+
+	old = read_sysreg(cntpct_el0);
+	new = read_sysreg(cntpct_el0);
+	return (((old ^ new) >> 32) & 1) ? old : new;
+}
+
 static u64 notrace arm64_858921_read_cntvct_el0(void)
 {
 	u64 old, new;
@@ -346,6 +368,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "Freescale erratum a005858",
 		.read_cntp_tval_el0 = fsl_a008585_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = fsl_a008585_read_cntv_tval_el0,
+		.read_cntpct_el0 = fsl_a008585_read_cntpct_el0,
 		.read_cntvct_el0 = fsl_a008585_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -358,6 +381,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "HiSilicon erratum 161010101",
 		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
+		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
 		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -368,6 +392,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "HiSilicon erratum 161010101",
 		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
+		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
 		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -378,6 +403,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.match_type = ate_match_local_cap_id,
 		.id = (void *)ARM64_WORKAROUND_858921,
 		.desc = "ARM erratum 858921",
+		.read_cntpct_el0 = arm64_858921_read_cntpct_el0,
 		.read_cntvct_el0 = arm64_858921_read_cntvct_el0,
 	},
 #endif
@@ -890,10 +916,12 @@ static void __init arch_counter_register(unsigned type)
 
 	/* Register the CP15 based counter if we have one */
 	if (type & ARCH_TIMER_TYPE_CP15) {
-		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
+		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
 			arch_timer_read_counter = arch_counter_get_cntvct;
-		else
+		} else {
 			arch_timer_read_counter = arch_counter_get_cntpct;
+			static_branch_enable(&arch_timer_phys_counter_available);
+		}
 
 		clocksource_counter.archdata.vdso_direct = vdso_default;
 	} else {
-- 
2.9.0

* [PATCH v3 04/20] KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

If the vgic is not initialized, don't try to grab its spinlocks or
traverse its data structures.

This is important because we will soon have to start considering the
active state of virtual interrupts when doing vcpu_load, which may
happen early on, before the vgic is initialized.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/vgic/vgic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index fed717e..e1f7dbc 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -777,6 +777,9 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
 	bool map_is_active;
 
+	if (!vgic_initialized(vcpu->kvm))
+		return false;
+
 	spin_lock(&irq->irq_lock);
 	map_is_active = irq->hw && irq->active;
 	spin_unlock(&irq->irq_lock);
-- 
2.9.0

* [PATCH v3 05/20] KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context
From: Christoffer Dall @ 2017-09-23  0:41 UTC
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

We are about to optimize our timer handling logic which involves
injecting irqs to the vgic directly from the irq handler.

Unfortunately, the injection path can take any AP list lock and irq
lock, and we must therefore make sure to use spin_lock_irqsave wherever
interrupts are enabled and we take any of those locks, to avoid
deadlocking between process context and the ISR.

This changes a lot of the VGIC code, but the good news is that the
changes are mostly mechanical.
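
Concretely, the deadlock being avoided looks like this (sketch only,
not code from this series):

	/* process context, interrupts enabled */
	spin_lock(&irq->irq_lock);
	/*
	 * ... the timer ISR fires on the same CPU, and its injection path
	 * tries to take irq->irq_lock again, spinning forever on a lock
	 * its own CPU already holds.
	 */

	/* with this patch, process context masks local interrupts first: */
	spin_lock_irqsave(&irq->irq_lock, flags);
	/* ... */
	spin_unlock_irqrestore(&irq->irq_lock, flags);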

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/vgic/vgic-its.c     | 17 +++++++-----
 virt/kvm/arm/vgic/vgic-mmio-v2.c | 22 +++++++++------
 virt/kvm/arm/vgic/vgic-mmio-v3.c | 17 +++++++-----
 virt/kvm/arm/vgic/vgic-mmio.c    | 44 +++++++++++++++++------------
 virt/kvm/arm/vgic/vgic-v2.c      |  5 ++--
 virt/kvm/arm/vgic/vgic-v3.c      | 12 ++++----
 virt/kvm/arm/vgic/vgic.c         | 60 +++++++++++++++++++++++++---------------
 virt/kvm/arm/vgic/vgic.h         |  3 +-
 8 files changed, 108 insertions(+), 72 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index f51c1e1..9f5e347 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -278,6 +278,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
 	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
 	u8 prop;
 	int ret;
+	unsigned long flags;
 
 	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
 			     &prop, 1);
@@ -285,15 +286,15 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
 	if (ret)
 		return ret;
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
 		irq->priority = LPI_PROP_PRIORITY(prop);
 		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
 
-		vgic_queue_irq_unlock(kvm, irq);
+		vgic_queue_irq_unlock(kvm, irq, flags);
 	} else {
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 	}
 
 	return 0;
@@ -393,6 +394,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
 	int ret = 0;
 	u32 *intids;
 	int nr_irqs, i;
+	unsigned long flags;
 
 	nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
 	if (nr_irqs < 0)
@@ -420,9 +422,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
 		}
 
 		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->pending_latch = pendmask & (1U << bit_nr);
-		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
@@ -515,6 +517,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
 {
 	struct kvm_vcpu *vcpu;
 	struct its_ite *ite;
+	unsigned long flags;
 
 	if (!its->enabled)
 		return -EBUSY;
@@ -530,9 +533,9 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
 	if (!vcpu->arch.vgic_cpu.lpis_enabled)
 		return -EBUSY;
 
-	spin_lock(&ite->irq->irq_lock);
+	spin_lock_irqsave(&ite->irq->irq_lock, flags);
 	ite->irq->pending_latch = true;
-	vgic_queue_irq_unlock(kvm, ite->irq);
+	vgic_queue_irq_unlock(kvm, ite->irq, flags);
 
 	return 0;
 }
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
index b3d4a10..e21e2f4 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
@@ -74,6 +74,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
 	int mode = (val >> 24) & 0x03;
 	int c;
 	struct kvm_vcpu *vcpu;
+	unsigned long flags;
 
 	switch (mode) {
 	case 0x0:		/* as specified by targets */
@@ -97,11 +98,11 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
 
 		irq = vgic_get_irq(source_vcpu->kvm, vcpu, intid);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->pending_latch = true;
 		irq->source |= 1U << source_vcpu->vcpu_id;
 
-		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
+		vgic_queue_irq_unlock(source_vcpu->kvm, irq, flags);
 		vgic_put_irq(source_vcpu->kvm, irq);
 	}
 }
@@ -131,6 +132,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
 	u8 cpu_mask = GENMASK(atomic_read(&vcpu->kvm->online_vcpus) - 1, 0);
 	int i;
+	unsigned long flags;
 
 	/* GICD_ITARGETSR[0-7] are read-only */
 	if (intid < VGIC_NR_PRIVATE_IRQS)
@@ -140,13 +142,13 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid + i);
 		int target;
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		irq->targets = (val >> (i * 8)) & cpu_mask;
 		target = irq->targets ? __ffs(irq->targets) : 0;
 		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -174,17 +176,18 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
 {
 	u32 intid = addr & 0x0f;
 	int i;
+	unsigned long flags;
 
 	for (i = 0; i < len; i++) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		irq->source &= ~((val >> (i * 8)) & 0xff);
 		if (!irq->source)
 			irq->pending_latch = false;
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -195,19 +198,20 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
 {
 	u32 intid = addr & 0x0f;
 	int i;
+	unsigned long flags;
 
 	for (i = 0; i < len; i++) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		irq->source |= (val >> (i * 8)) & 0xff;
 
 		if (irq->source) {
 			irq->pending_latch = true;
-			vgic_queue_irq_unlock(vcpu->kvm, irq);
+			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		} else {
-			spin_unlock(&irq->irq_lock);
+			spin_unlock_irqrestore(&irq->irq_lock, flags);
 		}
 		vgic_put_irq(vcpu->kvm, irq);
 	}
diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
index 408ef06..8378610 100644
--- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
+++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
@@ -129,6 +129,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
 {
 	int intid = VGIC_ADDR_TO_INTID(addr, 64);
 	struct vgic_irq *irq;
+	unsigned long flags;
 
 	/* The upper word is WI for us since we don't implement Aff3. */
 	if (addr & 4)
@@ -139,13 +140,13 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
 	if (!irq)
 		return;
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	/* We only care about and preserve Aff0, Aff1 and Aff2. */
 	irq->mpidr = val & GENMASK(23, 0);
 	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
 
-	spin_unlock(&irq->irq_lock);
+	spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(vcpu->kvm, irq);
 }
 
@@ -241,11 +242,12 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
+	unsigned long flags;
 
 	for (i = 0; i < len * 8; i++) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		if (test_bit(i, &val)) {
 			/*
 			 * pending_latch is set irrespective of irq type
@@ -253,10 +255,10 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
 			 * restore irq config before pending info.
 			 */
 			irq->pending_latch = true;
-			vgic_queue_irq_unlock(vcpu->kvm, irq);
+			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		} else {
 			irq->pending_latch = false;
-			spin_unlock(&irq->irq_lock);
+			spin_unlock_irqrestore(&irq->irq_lock, flags);
 		}
 
 		vgic_put_irq(vcpu->kvm, irq);
@@ -799,6 +801,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
 	int sgi, c;
 	int vcpu_id = vcpu->vcpu_id;
 	bool broadcast;
+	unsigned long flags;
 
 	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
 	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
@@ -837,10 +840,10 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
 
 		irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->pending_latch = true;
 
-		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index c1e4bdd..deb51ee 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -69,13 +69,14 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
+	unsigned long flags;
 
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->enabled = true;
-		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -87,15 +88,16 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
+	unsigned long flags;
 
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		irq->enabled = false;
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -126,14 +128,15 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
+	unsigned long flags;
 
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->pending_latch = true;
 
-		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -144,15 +147,16 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
+	unsigned long flags;
 
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		irq->pending_latch = false;
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -181,7 +185,8 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 				    bool new_active_state)
 {
 	struct kvm_vcpu *requester_vcpu;
-	spin_lock(&irq->irq_lock);
+	unsigned long flags;
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	/*
 	 * The vcpu parameter here can mean multiple things depending on how
@@ -216,9 +221,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
 
 	irq->active = new_active_state;
 	if (new_active_state)
-		vgic_queue_irq_unlock(vcpu->kvm, irq);
+		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 	else
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 }
 
 /*
@@ -352,14 +357,15 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
 	int i;
+	unsigned long flags;
 
 	for (i = 0; i < len; i++) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		/* Narrow the priority range to what we actually support */
 		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 
 		vgic_put_irq(vcpu->kvm, irq);
 	}
@@ -390,6 +396,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
 {
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 2);
 	int i;
+	unsigned long flags;
 
 	for (i = 0; i < len * 4; i++) {
 		struct vgic_irq *irq;
@@ -404,14 +411,14 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
 			continue;
 
 		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		if (test_bit(i * 2 + 1, &val))
 			irq->config = VGIC_CONFIG_EDGE;
 		else
 			irq->config = VGIC_CONFIG_LEVEL;
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 }
@@ -443,6 +450,7 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
 {
 	int i;
 	int nr_irqs = vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
+	unsigned long flags;
 
 	for (i = 0; i < 32; i++) {
 		struct vgic_irq *irq;
@@ -459,12 +467,12 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
 		 * restore irq config before line level.
 		 */
 		new_level = !!(val & (1U << i));
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		irq->line_level = new_level;
 		if (new_level)
-			vgic_queue_irq_unlock(vcpu->kvm, irq);
+			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 		else
-			spin_unlock(&irq->irq_lock);
+			spin_unlock_irqrestore(&irq->irq_lock, flags);
 
 		vgic_put_irq(vcpu->kvm, irq);
 	}
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index e4187e5..8089710 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -62,6 +62,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
 	int lr;
+	unsigned long flags;
 
 	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
 
@@ -77,7 +78,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 
 		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		/* Always preserve the active bit */
 		irq->active = !!(val & GICH_LR_ACTIVE_BIT);
@@ -104,7 +105,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
 				irq->pending_latch = false;
 		}
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index 96ea597..863351c 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -44,6 +44,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
 	u32 model = vcpu->kvm->arch.vgic.vgic_model;
 	int lr;
+	unsigned long flags;
 
 	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
 
@@ -66,7 +67,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 		if (!irq)	/* An LPI could have been unmapped. */
 			continue;
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 
 		/* Always preserve the active bit */
 		irq->active = !!(val & ICH_LR_ACTIVE_BIT);
@@ -94,7 +95,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
 				irq->pending_latch = false;
 		}
 
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 
@@ -278,6 +279,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
 	bool status;
 	u8 val;
 	int ret;
+	unsigned long flags;
 
 retry:
 	vcpu = irq->target_vcpu;
@@ -296,13 +298,13 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
 
 	status = val & (1 << bit_nr);
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 	if (irq->target_vcpu != vcpu) {
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		goto retry;
 	}
 	irq->pending_latch = status;
-	vgic_queue_irq_unlock(vcpu->kvm, irq);
+	vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
 
 	if (status) {
 		/* clear consumed data */
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index e1f7dbc..b1bd238 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -53,6 +53,10 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
  *   vcpuX->vcpu_id < vcpuY->vcpu_id:
  *     spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock);
  *     spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock);
+ *
+ * Since the VGIC must support injecting virtual interrupts from ISRs, we have
+ * to use the spin_lock_irqsave/spin_unlock_irqrestore versions of outer
+ * spinlocks for any lock that may be taken while injecting an interrupt.
  */
 
 /*
@@ -261,7 +265,8 @@ static bool vgic_validate_injection(struct vgic_irq *irq, bool level, void *owne
  * Needs to be entered with the IRQ lock already held, but will return
  * with all locks dropped.
  */
-bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
+bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
+			   unsigned long flags)
 {
 	struct kvm_vcpu *vcpu;
 
@@ -279,7 +284,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 		 * not need to be inserted into an ap_list and there is also
 		 * no more work for us to do.
 		 */
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 
 		/*
 		 * We have to kick the VCPU here, because we could be
@@ -301,11 +306,11 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 	 * We must unlock the irq lock to take the ap_list_lock where
 	 * we are going to insert this new pending interrupt.
 	 */
-	spin_unlock(&irq->irq_lock);
+	spin_unlock_irqrestore(&irq->irq_lock, flags);
 
 	/* someone can do stuff here, which we re-check below */
 
-	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
+	spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
 	spin_lock(&irq->irq_lock);
 
 	/*
@@ -322,9 +327,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 
 	if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) {
 		spin_unlock(&irq->irq_lock);
-		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
+		spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
 
-		spin_lock(&irq->irq_lock);
+		spin_lock_irqsave(&irq->irq_lock, flags);
 		goto retry;
 	}
 
@@ -337,7 +342,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 	irq->vcpu = vcpu;
 
 	spin_unlock(&irq->irq_lock);
-	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
+	spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
 
 	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
@@ -367,6 +372,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 {
 	struct kvm_vcpu *vcpu;
 	struct vgic_irq *irq;
+	unsigned long flags;
 	int ret;
 
 	trace_vgic_update_irq_pending(cpuid, intid, level);
@@ -383,11 +389,11 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 	if (!irq)
 		return -EINVAL;
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	if (!vgic_validate_injection(irq, level, owner)) {
 		/* Nothing to see here, move along... */
-		spin_unlock(&irq->irq_lock);
+		spin_unlock_irqrestore(&irq->irq_lock, flags);
 		vgic_put_irq(kvm, irq);
 		return 0;
 	}
@@ -397,7 +403,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 	else
 		irq->pending_latch = true;
 
-	vgic_queue_irq_unlock(kvm, irq);
+	vgic_queue_irq_unlock(kvm, irq, flags);
 	vgic_put_irq(kvm, irq);
 
 	return 0;
@@ -406,15 +412,16 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
 {
 	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
+	unsigned long flags;
 
 	BUG_ON(!irq);
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	irq->hw = true;
 	irq->hwintid = phys_irq;
 
-	spin_unlock(&irq->irq_lock);
+	spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(vcpu->kvm, irq);
 
 	return 0;
@@ -423,6 +430,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 {
 	struct vgic_irq *irq;
+	unsigned long flags;
 
 	if (!vgic_initialized(vcpu->kvm))
 		return -EAGAIN;
@@ -430,12 +438,12 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
 	BUG_ON(!irq);
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 
 	irq->hw = false;
 	irq->hwintid = 0;
 
-	spin_unlock(&irq->irq_lock);
+	spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(vcpu->kvm, irq);
 
 	return 0;
@@ -486,9 +494,10 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq, *tmp;
+	unsigned long flags;
 
 retry:
-	spin_lock(&vgic_cpu->ap_list_lock);
+	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
 
 	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
 		struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB;
@@ -528,7 +537,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 		/* This interrupt looks like it has to be migrated. */
 
 		spin_unlock(&irq->irq_lock);
-		spin_unlock(&vgic_cpu->ap_list_lock);
+		spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
 
 		/*
 		 * Ensure locking order by always locking the smallest
@@ -542,7 +551,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 			vcpuB = vcpu;
 		}
 
-		spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock);
+		spin_lock_irqsave(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
 		spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock,
 				 SINGLE_DEPTH_NESTING);
 		spin_lock(&irq->irq_lock);
@@ -566,11 +575,11 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 
 		spin_unlock(&irq->irq_lock);
 		spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock);
-		spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
+		spin_unlock_irqrestore(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
 		goto retry;
 	}
 
-	spin_unlock(&vgic_cpu->ap_list_lock);
+	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
 }
 
 static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
@@ -703,6 +712,8 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
 		return;
 
+	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
+
 	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
 	vgic_flush_lr_state(vcpu);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
@@ -735,11 +746,12 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq;
 	bool pending = false;
+	unsigned long flags;
 
 	if (!vcpu->kvm->arch.vgic.enabled)
 		return false;
 
-	spin_lock(&vgic_cpu->ap_list_lock);
+	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
 
 	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
 		spin_lock(&irq->irq_lock);
@@ -750,7 +762,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
 			break;
 	}
 
-	spin_unlock(&vgic_cpu->ap_list_lock);
+	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
 
 	return pending;
 }
@@ -776,13 +788,15 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 {
 	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
 	bool map_is_active;
+	unsigned long flags;
 
 	if (!vgic_initialized(vcpu->kvm))
 		return false;
+	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
-	spin_lock(&irq->irq_lock);
+	spin_lock_irqsave(&irq->irq_lock, flags);
 	map_is_active = irq->hw && irq->active;
-	spin_unlock(&irq->irq_lock);
+	spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(vcpu->kvm, irq);
 
 	return map_is_active;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index bf9ceab..4f8aecb 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -140,7 +140,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
 struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			      u32 intid);
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
-bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
+bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
+			   unsigned long flags);
 void vgic_kick_vcpus(struct kvm *kvm);
 
 int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
-- 
2.9.0
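
[Note: the following is an illustrative, self-contained sketch of the locking pattern the patch above applies throughout the vgic code; struct and function names prefixed demo_ are invented for the example and are not part of the patch.]

#include <linux/spinlock.h>
#include <linux/interrupt.h>

struct demo_state {
	spinlock_t lock;	/* initialized with spin_lock_init() elsewhere */
	bool pending;
};

/*
 * Process-context path: interrupts may be enabled here, so disable them
 * while holding the lock.  If we only took spin_lock() and the ISR below
 * fired on this CPU, it would spin on the same lock forever (deadlock).
 */
static void demo_set_pending(struct demo_state *s)
{
	unsigned long flags;

	spin_lock_irqsave(&s->lock, flags);
	s->pending = true;
	spin_unlock_irqrestore(&s->lock, flags);
}

/*
 * Interrupt-context path: interrupts are already disabled on this CPU,
 * so a plain spin_lock() is sufficient.
 */
static irqreturn_t demo_isr(int irq, void *dev_id)
{
	struct demo_state *s = dev_id;

	spin_lock(&s->lock);
	s->pending = true;
	spin_unlock(&s->lock);

	return IRQ_HANDLED;
}

This is also why vgic_queue_irq_unlock() now takes the caller's flags: it drops the irq lock (and possibly takes the ap_list lock) on the caller's behalf, so it needs the saved interrupt state to restore on its unlock paths.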



* [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

Some systems without proper firmware and/or hardware description data
don't support the split EOI and deactivate operation.

On such systems, we cannot leave the physical interrupt active after the
timer handler on the host has run, so we cannot support KVM's in-kernel
GIC together with the timer changes we are about to introduce.

This patch makes sure that trying to initialize the KVM GIC code will
fail on such systems.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 drivers/irqchip/irq-gic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index f641e8e..ab12bf4 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1420,7 +1420,8 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
 	if (ret)
 		return;
 
-	gic_set_kvm_info(&gic_v2_kvm_info);
+	if (static_key_true(&supports_deactivate))
+		gic_set_kvm_info(&gic_v2_kvm_info);
 }
 
 int __init
-- 
2.9.0
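
[Note: a rough sketch of what "split EOI/deactivate" means at the register level, modelled on the EOImode == 1 handlers in drivers/irqchip/irq-gic.c; the demo_ wrappers are invented for illustration and assume the GIC_CPU_* offsets from include/linux/irqchip/arm-gic.h.]

#include <linux/io.h>
#include <linux/irqchip/arm-gic.h>

/*
 * With GICC_CTLR.EOImode == 1 ("split" mode), writing GICC_EOIR only
 * drops the running priority; the interrupt remains active until it is
 * written to GICC_DIR.  That is what allows the host timer handler to
 * return with the interrupt still active, so KVM can deactivate it
 * later on behalf of the guest.
 */
static void demo_priority_drop(void __iomem *cpu_base, u32 irqnr)
{
	writel_relaxed(irqnr, cpu_base + GIC_CPU_EOI);
}

static void demo_deactivate(void __iomem *cpu_base, u32 irqnr)
{
	writel_relaxed(irqnr, cpu_base + GIC_CPU_DEACTIVATE);
}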



* [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

We are about to add an additional soft timer to the arch timer state for
a VCPU and would like to be able to reuse the functions that program and
cancel a timer, so we make them slightly more generic and rename them to
make it clearer that these functions work on soft timers and not on the
hardware resource that this code is managing.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 8e89d63..871d8ae 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -56,26 +56,22 @@ u64 kvm_phys_timer_read(void)
 	return timecounter->cc->read(timecounter->cc);
 }
 
-static bool timer_is_armed(struct arch_timer_cpu *timer)
+static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
 {
 	return timer->armed;
 }
 
-/* timer_arm: as in "arm the timer", not as in ARM the company */
-static void timer_arm(struct arch_timer_cpu *timer, u64 ns)
+static void soft_timer_start(struct hrtimer *hrt, u64 ns)
 {
-	timer->armed = true;
-	hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns),
+	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
 		      HRTIMER_MODE_ABS);
 }
 
-static void timer_disarm(struct arch_timer_cpu *timer)
+static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
 {
-	if (timer_is_armed(timer)) {
-		hrtimer_cancel(&timer->timer);
-		cancel_work_sync(&timer->expired);
-		timer->armed = false;
-	}
+	hrtimer_cancel(hrt);
+	if (work)
+		cancel_work_sync(work);
 }
 
 static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
@@ -271,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
 		return;
 
 	/*  The timer has not yet expired, schedule a background timer */
-	timer_arm(timer, kvm_timer_compute_delta(timer_ctx));
+	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
 }
 
 /*
@@ -285,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	BUG_ON(timer_is_armed(timer));
+	BUG_ON(soft_timer_is_armed(timer));
 
 	/*
 	 * No need to schedule a background timer if any guest timer has
@@ -306,13 +302,16 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	 * The guest timers have not yet expired, schedule a background timer.
 	 * Set the earliest expiration time among the guest timers.
 	 */
-	timer_arm(timer, kvm_timer_earliest_exp(vcpu));
+	timer->armed = true;
+	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
 }
 
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	timer_disarm(timer);
+
+	soft_timer_cancel(&timer->timer, &timer->expired);
+	timer->armed = false;
 }
 
 static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
@@ -448,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 * This is to cancel the background timer for the physical timer
 	 * emulation if it is set.
 	 */
-	timer_disarm(timer);
+	soft_timer_cancel(&timer->timer, &timer->expired);
 
 	/*
 	 * The guest could have modified the timer registers or the timer
@@ -615,7 +614,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
-	timer_disarm(timer);
+	soft_timer_cancel(&timer->timer, &timer->expired);
 	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
 }
 
-- 
2.9.0
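
[Note: for context, a minimal sketch of the plain hrtimer calls that the new soft_timer_start()/soft_timer_cancel() helpers wrap; names prefixed demo_ are invented for the example.]

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer demo_timer;

static enum hrtimer_restart demo_timer_expire(struct hrtimer *hrt)
{
	/* wake someone up or queue work here */
	return HRTIMER_NORESTART;
}

static void demo_timer_setup(void)
{
	hrtimer_init(&demo_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	demo_timer.function = demo_timer_expire;
}

static void demo_timer_arm(u64 ns_from_now)
{
	/* same call the new soft_timer_start() helper wraps */
	hrtimer_start(&demo_timer, ktime_add_ns(ktime_get(), ns_from_now),
		      HRTIMER_MODE_ABS);
}

static void demo_timer_disarm(void)
{
	/* same call soft_timer_cancel() wraps; hrtimer_cancel() waits for
	 * a running callback to finish before returning */
	hrtimer_cancel(&demo_timer);
}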



* [PATCH v3 08/20] KVM: arm/arm64: Rename soft timer to bg_timer
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

As we are about to introduce a separate hrtimer for the physical timer,
rename the existing timer to bg_timer, since we already refer to it as
the background timer in the code and comments elsewhere.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |  3 +--
 virt/kvm/arm/arch_timer.c    | 22 +++++++++++-----------
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index f0053f8..dcbb2e1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -43,8 +43,7 @@ struct arch_timer_cpu {
 	struct arch_timer_context	ptimer;
 
 	/* Background timer used when the guest is not running */
-	struct hrtimer			timer;
-
+	struct hrtimer			bg_timer;
 	/* Work queued with the above timer expires */
 	struct work_struct		expired;
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 871d8ae..c2e8326 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -56,7 +56,7 @@ u64 kvm_phys_timer_read(void)
 	return timecounter->cc->read(timecounter->cc);
 }
 
-static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
+static bool bg_timer_is_armed(struct arch_timer_cpu *timer)
 {
 	return timer->armed;
 }
@@ -154,13 +154,13 @@ static u64 kvm_timer_earliest_exp(struct kvm_vcpu *vcpu)
 	return min(min_virt, min_phys);
 }
 
-static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
+static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
 {
 	struct arch_timer_cpu *timer;
 	struct kvm_vcpu *vcpu;
 	u64 ns;
 
-	timer = container_of(hrt, struct arch_timer_cpu, timer);
+	timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
 	vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
 
 	/*
@@ -267,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
 		return;
 
 	/*  The timer has not yet expired, schedule a background timer */
-	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
+	soft_timer_start(&timer->bg_timer, kvm_timer_compute_delta(timer_ctx));
 }
 
 /*
@@ -281,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	BUG_ON(soft_timer_is_armed(timer));
+	BUG_ON(bg_timer_is_armed(timer));
 
 	/*
 	 * No need to schedule a background timer if any guest timer has
@@ -303,14 +303,14 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	 * Set the earliest expiration time among the guest timers.
 	 */
 	timer->armed = true;
-	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
+	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
 }
 
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
-	soft_timer_cancel(&timer->timer, &timer->expired);
+	soft_timer_cancel(&timer->bg_timer, &timer->expired);
 	timer->armed = false;
 }
 
@@ -447,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 * This is to cancel the background timer for the physical timer
 	 * emulation if it is set.
 	 */
-	soft_timer_cancel(&timer->timer, &timer->expired);
+	soft_timer_cancel(&timer->bg_timer, &timer->expired);
 
 	/*
 	 * The guest could have modified the timer registers or the timer
@@ -504,8 +504,8 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	vcpu_ptimer(vcpu)->cntvoff = 0;
 
 	INIT_WORK(&timer->expired, kvm_timer_inject_irq_work);
-	hrtimer_init(&timer->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
-	timer->timer.function = kvm_timer_expire;
+	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	timer->bg_timer.function = kvm_bg_timer_expire;
 
 	vtimer->irq.irq = default_vtimer_irq.irq;
 	ptimer->irq.irq = default_ptimer_irq.irq;
@@ -614,7 +614,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
-	soft_timer_cancel(&timer->timer, &timer->expired);
+	soft_timer_cancel(&timer->bg_timer, &timer->expired);
 	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
 }
 
-- 
2.9.0
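
[Note: an illustrative sketch of the container_of() pattern that kvm_bg_timer_expire() relies on to get from the embedded bg_timer back to the enclosing per-VCPU state; the demo_ types are invented for the example.]

#include <linux/kernel.h>
#include <linux/hrtimer.h>

struct demo_vcpu_timer {
	struct hrtimer bg_timer;	/* embedded, like arch_timer_cpu's bg_timer */
	int vcpu_id;
};

static enum hrtimer_restart demo_bg_timer_expire(struct hrtimer *hrt)
{
	/*
	 * Recover the enclosing structure from the embedded hrtimer, the
	 * same way kvm_bg_timer_expire() does after the rename.
	 */
	struct demo_vcpu_timer *t =
		container_of(hrt, struct demo_vcpu_timer, bg_timer);

	pr_debug("background timer fired for vcpu %d\n", t->vcpu_id);
	return HRTIMER_NORESTART;
}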



* [PATCH v3 09/20] KVM: arm/arm64: Use separate timer for phys timer emulation
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

We were using the same hrtimer for emulating the physical timer and for
making sure a blocking VCPU thread would be eventually woken up.  That
worked fine in the previous arch timer design, but as we are about to
actually use the soft timer expire function for the physical timer
emulation, change the logic to use a dedicated hrtimer.

This has the added benefit of not having to cancel any work in the sync
path, which in turn allows us to run the flush and sync with IRQs
disabled.
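
For reference, the core hrtimer pattern the dedicated timer relies on looks
roughly like the sketch below; the names are illustrative only and not part of
this patch:

  #include <linux/hrtimer.h>
  #include <linux/ktime.h>

  static struct hrtimer example_timer;		/* illustrative only */

  static enum hrtimer_restart example_expire(struct hrtimer *hrt)
  {
  	/* One-shot timer: handle the expiry and do not re-arm. */
  	return HRTIMER_NORESTART;
  }

  static void example_arm_timer(u64 ns_from_now)
  {
  	hrtimer_init(&example_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
  	example_timer.function = example_expire;
  	hrtimer_start(&example_timer, ns_to_ktime(ns_from_now),
  		      HRTIMER_MODE_REL);
  }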

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |  3 +++
 virt/kvm/arm/arch_timer.c    | 18 ++++++++++++++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index dcbb2e1..16887c0 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -47,6 +47,9 @@ struct arch_timer_cpu {
 	/* Work queued with the above timer expires */
 	struct work_struct		expired;
 
+	/* Physical timer emulation */
+	struct hrtimer			phys_timer;
+
 	/* Background timer active */
 	bool				armed;
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index c2e8326..7f87099 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -178,6 +178,12 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
+static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
+{
+	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
+	return HRTIMER_NORESTART;
+}
+
 bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 {
 	u64 cval, now;
@@ -255,7 +261,7 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 }
 
 /* Schedule the background timer for the emulated timer. */
-static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
+static void phys_timer_emulate(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer_ctx)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
@@ -267,7 +273,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
 		return;
 
 	/*  The timer has not yet expired, schedule a background timer */
-	soft_timer_start(&timer->bg_timer, kvm_timer_compute_delta(timer_ctx));
+	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
 }
 
 /*
@@ -424,7 +430,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 	kvm_timer_update_state(vcpu);
 
 	/* Set the background timer for the physical timer emulation. */
-	kvm_timer_emulate(vcpu, vcpu_ptimer(vcpu));
+	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
 
 	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		kvm_timer_flush_hwstate_user(vcpu);
@@ -447,7 +453,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 * This is to cancel the background timer for the physical timer
 	 * emulation if it is set.
 	 */
-	soft_timer_cancel(&timer->bg_timer, &timer->expired);
+	soft_timer_cancel(&timer->phys_timer, NULL);
 
 	/*
 	 * The guest could have modified the timer registers or the timer
@@ -507,6 +513,9 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	timer->bg_timer.function = kvm_bg_timer_expire;
 
+	hrtimer_init(&timer->phys_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	timer->phys_timer.function = kvm_phys_timer_expire;
+
 	vtimer->irq.irq = default_vtimer_irq.irq;
 	ptimer->irq.irq = default_ptimer_irq.irq;
 }
@@ -615,6 +624,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	soft_timer_cancel(&timer->bg_timer, &timer->expired);
+	soft_timer_cancel(&timer->phys_timer, NULL);
 	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
 }
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 10/20] KVM: arm/arm64: Move timer/vgic flush/sync under disabled irq
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

As we are about to play tricks with the timer to be more lazy in saving
and restoring state, we need to move the timer sync and flush functions
under a section with interrupts disabled.  Since we have to flush the
vgic state after the timer and PMU state, we do the whole flush/sync
sequence with interrupts disabled.

The only downside is a slightly longer delay before being able to
process hardware interrupts and run softirqs.
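
Condensed, the ordering this establishes in kvm_arch_vcpu_ioctl_run() is
roughly the following (a sketch of the structure only, not literal code):

  local_irq_disable();

  kvm_timer_flush_hwstate(vcpu);
  kvm_vgic_flush_hwstate(vcpu);

  /* ... enter and exit the guest ... */

  /* Sync PMU/timer before the vgic so the vgic samples updated lines. */
  kvm_pmu_sync_hwstate(vcpu);
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);

  local_irq_enable();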

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/arm.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index b9f68e4..27db222 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -654,11 +654,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		kvm_pmu_flush_hwstate(vcpu);
 
+		local_irq_disable();
+
 		kvm_timer_flush_hwstate(vcpu);
 		kvm_vgic_flush_hwstate(vcpu);
 
-		local_irq_disable();
-
 		/*
 		 * If we have a singal pending, or need to notify a userspace
 		 * irqchip about timer or PMU level changes, then we exit (and
@@ -683,10 +683,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
 		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
-			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
 			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
+			local_irq_enable();
 			preempt_enable();
 			continue;
 		}
@@ -710,6 +710,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_arm_clear_debug(vcpu);
 
 		/*
+		 * We must sync the PMU and timer state before the vgic state so
+		 * that the vgic can properly sample the updated state of the
+		 * interrupt line.
+		 */
+		kvm_pmu_sync_hwstate(vcpu);
+		kvm_timer_sync_hwstate(vcpu);
+
+		kvm_vgic_sync_hwstate(vcpu);
+
+		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
 		 * pending, as we haven't serviced it yet!
@@ -732,16 +742,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		guest_exit();
 		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
-		/*
-		 * We must sync the PMU and timer state before the vgic state so
-		 * that the vgic can properly sample the updated state of the
-		 * interrupt line.
-		 */
-		kvm_pmu_sync_hwstate(vcpu);
-		kvm_timer_sync_hwstate(vcpu);
-
-		kvm_vgic_sync_hwstate(vcpu);
-
 		preempt_enable();
 
 		ret = handle_exit(vcpu, run, ret);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 11/20] KVM: arm/arm64: Move timer save/restore out of the hyp code
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: kvm, Marc Zyngier, Will Deacon, Catalin Marinas, Christoffer Dall

As we are about to be lazy with saving and restoring the timer
registers, we prepare by moving all possible timer configuration logic
out of the hyp code.  All virtual timer registers can be programmed from
EL1, and since the arch timer is always a level-triggered interrupt, we
can safely do this with interrupts disabled in the host kernel on the
way to the guest without taking vtimer interrupts in the host kernel
(yet).

The downside is that the cntvoff register can only be programmed from
hyp mode, so we jump into hyp mode and back to program it.  This is also
safe, because the host kernel doesn't use the virtual timer in the KVM
code.  It may add a small performance penalty, but only
until following commits where we move this operation to vcpu load/put.
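
Since the hyp call interface passes 32-bit arguments, the 64-bit CNTVOFF
value is split on the host side and recombined in hyp; a minimal sketch of
that round trip (the helper names are illustrative):

  #include <linux/bitops.h>
  #include <linux/types.h>

  /* Host side: split the 64-bit offset into the two halves passed to hyp. */
  static void split_cntvoff(u64 cntvoff, u32 *low, u32 *high)
  {
  	*low  = cntvoff & GENMASK(31, 0);
  	*high = (cntvoff >> 32) & GENMASK(31, 0);
  }

  /* Hyp side: recombine the halves before writing CNTVOFF_EL2. */
  static u64 join_cntvoff(u32 low, u32 high)
  {
  	return (u64)high << 32 | low;
  }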

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm/include/asm/kvm_asm.h   |  2 ++
 arch/arm/include/asm/kvm_hyp.h   |  4 +--
 arch/arm/kvm/hyp/switch.c        |  7 ++--
 arch/arm64/include/asm/kvm_asm.h |  2 ++
 arch/arm64/include/asm/kvm_hyp.h |  4 +--
 arch/arm64/kvm/hyp/switch.c      |  6 ++--
 virt/kvm/arm/arch_timer.c        | 40 ++++++++++++++++++++++
 virt/kvm/arm/hyp/timer-sr.c      | 74 +++++++++++++++++-----------------------
 8 files changed, 87 insertions(+), 52 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 14d68a4..36dd296 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -68,6 +68,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
+extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
+
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
 extern void __init_stage2_translation(void);
diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 14b5903..ab20ffa 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -98,8 +98,8 @@
 #define cntvoff_el2			CNTVOFF
 #define cnthctl_el2			CNTHCTL
 
-void __timer_save_state(struct kvm_vcpu *vcpu);
-void __timer_restore_state(struct kvm_vcpu *vcpu);
+void __timer_enable_traps(struct kvm_vcpu *vcpu);
+void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
 void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ebd2dd4..330c9ce 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -174,7 +174,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__activate_vm(vcpu);
 
 	__vgic_restore_state(vcpu);
-	__timer_restore_state(vcpu);
+	__timer_enable_traps(vcpu);
 
 	__sysreg_restore_state(guest_ctxt);
 	__banked_restore_state(guest_ctxt);
@@ -191,7 +191,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__banked_save_state(guest_ctxt);
 	__sysreg_save_state(guest_ctxt);
-	__timer_save_state(vcpu);
+	__timer_disable_traps(vcpu);
+
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
@@ -237,7 +238,7 @@ void __hyp_text __noreturn __hyp_panic(int cause)
 
 		vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
 		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-		__timer_save_state(vcpu);
+		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
 		__banked_restore_state(host_ctxt);
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 26a64d0..ab4d0a9 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -55,6 +55,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
+extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
+
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
 extern u64 __vgic_v3_get_ich_vtr_el2(void);
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b..08d3bb6 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -129,8 +129,8 @@ void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
-void __timer_save_state(struct kvm_vcpu *vcpu);
-void __timer_restore_state(struct kvm_vcpu *vcpu);
+void __timer_enable_traps(struct kvm_vcpu *vcpu);
+void __timer_disable_traps(struct kvm_vcpu *vcpu);
 
 void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
 void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..4994f4b 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -298,7 +298,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	__activate_vm(vcpu);
 
 	__vgic_restore_state(vcpu);
-	__timer_restore_state(vcpu);
+	__timer_enable_traps(vcpu);
 
 	/*
 	 * We must restore the 32-bit state before the sysregs, thanks
@@ -368,7 +368,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_save_guest_state(guest_ctxt);
 	__sysreg32_save_state(vcpu);
-	__timer_save_state(vcpu);
+	__timer_disable_traps(vcpu);
 	__vgic_save_state(vcpu);
 
 	__deactivate_traps(vcpu);
@@ -436,7 +436,7 @@ void __hyp_text __noreturn __hyp_panic(void)
 
 		vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
 		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
-		__timer_save_state(vcpu);
+		__timer_disable_traps(vcpu);
 		__deactivate_traps(vcpu);
 		__deactivate_vm(vcpu);
 		__sysreg_restore_host_state(host_ctxt);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 7f87099..4254f88 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -276,6 +276,20 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
 	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
 }
 
+static void timer_save_state(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (timer->enabled) {
+		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
+	}
+
+	/* Disable the virtual timer */
+	write_sysreg_el0(0, cntv_ctl);
+}
+
 /*
  * Schedule the background timer before calling kvm_vcpu_block, so that this
  * thread is removed from its waitqueue and made runnable when there's a timer
@@ -312,6 +326,18 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
 }
 
+static void timer_restore_state(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (timer->enabled) {
+		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
+		isb();
+		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
+	}
+}
+
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
@@ -320,6 +346,13 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 	timer->armed = false;
 }
 
+static void set_cntvoff(u64 cntvoff)
+{
+	u32 low = cntvoff & GENMASK(31, 0);
+	u32 high = (cntvoff >> 32) & GENMASK(31, 0);
+	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
+}
+
 static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
@@ -423,6 +456,7 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	if (unlikely(!timer->enabled))
 		return;
@@ -436,6 +470,9 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 		kvm_timer_flush_hwstate_user(vcpu);
 	else
 		kvm_timer_flush_hwstate_vgic(vcpu);
+
+	set_cntvoff(vtimer->cntvoff);
+	timer_restore_state(vcpu);
 }
 
 /**
@@ -455,6 +492,9 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 */
 	soft_timer_cancel(&timer->phys_timer, NULL);
 
+	timer_save_state(vcpu);
+	set_cntvoff(0);
+
 	/*
 	 * The guest could have modified the timer registers or the timer
 	 * could have expired, update the timer state.
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index 4734915..a6c3b10 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -21,58 +21,48 @@
 
 #include <asm/kvm_hyp.h>
 
-/* vcpu is already in the HYP VA space */
-void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
+void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
+{
+	u64 cntvoff = (u64)cntvoff_high << 32 | cntvoff_low;
+	write_sysreg(cntvoff, cntvoff_el2);
+}
+
+void __hyp_text enable_phys_timer(void)
 {
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	u64 val;
 
-	if (timer->enabled) {
-		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
-		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
-	}
+	/* Allow physical timer/counter access for the host */
+	val = read_sysreg(cnthctl_el2);
+	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
+	write_sysreg(val, cnthctl_el2);
+}
 
-	/* Disable the virtual timer */
-	write_sysreg_el0(0, cntv_ctl);
+void __hyp_text disable_phys_timer(void)
+{
+	u64 val;
 
 	/*
+	 * Disallow physical timer access for the guest
+	 * Physical counter access is allowed
+	 */
+	val = read_sysreg(cnthctl_el2);
+	val &= ~CNTHCTL_EL1PCEN;
+	val |= CNTHCTL_EL1PCTEN;
+	write_sysreg(val, cnthctl_el2);
+}
+
+void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
+{
+	/*
 	 * We don't need to do this for VHE since the host kernel runs in EL2
 	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
 	 */
-	if (!has_vhe()) {
-		/* Allow physical timer/counter access for the host */
-		val = read_sysreg(cnthctl_el2);
-		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
-		write_sysreg(val, cnthctl_el2);
-	}
-
-	/* Clear cntvoff for the host */
-	write_sysreg(0, cntvoff_el2);
+	if (!has_vhe())
+		enable_phys_timer();
 }
 
-void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
+void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-	u64 val;
-
-	/* Those bits are already configured at boot on VHE-system */
-	if (!has_vhe()) {
-		/*
-		 * Disallow physical timer access for the guest
-		 * Physical counter access is allowed
-		 */
-		val = read_sysreg(cnthctl_el2);
-		val &= ~CNTHCTL_EL1PCEN;
-		val |= CNTHCTL_EL1PCTEN;
-		write_sysreg(val, cnthctl_el2);
-	}
-
-	if (timer->enabled) {
-		write_sysreg(vtimer->cntvoff, cntvoff_el2);
-		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
-		isb();
-		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
-	}
+	if (!has_vhe())
+		disable_phys_timer();
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 12/20] genirq: Document vcpu_info usage for percpu_devid interrupts
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:41   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:41 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Christoffer Dall, kvm, Marc Zyngier, Catalin Marinas,
	Will Deacon, Thomas Gleixner

It is currently unclear how to set the VCPU affinity for a percpu_devid
interrupt, since the Linux irq_data structure describes the state for
multiple interrupts, one for each physical CPU on the system.  Since
each such interrupt can be associated with different VCPUs or none at
all, associating a single VCPU state with such an interrupt does not
capture the necessary semantics.

The implementers of irq_set_vcpu_affinity are the Intel and AMD IOMMUs, and
the ARM GIC irqchip.  The Intel and AMD callers do not appear to use
percpu_devid interrupts, and the ARM GIC implementation only checks the
pointer against NULL vs. non-NULL.

Therefore, simply update the function documentation to explain the
expected use in the context of percpu_devid interrupts, allowing future
changes or additions to irqchip implementers to do the right thing.

This allows us to set the VCPU affinity for the virtual timer interrupt
in KVM/ARM, which is a percpu_devid (PPI) interrupt.
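
As a minimal illustration of the expected calling convention for a
percpu_devid interrupt (the names below are illustrative and not part of this
patch), the caller passes the percpu pointer itself:

  #include <linux/interrupt.h>
  #include <linux/irq.h>
  #include <linux/percpu.h>

  /* One vcpu_info slot per physical CPU for a percpu_devid (PPI) interrupt. */
  static DEFINE_PER_CPU(void *, example_vcpu_info);

  static int example_set_ppi_vcpu_affinity(unsigned int ppi_irq)
  {
  	/* Pass the percpu pointer; the irqchip picks the per-CPU entry. */
  	return irq_set_vcpu_affinity(ppi_irq, &example_vcpu_info);
  }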

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 kernel/irq/manage.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 573dc52..2b2c94f 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -381,7 +381,8 @@ int irq_select_affinity_usr(unsigned int irq)
 /**
  *	irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
  *	@irq: interrupt number to set affinity
- *	@vcpu_info: vCPU specific data
+ *	@vcpu_info: vCPU specific data or pointer to a percpu array of vCPU
+ *	            specific data for percpu_devid interrupts
  *
  *	This function uses the vCPU specific data to set the vCPU
  *	affinity for an irq. The vCPU specific data is passed from
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 13/20] KVM: arm/arm64: Set VCPU affinity for virt timer irq
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

As we are about to take physical interrupts for the virtual timer on the
host but want to leave those active while running the VM (and let the VM
deactivate them), we need to set the vtimer PPI affinity accordingly.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 4254f88..4275f8f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -650,11 +650,20 @@ int kvm_timer_hyp_init(void)
 		return err;
 	}
 
+	err = irq_set_vcpu_affinity(host_vtimer_irq, kvm_get_running_vcpus());
+	if (err) {
+		kvm_err("kvm_arch_timer: error setting vcpu affinity\n");
+		goto out_free_irq;
+	}
+
 	kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
 
 	cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
 			  "kvm/arm/timer:starting", kvm_timer_starting_cpu,
 			  kvm_timer_dying_cpu);
+	return 0;
+out_free_irq:
+	free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
 	return err;
 }
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

We don't need to save and restore the hardware timer state and examine
if it generates interrupts on every entry/exit to the guest.  The
timer hardware is perfectly capable of telling us when it has expired
by signaling interrupts.

When taking a vtimer interrupt in the host, we don't want to mess with
the timer configuration, we just want to forward the physical interrupt
to the guest as a virtual interrupt.  We can use the split priority drop
and deactivate feature of the GIC to do this, which leaves an EOI'ed
interrupt active on the physical distributor, making sure we don't keep
taking timer interrupts which would prevent the guest from running.  We
can then forward the physical interrupt to the VM using the HW bit in
the LR of the GIC VE, like we do already, which lets the guest directly
deactivate both the physical and virtual timer simultaneously, allowing
the timer hardware to exit the VM and generate a new physical interrupt
when the timer output is again asserted later on.

We do need to capture this state when migrating VCPUs between physical
CPUs, however, and for that we use the vcpu put/load functions, which are
called through preempt notifiers whenever the thread is scheduled away
from the CPU, or called directly when we return from the ioctl to
userspace.

One caveat is that we cannot restore the timer state during
kvm_timer_vcpu_load, because the flow of sleeping a VCPU is:

  1. kvm_vcpu_block
  2. kvm_timer_schedule
  3. schedule
  4. kvm_timer_vcpu_put (preempt notifier)
  5. schedule (vcpu thread gets scheduled back)
  6. kvm_timer_vcpu_load
        <---- We restore the hardware state here, but the bg_timer
	      hrtimer may have scheduled a work function that also
	      changes the timer state here.
  7. kvm_timer_unschedule
        <---- We can restore the state here instead

So, while we do need to restore the timer state in step (6) in all other
cases than when we called kvm_vcpu_block(), we have to defer the restore
to step (7) when coming back after kvm_vcpu_block().  Note that we
cannot simply call cancel_work_sync() in step (6), because vcpu_load can
be called from a preempt notifier.
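
Condensed, the load-time decision described above looks roughly like this
(a sketch only; the full kvm_timer_vcpu_load() is in the diff below):

  static void timer_vcpu_load_sketch(struct kvm_vcpu *vcpu)
  {
  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

  	/* ... program the irq/vgic state and CNTVOFF first ... */

  	/*
  	 * If a background timer is armed we are on the kvm_vcpu_block()
  	 * path; defer the restore to kvm_timer_unschedule() (step 7).
  	 */
  	if (!bg_timer_is_armed(timer))
  		vtimer_restore_state(vcpu);
  }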

An added benefit beyond not having to read and write the timer sysregs
on every entry and exit is that we no longer have to actively write the
active state to the physical distributor, because we set the affinity of
the vtimer interrupt when loading the timer state, so that the interrupt
automatically stays active after firing.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |   9 +-
 virt/kvm/arm/arch_timer.c    | 238 +++++++++++++++++++++++++++----------------
 virt/kvm/arm/arm.c           |  19 +++-
 virt/kvm/arm/hyp/timer-sr.c  |   8 +-
 4 files changed, 174 insertions(+), 100 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 16887c0..8e5ed54 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -31,8 +31,8 @@ struct arch_timer_context {
 	/* Timer IRQ */
 	struct kvm_irq_level		irq;
 
-	/* Active IRQ state caching */
-	bool				active_cleared_last;
+	/* Is the timer state loaded on the hardware timer */
+	bool			loaded;
 
 	/* Virtual offset */
 	u64			cntvoff;
@@ -80,10 +80,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 u64 kvm_phys_timer_read(void);
 
+void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.vtimer)
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.ptimer)
+
+void enable_el1_phys_timer_access(void);
+void disable_el1_phys_timer_access(void);
+
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 4275f8f..70110ea 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -46,10 +46,9 @@ static const struct kvm_irq_level default_vtimer_irq = {
 	.level	= 1,
 };
 
-void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
-{
-	vcpu_vtimer(vcpu)->active_cleared_last = false;
-}
+static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
+				 struct arch_timer_context *timer_ctx);
 
 u64 kvm_phys_timer_read(void)
 {
@@ -74,17 +73,37 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
 		cancel_work_sync(work);
 }
 
-static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
+static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	/*
-	 * We disable the timer in the world switch and let it be
-	 * handled by kvm_timer_sync_hwstate(). Getting a timer
-	 * interrupt at this point is a sure sign of some major
-	 * breakage.
+	 * To prevent continuously exiting from the guest, we mask the
+	 * physical interrupt when the virtual level is high, such that the
+	 * guest can make forward progress.  Once we detect the output level
+	 * being deasserted, we unmask the interrupt again so that we exit
+	 * from the guest when the timer fires.
 	 */
-	pr_warn("Unexpected interrupt %d on vcpu %p\n", irq, vcpu);
+	if (vtimer->irq.level)
+		disable_percpu_irq(host_vtimer_irq);
+	else
+		enable_percpu_irq(host_vtimer_irq, 0);
+}
+
+static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
+{
+	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (!vtimer->irq.level) {
+		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+		if (kvm_timer_irq_can_fire(vtimer))
+			kvm_timer_update_irq(vcpu, true, vtimer);
+	}
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+		kvm_vtimer_update_mask_user(vcpu);
+
 	return IRQ_HANDLED;
 }
 
@@ -220,7 +239,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 {
 	int ret;
 
-	timer_ctx->active_cleared_last = false;
 	timer_ctx->irq.level = new_level;
 	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
 				   timer_ctx->irq.level);
@@ -276,10 +294,16 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
 	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
 }
 
-static void timer_save_state(struct kvm_vcpu *vcpu)
+static void vtimer_save_state(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (!vtimer->loaded)
+		goto out;
 
 	if (timer->enabled) {
 		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
@@ -288,6 +312,10 @@ static void timer_save_state(struct kvm_vcpu *vcpu)
 
 	/* Disable the virtual timer */
 	write_sysreg_el0(0, cntv_ctl);
+
+	vtimer->loaded = false;
+out:
+	local_irq_restore(flags);
 }
 
 /*
@@ -303,6 +331,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 
 	BUG_ON(bg_timer_is_armed(timer));
 
+	vtimer_save_state(vcpu);
+
 	/*
 	 * No need to schedule a background timer if any guest timer has
 	 * already expired, because kvm_vcpu_block will return before putting
@@ -326,16 +356,26 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
 }
 
-static void timer_restore_state(struct kvm_vcpu *vcpu)
+static void vtimer_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (vtimer->loaded)
+		goto out;
 
 	if (timer->enabled) {
 		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
 		isb();
 		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
 	}
+
+	vtimer->loaded = true;
+out:
+	local_irq_restore(flags);
 }
 
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
@@ -344,6 +384,8 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 
 	soft_timer_cancel(&timer->bg_timer, &timer->expired);
 	timer->armed = false;
+
+	vtimer_restore_state(vcpu);
 }
 
 static void set_cntvoff(u64 cntvoff)
@@ -353,61 +395,56 @@ static void set_cntvoff(u64 cntvoff)
 	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
 }
 
-static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
+static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	bool phys_active;
 	int ret;
 
-	/*
-	* If we enter the guest with the virtual input level to the VGIC
-	* asserted, then we have already told the VGIC what we need to, and
-	* we don't need to exit from the guest until the guest deactivates
-	* the already injected interrupt, so therefore we should set the
-	* hardware active state to prevent unnecessary exits from the guest.
-	*
-	* Also, if we enter the guest with the virtual timer interrupt active,
-	* then it must be active on the physical distributor, because we set
-	* the HW bit and the guest must be able to deactivate the virtual and
-	* physical interrupt at the same time.
-	*
-	* Conversely, if the virtual input level is deasserted and the virtual
-	* interrupt is not active, then always clear the hardware active state
-	* to ensure that hardware interrupts from the timer triggers a guest
-	* exit.
-	*/
-	phys_active = vtimer->irq.level ||
-			kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
-
-	/*
-	 * We want to avoid hitting the (re)distributor as much as
-	 * possible, as this is a potentially expensive MMIO access
-	 * (not to mention locks in the irq layer), and a solution for
-	 * this is to cache the "active" state in memory.
-	 *
-	 * Things to consider: we cannot cache an "active set" state,
-	 * because the HW can change this behind our back (it becomes
-	 * "clear" in the HW). We must then restrict the caching to
-	 * the "clear" state.
-	 *
-	 * The cache is invalidated on:
-	 * - vcpu put, indicating that the HW cannot be trusted to be
-	 *   in a sane state on the next vcpu load,
-	 * - any change in the interrupt state
-	 *
-	 * Usage conditions:
-	 * - cached value is "active clear"
-	 * - value to be programmed is "active clear"
-	 */
-	if (vtimer->active_cleared_last && !phys_active)
-		return;
-
+	if (vtimer->irq.level || kvm_vgic_map_is_active(vcpu, vtimer->irq.irq))
+		phys_active = true;
+	else
+		phys_active = false;
 	ret = irq_set_irqchip_state(host_vtimer_irq,
 				    IRQCHIP_STATE_ACTIVE,
 				    phys_active);
 	WARN_ON(ret);
+}
+
+static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
+{
+	kvm_vtimer_update_mask_user(vcpu);
+}
+
+void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (unlikely(!timer->enabled))
+		return;
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+		kvm_timer_vcpu_load_user(vcpu);
+	else
+		kvm_timer_vcpu_load_vgic(vcpu);
 
-	vtimer->active_cleared_last = !phys_active;
+	set_cntvoff(vtimer->cntvoff);
+
+	/*
+	 * If we armed a soft timer and potentially queued work, we have to
+	 * cancel this, but cannot do it here, because canceling work can
+	 * sleep and we can be in the middle of a preempt notifier call.
+	 * Instead, when the timer has been armed, we know the return path
+	 * from kvm_vcpu_block will call kvm_timer_unschedule, so we can defer
+	 * restoring the state and canceling any soft timers and work items
+	 * until then.
+	 */
+	if (!bg_timer_is_armed(timer))
+		vtimer_restore_state(vcpu);
+
+	if (has_vhe())
+		disable_el1_phys_timer_access();
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -427,23 +464,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
 	       ptimer->irq.level != plevel;
 }
 
-static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
-{
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-
-	/*
-	 * To prevent continuously exiting from the guest, we mask the
-	 * physical interrupt such that the guest can make forward progress.
-	 * Once we detect the output level being deasserted, we unmask the
-	 * interrupt again so that we exit from the guest when the timer
-	 * fires.
-	*/
-	if (vtimer->irq.level)
-		disable_percpu_irq(host_vtimer_irq);
-	else
-		enable_percpu_irq(host_vtimer_irq, 0);
-}
-
 /**
  * kvm_timer_flush_hwstate - prepare timers before running the vcpu
  * @vcpu: The vcpu pointer
@@ -456,23 +476,55 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
 	if (unlikely(!timer->enabled))
 		return;
 
-	kvm_timer_update_state(vcpu);
+	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
+		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
 
 	/* Set the background timer for the physical timer emulation. */
 	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
+}
 
-	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		kvm_timer_flush_hwstate_user(vcpu);
-	else
-		kvm_timer_flush_hwstate_vgic(vcpu);
+void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
-	set_cntvoff(vtimer->cntvoff);
-	timer_restore_state(vcpu);
+	if (unlikely(!timer->enabled))
+		return;
+
+	if (has_vhe())
+		enable_el1_phys_timer_access();
+
+	vtimer_save_state(vcpu);
+
+	set_cntvoff(0);
+}
+
+static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
+		kvm_vtimer_update_mask_user(vcpu);
+		return;
+	}
+
+	/*
+	 * If the guest disabled the timer without acking the interrupt, then
+	 * we must make sure the physical and virtual active states are in
+	 * sync by deactivating the physical interrupt, because otherwise we
+	 * wouldn't see the next timer interrupt in the host.
+	 */
+	if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
+		int ret;
+		ret = irq_set_irqchip_state(host_vtimer_irq,
+					    IRQCHIP_STATE_ACTIVE,
+					    false);
+		WARN_ON(ret);
+	}
 }
 
 /**
@@ -485,6 +537,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	/*
 	 * This is to cancel the background timer for the physical timer
@@ -492,14 +545,19 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 */
 	soft_timer_cancel(&timer->phys_timer, NULL);
 
-	timer_save_state(vcpu);
-	set_cntvoff(0);
-
 	/*
-	 * The guest could have modified the timer registers or the timer
-	 * could have expired, update the timer state.
+	 * If we entered the guest with the vtimer output asserted we have to
+	 * check if the guest has modified the timer so that we should lower
+	 * the line at this point.
 	 */
-	kvm_timer_update_state(vcpu);
+	if (vtimer->irq.level) {
+		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
+		if (!kvm_timer_should_fire(vtimer)) {
+			kvm_timer_update_irq(vcpu, false, vtimer);
+			unmask_vtimer_irq(vcpu);
+		}
+	}
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 27db222..132d39a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -354,18 +354,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
 	kvm_arm_set_running_vcpu(vcpu);
-
 	kvm_vgic_load(vcpu);
+	kvm_timer_vcpu_load(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 
 	vcpu->cpu = -1;
 
 	kvm_arm_set_running_vcpu(NULL);
-	kvm_timer_vcpu_put(vcpu);
 }
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
@@ -710,16 +710,27 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_arm_clear_debug(vcpu);
 
 		/*
-		 * We must sync the PMU and timer state before the vgic state so
+		 * We must sync the PMU state before the vgic state so
 		 * that the vgic can properly sample the updated state of the
 		 * interrupt line.
 		 */
 		kvm_pmu_sync_hwstate(vcpu);
-		kvm_timer_sync_hwstate(vcpu);
 
+		/*
+		 * Sync the vgic state before syncing the timer state because
+		 * the timer code needs to know if the virtual timer
+		 * interrupts are active.
+		 */
 		kvm_vgic_sync_hwstate(vcpu);
 
 		/*
+		 * Sync the timer hardware state before enabling interrupts as
+		 * we don't want vtimer interrupts to race with syncing the
+		 * timer virtual interrupt state.
+		 */
+		kvm_timer_sync_hwstate(vcpu);
+
+		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
 		 * pending, as we haven't serviced it yet!
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index a6c3b10..f398616 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -27,7 +27,7 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
 	write_sysreg(cntvoff, cntvoff_el2);
 }
 
-void __hyp_text enable_phys_timer(void)
+void __hyp_text enable_el1_phys_timer_access(void)
 {
 	u64 val;
 
@@ -37,7 +37,7 @@ void __hyp_text enable_phys_timer(void)
 	write_sysreg(val, cnthctl_el2);
 }
 
-void __hyp_text disable_phys_timer(void)
+void __hyp_text disable_el1_phys_timer_access(void)
 {
 	u64 val;
 
@@ -58,11 +58,11 @@ void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
 	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
 	 */
 	if (!has_vhe())
-		enable_phys_timer();
+		enable_el1_phys_timer_access();
 }
 
 void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
 	if (!has_vhe())
-		disable_phys_timer();
+		disable_el1_phys_timer_access();
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
@ 2017-09-23  0:42   ` Christoffer Dall
  0 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: linux-arm-kernel

We don't need to save and restore the hardware timer state and examine
if it generates interrupts on every entry/exit to the guest.  The
timer hardware is perfectly capable of telling us when it has expired
by signaling interrupts.

When taking a vtimer interrupt in the host, we don't want to mess with
the timer configuration; we just want to forward the physical interrupt
to the guest as a virtual interrupt.  We can use the split priority drop
and deactivate feature of the GIC to do this, which leaves an EOI'ed
interrupt active on the physical distributor, making sure we don't keep
taking timer interrupts which would prevent the guest from running.  We
can then forward the physical interrupt to the VM using the HW bit in
the LR of the GIC virtualization extensions, like we do already, which
lets the guest directly deactivate both the physical and virtual timer
simultaneously, allowing the timer hardware to generate a new physical
interrupt, and hence exit the VM, when the timer output is asserted
again later on.

We do need to capture this state when migrating VCPUs between physical
CPUs, however, and for that we use the vcpu put/load functions, which
are called through preempt notifiers whenever the thread is scheduled
away from the CPU, or directly when we return from the ioctl to
userspace.

One caveat is that we cannot always restore the timer state during
kvm_timer_vcpu_load, because the flow of putting a VCPU to sleep is:

  1. kvm_vcpu_block
  2. kvm_timer_schedule
  3. schedule
  4. kvm_timer_vcpu_put (preempt notifier)
  5. schedule (vcpu thread gets scheduled back)
  6. kvm_timer_vcpu_load
        <---- We restore the hardware state here, but the bg_timer
	      hrtimer may have scheduled a work function that also
	      changes the timer state here.
  7. kvm_timer_unschedule
        <---- We can restore the state here instead

So, while we do need to restore the timer state in step (6) in all
cases other than coming back from kvm_vcpu_block(), we have to defer
the restore to step (7) when we do come back from kvm_vcpu_block().
Note that we cannot simply call cancel_work_sync() in step (6), because
vcpu_load can be called from a preempt notifier.
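
In terms of the functions touched by this patch, the steps above map
roughly as follows (a simplified summary, not the exact hunks below):

	/*
	 * 1-3. kvm_vcpu_block() -> kvm_timer_schedule():
	 *        vtimer_save_state(), bg_timer armed
	 * 4.   kvm_timer_vcpu_put()  (preempt notifier)
	 * 5-6. kvm_timer_vcpu_load() (preempt notifier):
	 *        bg_timer still armed, so the restore is deferred
	 * 7.   kvm_timer_unschedule():
	 *        soft_timer_cancel(), then vtimer_restore_state()
	 */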

An added benefit beyond not having to read and write the timer sysregs
on every entry and exit is that we no longer have to write the active
state to the physical distributor on every guest entry, because we now
set the active state of the vtimer interrupt when loading the timer
state, so that the interrupt automatically stays active after firing.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |   9 +-
 virt/kvm/arm/arch_timer.c    | 238 +++++++++++++++++++++++++++----------------
 virt/kvm/arm/arm.c           |  19 +++-
 virt/kvm/arm/hyp/timer-sr.c  |   8 +-
 4 files changed, 174 insertions(+), 100 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 16887c0..8e5ed54 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -31,8 +31,8 @@ struct arch_timer_context {
 	/* Timer IRQ */
 	struct kvm_irq_level		irq;
 
-	/* Active IRQ state caching */
-	bool				active_cleared_last;
+	/* Is the timer state loaded on the hardware timer */
+	bool			loaded;
 
 	/* Virtual offset */
 	u64			cntvoff;
@@ -80,10 +80,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 u64 kvm_phys_timer_read(void);
 
+void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.vtimer)
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.ptimer)
+
+void enable_el1_phys_timer_access(void);
+void disable_el1_phys_timer_access(void);
+
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 4275f8f..70110ea 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -46,10 +46,9 @@ static const struct kvm_irq_level default_vtimer_irq = {
 	.level	= 1,
 };
 
-void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
-{
-	vcpu_vtimer(vcpu)->active_cleared_last = false;
-}
+static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
+				 struct arch_timer_context *timer_ctx);
 
 u64 kvm_phys_timer_read(void)
 {
@@ -74,17 +73,37 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
 		cancel_work_sync(work);
 }
 
-static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
+static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	/*
-	 * We disable the timer in the world switch and let it be
-	 * handled by kvm_timer_sync_hwstate(). Getting a timer
-	 * interrupt at this point is a sure sign of some major
-	 * breakage.
+	 * To prevent continuously exiting from the guest, we mask the
+	 * physical interrupt when the virtual level is high, such that the
+	 * guest can make forward progress.  Once we detect the output level
+	 * being deasserted, we unmask the interrupt again so that we exit
+	 * from the guest when the timer fires.
 	 */
-	pr_warn("Unexpected interrupt %d on vcpu %p\n", irq, vcpu);
+	if (vtimer->irq.level)
+		disable_percpu_irq(host_vtimer_irq);
+	else
+		enable_percpu_irq(host_vtimer_irq, 0);
+}
+
+static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
+{
+	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (!vtimer->irq.level) {
+		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+		if (kvm_timer_irq_can_fire(vtimer))
+			kvm_timer_update_irq(vcpu, true, vtimer);
+	}
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+		kvm_vtimer_update_mask_user(vcpu);
+
 	return IRQ_HANDLED;
 }
 
@@ -220,7 +239,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 {
 	int ret;
 
-	timer_ctx->active_cleared_last = false;
 	timer_ctx->irq.level = new_level;
 	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
 				   timer_ctx->irq.level);
@@ -276,10 +294,16 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
 	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
 }
 
-static void timer_save_state(struct kvm_vcpu *vcpu)
+static void vtimer_save_state(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (!vtimer->loaded)
+		goto out;
 
 	if (timer->enabled) {
 		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
@@ -288,6 +312,10 @@ static void timer_save_state(struct kvm_vcpu *vcpu)
 
 	/* Disable the virtual timer */
 	write_sysreg_el0(0, cntv_ctl);
+
+	vtimer->loaded = false;
+out:
+	local_irq_restore(flags);
 }
 
 /*
@@ -303,6 +331,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 
 	BUG_ON(bg_timer_is_armed(timer));
 
+	vtimer_save_state(vcpu);
+
 	/*
 	 * No need to schedule a background timer if any guest timer has
 	 * already expired, because kvm_vcpu_block will return before putting
@@ -326,16 +356,26 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
 }
 
-static void timer_restore_state(struct kvm_vcpu *vcpu)
+static void vtimer_restore_state(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	unsigned long flags;
+
+	local_irq_save(flags);
+
+	if (vtimer->loaded)
+		goto out;
 
 	if (timer->enabled) {
 		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
 		isb();
 		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
 	}
+
+	vtimer->loaded = true;
+out:
+	local_irq_restore(flags);
 }
 
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
@@ -344,6 +384,8 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 
 	soft_timer_cancel(&timer->bg_timer, &timer->expired);
 	timer->armed = false;
+
+	vtimer_restore_state(vcpu);
 }
 
 static void set_cntvoff(u64 cntvoff)
@@ -353,61 +395,56 @@ static void set_cntvoff(u64 cntvoff)
 	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
 }
 
-static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
+static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	bool phys_active;
 	int ret;
 
-	/*
-	* If we enter the guest with the virtual input level to the VGIC
-	* asserted, then we have already told the VGIC what we need to, and
-	* we don't need to exit from the guest until the guest deactivates
-	* the already injected interrupt, so therefore we should set the
-	* hardware active state to prevent unnecessary exits from the guest.
-	*
-	* Also, if we enter the guest with the virtual timer interrupt active,
-	* then it must be active on the physical distributor, because we set
-	* the HW bit and the guest must be able to deactivate the virtual and
-	* physical interrupt at the same time.
-	*
-	* Conversely, if the virtual input level is deasserted and the virtual
-	* interrupt is not active, then always clear the hardware active state
-	* to ensure that hardware interrupts from the timer triggers a guest
-	* exit.
-	*/
-	phys_active = vtimer->irq.level ||
-			kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
-
-	/*
-	 * We want to avoid hitting the (re)distributor as much as
-	 * possible, as this is a potentially expensive MMIO access
-	 * (not to mention locks in the irq layer), and a solution for
-	 * this is to cache the "active" state in memory.
-	 *
-	 * Things to consider: we cannot cache an "active set" state,
-	 * because the HW can change this behind our back (it becomes
-	 * "clear" in the HW). We must then restrict the caching to
-	 * the "clear" state.
-	 *
-	 * The cache is invalidated on:
-	 * - vcpu put, indicating that the HW cannot be trusted to be
-	 *   in a sane state on the next vcpu load,
-	 * - any change in the interrupt state
-	 *
-	 * Usage conditions:
-	 * - cached value is "active clear"
-	 * - value to be programmed is "active clear"
-	 */
-	if (vtimer->active_cleared_last && !phys_active)
-		return;
-
+	if (vtimer->irq.level || kvm_vgic_map_is_active(vcpu, vtimer->irq.irq))
+		phys_active = true;
+	else
+		phys_active = false;
 	ret = irq_set_irqchip_state(host_vtimer_irq,
 				    IRQCHIP_STATE_ACTIVE,
 				    phys_active);
 	WARN_ON(ret);
+}
+
+static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
+{
+	kvm_vtimer_update_mask_user(vcpu);
+}
+
+void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (unlikely(!timer->enabled))
+		return;
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+		kvm_timer_vcpu_load_user(vcpu);
+	else
+		kvm_timer_vcpu_load_vgic(vcpu);
 
-	vtimer->active_cleared_last = !phys_active;
+	set_cntvoff(vtimer->cntvoff);
+
+	/*
+	 * If we armed a soft timer and potentially queued work, we have to
+	 * cancel this, but cannot do it here, because canceling work can
+	 * sleep and we can be in the middle of a preempt notifier call.
+	 * Instead, when the timer has been armed, we know the return path
+	 * from kvm_vcpu_block will call kvm_timer_unschedule, so we can defer
+	 * restoring the state and canceling any soft timers and work items
+	 * until then.
+	 */
+	if (!bg_timer_is_armed(timer))
+		vtimer_restore_state(vcpu);
+
+	if (has_vhe())
+		disable_el1_phys_timer_access();
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -427,23 +464,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
 	       ptimer->irq.level != plevel;
 }
 
-static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
-{
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-
-	/*
-	 * To prevent continuously exiting from the guest, we mask the
-	 * physical interrupt such that the guest can make forward progress.
-	 * Once we detect the output level being deasserted, we unmask the
-	 * interrupt again so that we exit from the guest when the timer
-	 * fires.
-	*/
-	if (vtimer->irq.level)
-		disable_percpu_irq(host_vtimer_irq);
-	else
-		enable_percpu_irq(host_vtimer_irq, 0);
-}
-
 /**
  * kvm_timer_flush_hwstate - prepare timers before running the vcpu
  * @vcpu: The vcpu pointer
@@ -456,23 +476,55 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
 	if (unlikely(!timer->enabled))
 		return;
 
-	kvm_timer_update_state(vcpu);
+	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
+		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
 
 	/* Set the background timer for the physical timer emulation. */
 	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
+}
 
-	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
-		kvm_timer_flush_hwstate_user(vcpu);
-	else
-		kvm_timer_flush_hwstate_vgic(vcpu);
+void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
-	set_cntvoff(vtimer->cntvoff);
-	timer_restore_state(vcpu);
+	if (unlikely(!timer->enabled))
+		return;
+
+	if (has_vhe())
+		enable_el1_phys_timer_access();
+
+	vtimer_save_state(vcpu);
+
+	set_cntvoff(0);
+}
+
+static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+
+	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
+		kvm_vtimer_update_mask_user(vcpu);
+		return;
+	}
+
+	/*
+	 * If the guest disabled the timer without acking the interrupt, then
+	 * we must make sure the physical and virtual active states are in
+	 * sync by deactivating the physical interrupt, because otherwise we
+	 * wouldn't see the next timer interrupt in the host.
+	 */
+	if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
+		int ret;
+		ret = irq_set_irqchip_state(host_vtimer_irq,
+					    IRQCHIP_STATE_ACTIVE,
+					    false);
+		WARN_ON(ret);
+	}
 }
 
 /**
@@ -485,6 +537,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	/*
 	 * This is to cancel the background timer for the physical timer
@@ -492,14 +545,19 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 	 */
 	soft_timer_cancel(&timer->phys_timer, NULL);
 
-	timer_save_state(vcpu);
-	set_cntvoff(0);
-
 	/*
-	 * The guest could have modified the timer registers or the timer
-	 * could have expired, update the timer state.
+	 * If we entered the guest with the vtimer output asserted we have to
+	 * check if the guest has modified the timer so that we should lower
+	 * the line at this point.
 	 */
-	kvm_timer_update_state(vcpu);
+	if (vtimer->irq.level) {
+		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
+		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
+		if (!kvm_timer_should_fire(vtimer)) {
+			kvm_timer_update_irq(vcpu, false, vtimer);
+			unmask_vtimer_irq(vcpu);
+		}
+	}
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 27db222..132d39a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -354,18 +354,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
 
 	kvm_arm_set_running_vcpu(vcpu);
-
 	kvm_vgic_load(vcpu);
+	kvm_timer_vcpu_load(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	kvm_timer_vcpu_put(vcpu);
 	kvm_vgic_put(vcpu);
 
 	vcpu->cpu = -1;
 
 	kvm_arm_set_running_vcpu(NULL);
-	kvm_timer_vcpu_put(vcpu);
 }
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
@@ -710,16 +710,27 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_arm_clear_debug(vcpu);
 
 		/*
-		 * We must sync the PMU and timer state before the vgic state so
+		 * We must sync the PMU state before the vgic state so
 		 * that the vgic can properly sample the updated state of the
 		 * interrupt line.
 		 */
 		kvm_pmu_sync_hwstate(vcpu);
-		kvm_timer_sync_hwstate(vcpu);
 
+		/*
+		 * Sync the vgic state before syncing the timer state because
+		 * the timer code needs to know if the virtual timer
+		 * interrupts are active.
+		 */
 		kvm_vgic_sync_hwstate(vcpu);
 
 		/*
+		 * Sync the timer hardware state before enabling interrupts as
+		 * we don't want vtimer interrupts to race with syncing the
+		 * timer virtual interrupt state.
+		 */
+		kvm_timer_sync_hwstate(vcpu);
+
+		/*
 		 * We may have taken a host interrupt in HYP mode (ie
 		 * while executing the guest). This interrupt is still
 		 * pending, as we haven't serviced it yet!
diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
index a6c3b10..f398616 100644
--- a/virt/kvm/arm/hyp/timer-sr.c
+++ b/virt/kvm/arm/hyp/timer-sr.c
@@ -27,7 +27,7 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
 	write_sysreg(cntvoff, cntvoff_el2);
 }
 
-void __hyp_text enable_phys_timer(void)
+void __hyp_text enable_el1_phys_timer_access(void)
 {
 	u64 val;
 
@@ -37,7 +37,7 @@ void __hyp_text enable_phys_timer(void)
 	write_sysreg(val, cnthctl_el2);
 }
 
-void __hyp_text disable_phys_timer(void)
+void __hyp_text disable_el1_phys_timer_access(void)
 {
 	u64 val;
 
@@ -58,11 +58,11 @@ void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
 	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
 	 */
 	if (!has_vhe())
-		enable_phys_timer();
+		enable_el1_phys_timer_access();
 }
 
 void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
 	if (!has_vhe())
-		disable_phys_timer();
+		disable_el1_phys_timer_access();
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 15/20] KVM: arm/arm64: Support EL1 phys timer register access in set/get reg
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

Add support for the physical timer registers in kvm_arm_timer_set_reg and
kvm_arm_timer_get_reg so that these functions can be reused to interact
with the rest of the system.

Note that this paves part of the way for the physical timer state
save/restore, but we still need to add those registers to
KVM_GET_REG_LIST before we support migrating the physical timer state.
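
As an illustration, userspace could drive the new register IDs through
the existing KVM_SET_ONE_REG/KVM_GET_ONE_REG interface roughly as
follows (helper names and the vcpu_fd parameter are invented for this
sketch, it assumes uapi headers carrying the definitions added here,
and error handling is elided):

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Program the guest's EL1 physical timer compare value. */
	static int set_ptimer_cval(int vcpu_fd, uint64_t cval)
	{
		struct kvm_one_reg reg = {
			.id   = KVM_REG_ARM_PTIMER_CVAL,
			.addr = (uint64_t)(unsigned long)&cval,
		};

		return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
	}

	/* Read the physical counter as seen by the guest. */
	static int get_ptimer_cnt(int vcpu_fd, uint64_t *cnt)
	{
		struct kvm_one_reg reg = {
			.id   = KVM_REG_ARM_PTIMER_CNT,
			.addr = (uint64_t)(unsigned long)cnt,
		};

		return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
	}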

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm/include/uapi/asm/kvm.h   |  6 ++++++
 arch/arm64/include/uapi/asm/kvm.h |  6 ++++++
 virt/kvm/arm/arch_timer.c         | 33 +++++++++++++++++++++++++++++++--
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 5db2d4c..665c454 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -151,6 +151,12 @@ struct kvm_arch_memory_slot {
 	(__ARM_CP15_REG(op1, 0, crm, 0) | KVM_REG_SIZE_U64)
 #define ARM_CP15_REG64(...) __ARM_CP15_REG64(__VA_ARGS__)
 
+/* PL1 Physical Timer Registers */
+#define KVM_REG_ARM_PTIMER_CTL		ARM_CP15_REG32(0, 14, 2, 1)
+#define KVM_REG_ARM_PTIMER_CNT		ARM_CP15_REG64(0, 14)
+#define KVM_REG_ARM_PTIMER_CVAL		ARM_CP15_REG64(2, 14)
+
+/* Virtual Timer Registers */
 #define KVM_REG_ARM_TIMER_CTL		ARM_CP15_REG32(0, 14, 3, 1)
 #define KVM_REG_ARM_TIMER_CNT		ARM_CP15_REG64(1, 14)
 #define KVM_REG_ARM_TIMER_CVAL		ARM_CP15_REG64(3, 14)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9f3ca24..07be6e2 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -195,6 +195,12 @@ struct kvm_arch_memory_slot {
 
 #define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_SIZE_U64)
 
+/* EL1 Physical Timer Registers */
+#define KVM_REG_ARM_PTIMER_CTL		ARM64_SYS_REG(3, 3, 14, 2, 1)
+#define KVM_REG_ARM_PTIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 2, 2)
+#define KVM_REG_ARM_PTIMER_CNT		ARM64_SYS_REG(3, 3, 14, 0, 1)
+
+/* EL0 Virtual Timer Registers */
 #define KVM_REG_ARM_TIMER_CTL		ARM64_SYS_REG(3, 3, 14, 3, 1)
 #define KVM_REG_ARM_TIMER_CNT		ARM64_SYS_REG(3, 3, 14, 3, 2)
 #define KVM_REG_ARM_TIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 0, 2)
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 70110ea..d5b632d 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -626,10 +626,11 @@ static void kvm_timer_init_interrupt(void *info)
 int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 {
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
 	switch (regid) {
 	case KVM_REG_ARM_TIMER_CTL:
-		vtimer->cnt_ctl = value;
+		vtimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
 		break;
 	case KVM_REG_ARM_TIMER_CNT:
 		update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
@@ -637,6 +638,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 	case KVM_REG_ARM_TIMER_CVAL:
 		vtimer->cnt_cval = value;
 		break;
+	case KVM_REG_ARM_PTIMER_CTL:
+		ptimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
+		break;
+	case KVM_REG_ARM_PTIMER_CVAL:
+		ptimer->cnt_cval = value;
+		break;
+
 	default:
 		return -1;
 	}
@@ -645,17 +653,38 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 	return 0;
 }
 
+static u64 read_timer_ctl(struct arch_timer_context *timer)
+{
+	/*
+	 * Set ISTATUS bit if it's expired.
+	 * Note that according to ARMv8 ARM Issue A.k, ISTATUS bit is
+	 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
+	 * regardless of ENABLE bit for our implementation convenience.
+	 */
+	if (!kvm_timer_compute_delta(timer))
+		return timer->cnt_ctl | ARCH_TIMER_CTRL_IT_STAT;
+	else
+		return timer->cnt_ctl;
+}
+
 u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 regid)
 {
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	switch (regid) {
 	case KVM_REG_ARM_TIMER_CTL:
-		return vtimer->cnt_ctl;
+		return read_timer_ctl(vtimer);
 	case KVM_REG_ARM_TIMER_CNT:
 		return kvm_phys_timer_read() - vtimer->cntvoff;
 	case KVM_REG_ARM_TIMER_CVAL:
 		return vtimer->cnt_cval;
+	case KVM_REG_ARM_PTIMER_CTL:
+		return read_timer_ctl(ptimer);
+	case KVM_REG_ARM_PTIMER_CVAL:
+		return ptimer->cnt_cval;
+	case KVM_REG_ARM_PTIMER_CNT:
+		return kvm_phys_timer_read();
 	}
 	return (u64)-1;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 16/20] KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

When trapping on a guest access to one of the timer registers, we were
messing with the internals of the timer state from the sysregs handling
code, and that logic was about to grow even more complex with the
upcoming timer handling optimizations.

Therefore, since we already have timer register access functions (to
access registers from userspace), reuse those for the timer register
traps from a VM and let the timer code maintain its own consistency.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm64/kvm/sys_regs.c | 41 ++++++++++++++---------------------------
 1 file changed, 14 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3..bb0e41b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -841,13 +841,16 @@ static bool access_cntp_tval(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	u64 now = kvm_phys_timer_read();
+	u64 cval;
 
-	if (p->is_write)
-		ptimer->cnt_cval = p->regval + now;
-	else
-		p->regval = ptimer->cnt_cval - now;
+	if (p->is_write) {
+		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL,
+				      p->regval + now);
+	} else {
+		cval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
+		p->regval = cval - now;
+	}
 
 	return true;
 }
@@ -856,24 +859,10 @@ static bool access_cntp_ctl(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
-
-	if (p->is_write) {
-		/* ISTATUS bit is read-only */
-		ptimer->cnt_ctl = p->regval & ~ARCH_TIMER_CTRL_IT_STAT;
-	} else {
-		u64 now = kvm_phys_timer_read();
-
-		p->regval = ptimer->cnt_ctl;
-		/*
-		 * Set ISTATUS bit if it's expired.
-		 * Note that according to ARMv8 ARM Issue A.k, ISTATUS bit is
-		 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
-		 * regardless of ENABLE bit for our implementation convenience.
-		 */
-		if (ptimer->cnt_cval <= now)
-			p->regval |= ARCH_TIMER_CTRL_IT_STAT;
-	}
+	if (p->is_write)
+		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CTL, p->regval);
+	else
+		p->regval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CTL);
 
 	return true;
 }
@@ -882,12 +871,10 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 		struct sys_reg_params *p,
 		const struct sys_reg_desc *r)
 {
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
-
 	if (p->is_write)
-		ptimer->cnt_cval = p->regval;
+		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, p->regval);
 	else
-		p->regval = ptimer->cnt_cval;
+		p->regval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
 
 	return true;
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 17/20] KVM: arm/arm64: Move phys_timer_emulate function
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

We are about to call phys_timer_emulate() from kvm_timer_update_state()
and modify phys_timer_emulate() at the same time.  Moving the function
and modifying it in a single patch makes the diff hard to read, so do
this separately first.

No functional change.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index d5b632d..1f82c21 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -252,6 +252,22 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 	}
 }
 
+/* Schedule the background timer for the emulated timer. */
+static void phys_timer_emulate(struct kvm_vcpu *vcpu,
+			      struct arch_timer_context *timer_ctx)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	if (kvm_timer_should_fire(timer_ctx))
+		return;
+
+	if (!kvm_timer_irq_can_fire(timer_ctx))
+		return;
+
+	/*  The timer has not yet expired, schedule a background timer */
+	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
+}
+
 /*
  * Check if there was a change in the timer state (should we raise or lower
  * the line level to the GIC).
@@ -278,22 +294,6 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
 }
 
-/* Schedule the background timer for the emulated timer. */
-static void phys_timer_emulate(struct kvm_vcpu *vcpu,
-			      struct arch_timer_context *timer_ctx)
-{
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-
-	if (kvm_timer_should_fire(timer_ctx))
-		return;
-
-	if (!kvm_timer_irq_can_fire(timer_ctx))
-		return;
-
-	/*  The timer has not yet expired, schedule a background timer */
-	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
-}
-
 static void vtimer_save_state(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 18/20] KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

There is no need to schedule and cancel an hrtimer on every guest entry
and exit, because the moment the guest programs the physical timer we
know when it is going to fire, and we can simply program the hrtimer at
that point.

Now that the register modifications from the guest go through the
kvm_arm_timer_set/get_reg functions, which always call
kvm_timer_update_state(), we can simply consider the timer state in
that function and schedule or cancel the timers as needed.

This avoids looking at the physical timer emulation state when entering
and exiting the VCPU, allowing for faster servicing of the VM when
needed.
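
Put differently, with this change a guest write to, say, CNTP_CVAL_EL0
drives the emulation along the following (simplified) call chain, based
on the patches in this series:

	/*
	 * guest write to CNTP_CVAL_EL0 traps
	 *   -> access_cntp_cval()
	 *     -> kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, val)
	 *       -> kvm_timer_update_state(vcpu)
	 *         -> phys_timer_emulate(vcpu)
	 *           -> soft_timer_start() or soft_timer_cancel()
	 *
	 * so the hrtimer is (re)programmed exactly when the guest reprograms
	 * the physical timer, not on every entry to or exit from the guest.
	 */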

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 75 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 1f82c21..aa18a5d 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -199,7 +199,27 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
 
 static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
 {
-	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
+	struct arch_timer_context *ptimer;
+	struct arch_timer_cpu *timer;
+	struct kvm_vcpu *vcpu;
+	u64 ns;
+
+	timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
+	vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
+	ptimer = vcpu_ptimer(vcpu);
+
+	/*
+	 * Check that the timer has really expired from the guest's
+	 * PoV (NTP on the host may have forced it to expire
+	 * early). If not ready, schedule for a later time.
+	 */
+	ns = kvm_timer_compute_delta(ptimer);
+	if (unlikely(ns)) {
+		hrtimer_forward_now(hrt, ns_to_ktime(ns));
+		return HRTIMER_RESTART;
+	}
+
+	kvm_timer_update_irq(vcpu, true, ptimer);
 	return HRTIMER_NORESTART;
 }
 
@@ -253,24 +273,28 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 }
 
 /* Schedule the background timer for the emulated timer. */
-static void phys_timer_emulate(struct kvm_vcpu *vcpu,
-			      struct arch_timer_context *timer_ctx)
+static void phys_timer_emulate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	if (kvm_timer_should_fire(timer_ctx))
-		return;
-
-	if (!kvm_timer_irq_can_fire(timer_ctx))
+	/*
+	 * If the timer can fire now we have just raised the IRQ line and we
+	 * don't need to have a soft timer scheduled for the future.  If the
+	 * timer cannot fire at all, then we also don't need a soft timer.
+	 */
+	if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
+		soft_timer_cancel(&timer->phys_timer, NULL);
 		return;
+	}
 
-	/*  The timer has not yet expired, schedule a background timer */
-	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
+	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(ptimer));
 }
 
 /*
- * Check if there was a change in the timer state (should we raise or lower
- * the line level to the GIC).
+ * Check if there was a change in the timer state, so that we should either
+ * raise or lower the line level to the GIC or schedule a background timer to
+ * emulate the physical timer.
  */
 static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 {
@@ -292,6 +316,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 
 	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
 		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
+
+	phys_timer_emulate(vcpu);
 }
 
 static void vtimer_save_state(struct kvm_vcpu *vcpu)
@@ -445,6 +471,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 
 	if (has_vhe())
 		disable_el1_phys_timer_access();
+
+	/* Set the background timer for the physical timer emulation. */
+	phys_timer_emulate(vcpu);
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -480,12 +509,6 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 
 	if (unlikely(!timer->enabled))
 		return;
-
-	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
-		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
-
-	/* Set the background timer for the physical timer emulation. */
-	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
 }
 
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
@@ -500,6 +523,17 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 
 	vtimer_save_state(vcpu);
 
+	/*
+	 * Cancel the physical timer emulation, because the only case where we
+	 * need it after a vcpu_put is in the context of a sleeping VCPU, and
+	 * in that case we already factor in the deadline for the physical
+	 * timer when scheduling the bg_timer.
+	 *
+	 * In any case, we re-schedule the hrtimer for the physical timer when
+	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
+	 */
+	soft_timer_cancel(&timer->phys_timer, NULL);
+
 	set_cntvoff(0);
 }
 
@@ -536,16 +570,9 @@ static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 
 	/*
-	 * This is to cancel the background timer for the physical timer
-	 * emulation if it is set.
-	 */
-	soft_timer_cancel(&timer->phys_timer, NULL);
-
-	/*
 	 * If we entered the guest with the vtimer output asserted we have to
 	 * check if the guest has modified the timer so that we should lower
 	 * the line at this point.
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 19/20] KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

Now that both the vtimer and the ptimer, whether using the in-kernel
vgic emulation or a userspace IRQ chip, are driven by the timer signals
and at the vcpu load/put boundaries instead of recomputing the timer
state at every entry/exit to/from the guest, we can get rid of the
flush hwstate function entirely.

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |  1 -
 virt/kvm/arm/arch_timer.c    | 24 ------------------------
 virt/kvm/arm/arm.c           |  1 -
 3 files changed, 26 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 8e5ed54..af29563 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -61,7 +61,6 @@ int kvm_timer_hyp_init(void);
 int kvm_timer_enable(struct kvm_vcpu *vcpu);
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
 void kvm_timer_update_run(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index aa18a5d..f92459a 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -302,12 +302,6 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	/*
-	 * If userspace modified the timer registers via SET_ONE_REG before
-	 * the vgic was initialized, we mustn't set the vtimer->irq.level value
-	 * because the guest would never see the interrupt.  Instead wait
-	 * until we call this function from kvm_timer_flush_hwstate.
-	 */
 	if (unlikely(!timer->enabled))
 		return;
 
@@ -493,24 +487,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
 	       ptimer->irq.level != plevel;
 }
 
-/**
- * kvm_timer_flush_hwstate - prepare timers before running the vcpu
- * @vcpu: The vcpu pointer
- *
- * Check if the virtual timer has expired while we were running in the host,
- * and inject an interrupt if that was the case, making sure the timer is
- * masked or disabled on the host so that we keep executing.  Also schedule a
- * software timer for the physical timer if it is enabled.
- */
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
-{
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
-
-	if (unlikely(!timer->enabled))
-		return;
-}
-
 void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 132d39a..14c50d1 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -656,7 +656,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		local_irq_disable();
 
-		kvm_timer_flush_hwstate(vcpu);
 		kvm_vgic_flush_hwstate(vcpu);
 
 		/*
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 20/20] KVM: arm/arm64: Rework kvm_timer_should_fire
  2017-09-23  0:41 ` Christoffer Dall
@ 2017-09-23  0:42   ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-09-23  0:42 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, kvm, Christoffer Dall

kvm_timer_should_fire() can be called in two different situations from
kvm_vcpu_block().

The first case is before calling kvm_timer_schedule(), used for wait
polling, and in this case the VCPU thread is running and the timer state
is loaded onto the hardware so all we have to do is check if the virtual
interrupt lines are asserted, because the timer interrupt handler
functions will raise those lines as appropriate.

The second case is inside the wait loop of kvm_vcpu_block(), where we
have already called kvm_timer_schedule() and therefore the hardware will
be disabled and the software view of the timer state is up to date
(timer->loaded is false), and so we can simply check if the timer should
fire by looking at the software state.
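
Purely as an illustration of the two situations above, a minimal sketch
of the block path could look like the following.  This is not the
actual kvm_vcpu_block() code; vcpu_block_sketch() and the bare
schedule() loop are made up for illustration only:

/* Illustrative sketch only; not the real kvm_vcpu_block(). */
static void vcpu_block_sketch(struct kvm_vcpu *vcpu)
{
	/*
	 * Case 1: wait polling, before kvm_timer_schedule().  The timer
	 * state is still loaded on the hardware, so only the irq.level
	 * values raised by the timer interrupt handlers are meaningful.
	 */
	if (kvm_timer_is_pending(vcpu))
		return;

	/* Saves the timer state; timer->loaded becomes false. */
	kvm_timer_schedule(vcpu);

	/*
	 * Case 2: inside the wait loop.  The hardware is disabled and
	 * the software view is up to date, so checking the software
	 * timer state is valid as well.
	 */
	while (!kvm_timer_is_pending(vcpu))
		schedule();

	kvm_timer_unschedule(vcpu);
}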

Signed-off-by: Christoffer Dall <cdall@linaro.org>
---
 include/kvm/arm_arch_timer.h |  3 ++-
 virt/kvm/arm/arch_timer.c    | 22 +++++++++++++++++++++-
 virt/kvm/arm/arm.c           |  3 +--
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index af29563..250db34 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -73,7 +73,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
 
-bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
+bool kvm_timer_is_pending(struct kvm_vcpu *vcpu);
+
 void kvm_timer_schedule(struct kvm_vcpu *vcpu);
 void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index f92459a..1d0cd3a 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -49,6 +49,7 @@ static const struct kvm_irq_level default_vtimer_irq = {
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
 static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
 				 struct arch_timer_context *timer_ctx);
+static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
 
 u64 kvm_phys_timer_read(void)
 {
@@ -223,7 +224,7 @@ static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
-bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
+static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 {
 	u64 cval, now;
 
@@ -236,6 +237,25 @@ bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 	return cval <= now;
 }
 
+bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
+
+	if (vtimer->irq.level || ptimer->irq.level)
+		return true;
+
+	/*
+	 * When this is called from within the wait loop of kvm_vcpu_block(),
+	 * the software view of the timer state is up to date (timer->loaded
+	 * is false), and so we can simply check if the timer should fire now.
+	 */
+	if (!vtimer->loaded && kvm_timer_should_fire(vtimer))
+		return true;
+
+	return kvm_timer_should_fire(ptimer);
+}
+
 /*
  * Reflect the timer output level into the kvm_run structure
  */
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 14c50d1..bc126fb 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -307,8 +307,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
-	return kvm_timer_should_fire(vcpu_vtimer(vcpu)) ||
-	       kvm_timer_should_fire(vcpu_ptimer(vcpu));
+	return kvm_timer_is_pending(vcpu);
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 16:10     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 16:10 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> Using the physical counter allows KVM to retain the offset between the
> virtual and physical counter as long as it is actively running a VCPU.
> 
> As soon as a VCPU is released, another thread is scheduled or we start
> running userspace applications, we reset the offset to 0, so that
> userspace accessing the virtual timer can still read the cirtual counter

s/cirtual/virtual/

> and get the same view of time as the kernel.
> 
> This opens up potential improvements for KVM performance.
> 
> VHE kernels or kernels continuing to use the virtual timer are
> unaffected.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm64/include/asm/arch_timer.h  | 9 ++++-----
>  drivers/clocksource/arm_arch_timer.c | 3 +--
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index a652ce0..1859a1c 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -148,11 +148,10 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  
>  static inline u64 arch_counter_get_cntpct(void)
>  {
> -	/*
> -	 * AArch64 kernel and user space mandate the use of CNTVCT.
> -	 */
> -	BUG();
> -	return 0;
> +	u64 cval;
> +	isb();
> +	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> +	return cval;

This would be just fine if we were blessed with quality HW. This is 
unfortunately not the case, and we need a staggering amount of crap to 
deal with timer errata.

I suggest you replace this with the (fully untested) following:

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index a652ce0a5cb2..04275de614db 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -52,6 +52,7 @@ struct arch_timer_erratum_workaround {
 	const char *desc;
 	u32 (*read_cntp_tval_el0)(void);
 	u32 (*read_cntv_tval_el0)(void);
+	u64 (*read_cntpct_el0)(void);
 	u64 (*read_cntvct_el0)(void);
 	int (*set_next_event_phys)(unsigned long, struct clock_event_device *);
 	int (*set_next_event_virt)(unsigned long, struct clock_event_device *);
@@ -148,11 +149,8 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
 
 static inline u64 arch_counter_get_cntpct(void)
 {
-	/*
-	 * AArch64 kernel and user space mandate the use of CNTVCT.
-	 */
-	BUG();
-	return 0;
+	isb();
+	return arch_timer_reg_read_stable(cntpct_el0);
 }
 
 static inline u64 arch_counter_get_cntvct(void)
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index fd4b7f684bd0..5b41a96fa8dd 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -217,6 +217,11 @@ static u32 notrace fsl_a008585_read_cntv_tval_el0(void)
 	return __fsl_a008585_read_reg(cntv_tval_el0);
 }
 
+static u64 notrace fsl_a008585_read_cntpct_el0(void)
+{
+	return __fsl_a008585_read_reg(cntpct_el0);
+}
+
 static u64 notrace fsl_a008585_read_cntvct_el0(void)
 {
 	return __fsl_a008585_read_reg(cntvct_el0);
@@ -258,6 +263,11 @@ static u32 notrace hisi_161010101_read_cntv_tval_el0(void)
 	return __hisi_161010101_read_reg(cntv_tval_el0);
 }
 
+static u64 notrace hisi_161010101_read_cntpct_el0(void)
+{
+	return __hisi_161010101_read_reg(cntpct_el0);
+}
+
 static u64 notrace hisi_161010101_read_cntvct_el0(void)
 {
 	return __hisi_161010101_read_reg(cntvct_el0);
@@ -296,6 +306,15 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
 	new = read_sysreg(cntvct_el0);
 	return (((old ^ new) >> 32) & 1) ? old : new;
 }
+
+static u64 notrace arm64_858921_read_cntpct_el0(void)
+{
+	u64 old, new;
+
+	old = read_sysreg(cntpct_el0);
+	new = read_sysreg(cntpct_el0);
+	return (((old ^ new) >> 32) & 1) ? old : new;
+}
 #endif
 
 #ifdef CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND
@@ -310,7 +329,7 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
 						struct clock_event_device *clk)
 {
 	unsigned long ctrl;
-	u64 cval = evt + arch_counter_get_cntvct();
+	u64 cval = evt + arch_timer_read_counter();
 
 	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
 	ctrl |= ARCH_TIMER_CTRL_ENABLE;
@@ -346,6 +365,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "Freescale erratum a005858",
 		.read_cntp_tval_el0 = fsl_a008585_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = fsl_a008585_read_cntv_tval_el0,
+		.read_cntpct_el0 = fsl_a008585_read_cntpct_el0,
 		.read_cntvct_el0 = fsl_a008585_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -358,6 +378,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "HiSilicon erratum 161010101",
 		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
+		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
 		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -368,6 +389,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.desc = "HiSilicon erratum 161010101",
 		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
 		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
+		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
 		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
 		.set_next_event_phys = erratum_set_next_event_tval_phys,
 		.set_next_event_virt = erratum_set_next_event_tval_virt,
@@ -378,6 +400,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
 		.match_type = ate_match_local_cap_id,
 		.id = (void *)ARM64_WORKAROUND_858921,
 		.desc = "ARM erratum 858921",
+		.read_cntpct_el0 = arm64_858921_read_cntpct_el0,
 		.read_cntvct_el0 = arm64_858921_read_cntvct_el0,
 	},
 #endif

which ensures (at least in theory) that we do the right thing on our buggy
hardware...
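
For reference, the dispatch done by arch_timer_reg_read_stable() is
roughly the following.  This is a simplified sketch of the mechanism,
not the exact macro body; it assumes the arch_timer_read_ool_enabled
static key and the timer_unstable_counter_workaround descriptor used by
the OOL workaround code:

/* Sketch only: approximates arch_timer_reg_read_stable(cntpct_el0). */
static inline u64 read_cntpct_stable_sketch(void)
{
	/* Static key flipped when an erratum workaround is registered. */
	if (static_branch_unlikely(&arch_timer_read_ool_enabled))
		return timer_unstable_counter_workaround->read_cntpct_el0();

	return read_sysreg(cntpct_el0);
}

so wiring up the read_cntpct_el0 callbacks above is what makes the
physical counter reads take the erratum path on affected parts.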

>  }
>  
>  static inline u64 arch_counter_get_cntvct(void)
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index fd4b7f6..9b3322a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -890,8 +890,7 @@ static void __init arch_counter_register(unsigned type)
>  
>  	/* Register the CP15 based counter if we have one */
>  	if (type & ARCH_TIMER_TYPE_CP15) {
> -		if (IS_ENABLED(CONFIG_ARM64) ||
> -		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 16:21     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 16:21 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas, Mark Rutland

On 23/09/17 01:41, Christoffer Dall wrote:
> Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
> arm64, but as we move to using the physical timer for the in-kernel
> time-keeping, we need to make that more flexible.
> 
> First, we need to make sure the physical counter can be read on equal
> terms to the virtual counter, which includes adding physical counter
> read functions for timers that require errata.
> 
> Second, we need to make a choice between reading the physical vs virtual
> counter, depending on which timer is used for time keeping in the kernel
> otherwise.  We can do this using a static key to avoid a performance
> penalty during runtime when reading the counter.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>

Right. I should have read patch #3. I'm an idiot.

> ---
>  arch/arm64/include/asm/arch_timer.h  | 15 ++++++++++++---
>  arch/arm64/include/asm/timex.h       |  2 +-
>  drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
>  3 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 1859a1c..c56d8cd 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -30,6 +30,8 @@
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_phys_counter_available;
> +
>  #if IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND)
>  extern struct static_key_false arch_timer_read_ool_enabled;
>  #define needs_unstable_timer_counter_workaround() \
> @@ -52,6 +54,7 @@ struct arch_timer_erratum_workaround {
>  	const char *desc;
>  	u32 (*read_cntp_tval_el0)(void);
>  	u32 (*read_cntv_tval_el0)(void);
> +	u64 (*read_cntpct_el0)(void);
>  	u64 (*read_cntvct_el0)(void);
>  	int (*set_next_event_phys)(unsigned long, struct clock_event_device *);
>  	int (*set_next_event_virt)(unsigned long, struct clock_event_device *);
> @@ -148,10 +151,8 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  
>  static inline u64 arch_counter_get_cntpct(void)
>  {
> -	u64 cval;
>  	isb();
> -	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> -	return cval;
> +	return arch_timer_reg_read_stable(cntpct_el0);
>  }
>  
>  static inline u64 arch_counter_get_cntvct(void)
> @@ -160,6 +161,14 @@ static inline u64 arch_counter_get_cntvct(void)
>  	return arch_timer_reg_read_stable(cntvct_el0);
>  }
>  
> +static inline u64 arch_counter_get_cycles(void)
> +{
> +	if (static_branch_unlikely(&arch_timer_phys_counter_available))
> +	    return arch_counter_get_cntpct();
> +	else
> +	    return arch_counter_get_cntvct();
> +}
> +
>  static inline int arch_timer_arch_init(void)
>  {
>  	return 0;
> diff --git a/arch/arm64/include/asm/timex.h b/arch/arm64/include/asm/timex.h
> index 81a076e..c0d214c 100644
> --- a/arch/arm64/include/asm/timex.h
> +++ b/arch/arm64/include/asm/timex.h
> @@ -22,7 +22,7 @@
>   * Use the current timer as a cycle counter since this is what we use for
>   * the delay loop.
>   */
> -#define get_cycles()	arch_counter_get_cntvct()
> +#define get_cycles()	arch_counter_get_cycles()

Why can't this be arch_timer_read_counter() instead? Is there any 
measurable advantage in using a static key compared to a memory 
indirection?
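
For the record, the two dispatch styles being compared look roughly
like this; the get_cycles_* names are made up for illustration, and no
claim is made here about which is faster on real hardware:

/* Illustrative comparison only; get_cycles_* are made-up names. */

/* Function-pointer indirection: load the pointer, then an indirect call. */
static inline u64 get_cycles_via_pointer(void)
{
	return arch_timer_read_counter();
}

/* Static key: the branch is patched at runtime, no pointer load. */
static inline u64 get_cycles_via_static_key(void)
{
	if (static_branch_unlikely(&arch_timer_phys_counter_available))
		return arch_counter_get_cntpct();
	return arch_counter_get_cntvct();
}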

>  
>  #include <asm-generic/timex.h>
>  
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 9b3322a..f35da20 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -77,6 +77,9 @@ static bool arch_timer_mem_use_virtual;
>  static bool arch_counter_suspend_stop;
>  static bool vdso_default = true;
>  
> +DEFINE_STATIC_KEY_FALSE(arch_timer_phys_counter_available);
> +EXPORT_SYMBOL_GPL(arch_timer_phys_counter_available);
> +
>  static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
>  
>  static int __init early_evtstrm_cfg(char *buf)
> @@ -217,6 +220,11 @@ static u32 notrace fsl_a008585_read_cntv_tval_el0(void)
>  	return __fsl_a008585_read_reg(cntv_tval_el0);
>  }
>  
> +static u64 notrace fsl_a008585_read_cntpct_el0(void)
> +{
> +	return __fsl_a008585_read_reg(cntpct_el0);
> +}
> +
>  static u64 notrace fsl_a008585_read_cntvct_el0(void)
>  {
>  	return __fsl_a008585_read_reg(cntvct_el0);
> @@ -258,6 +266,11 @@ static u32 notrace hisi_161010101_read_cntv_tval_el0(void)
>  	return __hisi_161010101_read_reg(cntv_tval_el0);
>  }
>  
> +static u64 notrace hisi_161010101_read_cntpct_el0(void)
> +{
> +	return __hisi_161010101_read_reg(cntpct_el0);
> +}
> +
>  static u64 notrace hisi_161010101_read_cntvct_el0(void)
>  {
>  	return __hisi_161010101_read_reg(cntvct_el0);
> @@ -288,6 +301,15 @@ static struct ate_acpi_oem_info hisi_161010101_oem_info[] = {
>  #endif
>  
>  #ifdef CONFIG_ARM64_ERRATUM_858921
> +static u64 notrace arm64_858921_read_cntpct_el0(void)
> +{
> +	u64 old, new;
> +
> +	old = read_sysreg(cntpct_el0);
> +	new = read_sysreg(cntpct_el0);
> +	return (((old ^ new) >> 32) & 1) ? old : new;
> +}
> +
>  static u64 notrace arm64_858921_read_cntvct_el0(void)
>  {
>  	u64 old, new;
> @@ -346,6 +368,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
>  		.desc = "Freescale erratum a005858",
>  		.read_cntp_tval_el0 = fsl_a008585_read_cntp_tval_el0,
>  		.read_cntv_tval_el0 = fsl_a008585_read_cntv_tval_el0,
> +		.read_cntpct_el0 = fsl_a008585_read_cntpct_el0,
>  		.read_cntvct_el0 = fsl_a008585_read_cntvct_el0,
>  		.set_next_event_phys = erratum_set_next_event_tval_phys,
>  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> @@ -358,6 +381,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
>  		.desc = "HiSilicon erratum 161010101",
>  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
>  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
>  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
>  		.set_next_event_phys = erratum_set_next_event_tval_phys,
>  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> @@ -368,6 +392,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
>  		.desc = "HiSilicon erratum 161010101",
>  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
>  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
>  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
>  		.set_next_event_phys = erratum_set_next_event_tval_phys,
>  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> @@ -378,6 +403,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
>  		.match_type = ate_match_local_cap_id,
>  		.id = (void *)ARM64_WORKAROUND_858921,
>  		.desc = "ARM erratum 858921",
> +		.read_cntpct_el0 = arm64_858921_read_cntpct_el0,
>  		.read_cntvct_el0 = arm64_858921_read_cntvct_el0,
>  	},
>  #endif
> @@ -890,10 +916,12 @@ static void __init arch_counter_register(unsigned type)
>  
>  	/* Register the CP15 based counter if we have one */
>  	if (type & ARCH_TIMER_TYPE_CP15) {
> -		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
>  			arch_timer_read_counter = arch_counter_get_cntvct;
> -		else
> +		} else {
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +			static_branch_enable(&arch_timer_phys_counter_available);
> +		}
>  
>  		clocksource_counter.archdata.vdso_direct = vdso_default;
>  	} else {
> 

In my reply to patch #2, I had the following hunk:

@@ -310,7 +329,7 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
 						struct clock_event_device *clk)
 {
 	unsigned long ctrl;
-	u64 cval = evt + arch_counter_get_cntvct();
+	u64 cval = evt + arch_timer_read_counter();
 
 	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
 	ctrl |= ARCH_TIMER_CTRL_ENABLE;

Once we start using a different timer, this could well have an effect...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 04/20] KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 16:22     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 16:22 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> If the vgic is not initialized, don't try to grab its spinlocks or
> traverse its data structures.
> 
> This is important because we soon have to start considering the active
> state of a virtual interrupts when doing vcpu_load, which may happen
> early on before the vgic is initialized.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index fed717e..e1f7dbc 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -777,6 +777,9 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>  	bool map_is_active;
>  
> +	if (!vgic_initialized(vcpu->kvm))
> +		return false;
> +
>  	spin_lock(&irq->irq_lock);
>  	map_is_active = irq->hw && irq->active;
>  	spin_unlock(&irq->irq_lock);
> 

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 05/20] KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 16:37     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 16:37 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> We are about to optimize our timer handling logic which involves
> injecting irqs to the vgic directly from the irq handler.
> 
> Unfortunately, the injection path can take any AP list lock and irq lock
> and we must therefore make sure to use spin_lock_irqsave wherever
> interrupts are enabled and we are taking any of those locks, to avoid
> deadlocking between process context and the ISR.
> 
> This changes a lot of the VGIC code, but the good news is that the
> changes are mostly mechanical.
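
To illustrate the deadlock being avoided here, a simplified sketch
follows; example_irq_lock and the two functions are made-up names, not
code from this series:

/* Sketch only: why spin_lock() is not enough when an ISR takes the same lock. */
static DEFINE_SPINLOCK(example_irq_lock);

static void process_context_broken(void)
{
	spin_lock(&example_irq_lock);		/* interrupts still enabled */
	/*
	 * If the timer ISR now fires on this CPU and also tries to take
	 * example_irq_lock, it spins forever while we hold it: deadlock.
	 */
	spin_unlock(&example_irq_lock);
}

static void process_context_fixed(void)
{
	unsigned long flags;

	/* Masking local interrupts keeps the ISR off this CPU while we hold the lock. */
	spin_lock_irqsave(&example_irq_lock, flags);
	/* ... update irq state under the lock ... */
	spin_unlock_irqrestore(&example_irq_lock, flags);
}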
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/vgic/vgic-its.c     | 17 +++++++-----
>  virt/kvm/arm/vgic/vgic-mmio-v2.c | 22 +++++++++------
>  virt/kvm/arm/vgic/vgic-mmio-v3.c | 17 +++++++-----
>  virt/kvm/arm/vgic/vgic-mmio.c    | 44 +++++++++++++++++------------
>  virt/kvm/arm/vgic/vgic-v2.c      |  5 ++--
>  virt/kvm/arm/vgic/vgic-v3.c      | 12 ++++----
>  virt/kvm/arm/vgic/vgic.c         | 60 +++++++++++++++++++++++++---------------
>  virt/kvm/arm/vgic/vgic.h         |  3 +-
>  8 files changed, 108 insertions(+), 72 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index f51c1e1..9f5e347 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -278,6 +278,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
>  	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
>  	u8 prop;
>  	int ret;
> +	unsigned long flags;
>  
>  	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
>  			     &prop, 1);
> @@ -285,15 +286,15 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
>  	if (ret)
>  		return ret;
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
>  		irq->priority = LPI_PROP_PRIORITY(prop);
>  		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
>  
> -		vgic_queue_irq_unlock(kvm, irq);
> +		vgic_queue_irq_unlock(kvm, irq, flags);
>  	} else {
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	}
>  
>  	return 0;
> @@ -393,6 +394,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>  	int ret = 0;
>  	u32 *intids;
>  	int nr_irqs, i;
> +	unsigned long flags;
>  
>  	nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
>  	if (nr_irqs < 0)
> @@ -420,9 +422,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
>  		}
>  
>  		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->pending_latch = pendmask & (1U << bit_nr);
> -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
> @@ -515,6 +517,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
>  {
>  	struct kvm_vcpu *vcpu;
>  	struct its_ite *ite;
> +	unsigned long flags;
>  
>  	if (!its->enabled)
>  		return -EBUSY;
> @@ -530,9 +533,9 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
>  	if (!vcpu->arch.vgic_cpu.lpis_enabled)
>  		return -EBUSY;
>  
> -	spin_lock(&ite->irq->irq_lock);
> +	spin_lock_irqsave(&ite->irq->irq_lock, flags);
>  	ite->irq->pending_latch = true;
> -	vgic_queue_irq_unlock(kvm, ite->irq);
> +	vgic_queue_irq_unlock(kvm, ite->irq, flags);
>  
>  	return 0;
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> index b3d4a10..e21e2f4 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> @@ -74,6 +74,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>  	int mode = (val >> 24) & 0x03;
>  	int c;
>  	struct kvm_vcpu *vcpu;
> +	unsigned long flags;
>  
>  	switch (mode) {
>  	case 0x0:		/* as specified by targets */
> @@ -97,11 +98,11 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
>  
>  		irq = vgic_get_irq(source_vcpu->kvm, vcpu, intid);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->pending_latch = true;
>  		irq->source |= 1U << source_vcpu->vcpu_id;
>  
> -		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(source_vcpu->kvm, irq, flags);
>  		vgic_put_irq(source_vcpu->kvm, irq);
>  	}
>  }
> @@ -131,6 +132,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
>  	u8 cpu_mask = GENMASK(atomic_read(&vcpu->kvm->online_vcpus) - 1, 0);
>  	int i;
> +	unsigned long flags;
>  
>  	/* GICD_ITARGETSR[0-7] are read-only */
>  	if (intid < VGIC_NR_PRIVATE_IRQS)
> @@ -140,13 +142,13 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid + i);
>  		int target;
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		irq->targets = (val >> (i * 8)) & cpu_mask;
>  		target = irq->targets ? __ffs(irq->targets) : 0;
>  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -174,17 +176,18 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = addr & 0x0f;
>  	int i;
> +	unsigned long flags;
>  
>  	for (i = 0; i < len; i++) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		irq->source &= ~((val >> (i * 8)) & 0xff);
>  		if (!irq->source)
>  			irq->pending_latch = false;
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -195,19 +198,20 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = addr & 0x0f;
>  	int i;
> +	unsigned long flags;
>  
>  	for (i = 0; i < len; i++) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		irq->source |= (val >> (i * 8)) & 0xff;
>  
>  		if (irq->source) {
>  			irq->pending_latch = true;
> -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		} else {
> -			spin_unlock(&irq->irq_lock);
> +			spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		}
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> index 408ef06..8378610 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> @@ -129,6 +129,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>  	struct vgic_irq *irq;
> +	unsigned long flags;
>  
>  	/* The upper word is WI for us since we don't implement Aff3. */
>  	if (addr & 4)
> @@ -139,13 +140,13 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  	if (!irq)
>  		return;
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
>  	irq->mpidr = val & GENMASK(23, 0);
>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>  
> -	spin_unlock(&irq->irq_lock);
> +	spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	vgic_put_irq(vcpu->kvm, irq);
>  }
>  
> @@ -241,11 +242,12 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
> +	unsigned long flags;
>  
>  	for (i = 0; i < len * 8; i++) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		if (test_bit(i, &val)) {
>  			/*
>  			 * pending_latch is set irrespective of irq type
> @@ -253,10 +255,10 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
>  			 * restore irq config before pending info.
>  			 */
>  			irq->pending_latch = true;
> -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		} else {
>  			irq->pending_latch = false;
> -			spin_unlock(&irq->irq_lock);
> +			spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		}
>  
>  		vgic_put_irq(vcpu->kvm, irq);
> @@ -799,6 +801,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>  	int sgi, c;
>  	int vcpu_id = vcpu->vcpu_id;
>  	bool broadcast;
> +	unsigned long flags;
>  
>  	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
>  	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> @@ -837,10 +840,10 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
>  
>  		irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->pending_latch = true;
>  
> -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index c1e4bdd..deb51ee 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -69,13 +69,14 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
> +	unsigned long flags;
>  
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->enabled = true;
> -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -87,15 +88,16 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
> +	unsigned long flags;
>  
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		irq->enabled = false;
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -126,14 +128,15 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
> +	unsigned long flags;
>  
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->pending_latch = true;
>  
> -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -144,15 +147,16 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
> +	unsigned long flags;
>  
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		irq->pending_latch = false;
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -181,7 +185,8 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>  				    bool new_active_state)
>  {
>  	struct kvm_vcpu *requester_vcpu;
> -	spin_lock(&irq->irq_lock);
> +	unsigned long flags;
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	/*
>  	 * The vcpu parameter here can mean multiple things depending on how
> @@ -216,9 +221,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>  
>  	irq->active = new_active_state;
>  	if (new_active_state)
> -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  	else
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  }
>  
>  /*
> @@ -352,14 +357,15 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
>  	int i;
> +	unsigned long flags;
>  
>  	for (i = 0; i < len; i++) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		/* Narrow the priority range to what we actually support */
>  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> @@ -390,6 +396,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  {
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 2);
>  	int i;
> +	unsigned long flags;
>  
>  	for (i = 0; i < len * 4; i++) {
>  		struct vgic_irq *irq;
> @@ -404,14 +411,14 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
>  			continue;
>  
>  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		if (test_bit(i * 2 + 1, &val))
>  			irq->config = VGIC_CONFIG_EDGE;
>  		else
>  			irq->config = VGIC_CONFIG_LEVEL;
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  }
> @@ -443,6 +450,7 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
>  {
>  	int i;
>  	int nr_irqs = vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
> +	unsigned long flags;
>  
>  	for (i = 0; i < 32; i++) {
>  		struct vgic_irq *irq;
> @@ -459,12 +467,12 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
>  		 * restore irq config before line level.
>  		 */
>  		new_level = !!(val & (1U << i));
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		irq->line_level = new_level;
>  		if (new_level)
> -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  		else
> -			spin_unlock(&irq->irq_lock);
> +			spin_unlock_irqrestore(&irq->irq_lock, flags);
>  
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
> diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> index e4187e5..8089710 100644
> --- a/virt/kvm/arm/vgic/vgic-v2.c
> +++ b/virt/kvm/arm/vgic/vgic-v2.c
> @@ -62,6 +62,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
>  	int lr;
> +	unsigned long flags;
>  
>  	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
>  
> @@ -77,7 +78,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  
>  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		/* Always preserve the active bit */
>  		irq->active = !!(val & GICH_LR_ACTIVE_BIT);
> @@ -104,7 +105,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
> diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> index 96ea597..863351c 100644
> --- a/virt/kvm/arm/vgic/vgic-v3.c
> +++ b/virt/kvm/arm/vgic/vgic-v3.c
> @@ -44,6 +44,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
>  	u32 model = vcpu->kvm->arch.vgic.vgic_model;
>  	int lr;
> +	unsigned long flags;
>  
>  	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
>  
> @@ -66,7 +67,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  		if (!irq)	/* An LPI could have been unmapped. */
>  			continue;
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  		/* Always preserve the active bit */
>  		irq->active = !!(val & ICH_LR_ACTIVE_BIT);
> @@ -94,7 +95,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
>  				irq->pending_latch = false;
>  		}
>  
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  
> @@ -278,6 +279,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>  	bool status;
>  	u8 val;
>  	int ret;
> +	unsigned long flags;
>  
>  retry:
>  	vcpu = irq->target_vcpu;
> @@ -296,13 +298,13 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
>  
>  	status = val & (1 << bit_nr);
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  	if (irq->target_vcpu != vcpu) {
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		goto retry;
>  	}
>  	irq->pending_latch = status;
> -	vgic_queue_irq_unlock(vcpu->kvm, irq);
> +	vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>  
>  	if (status) {
>  		/* clear consumed data */
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index e1f7dbc..b1bd238 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -53,6 +53,10 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
>   *   vcpuX->vcpu_id < vcpuY->vcpu_id:
>   *     spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock);
>   *     spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock);
> + *
> + * Since the VGIC must support injecting virtual interrupts from ISRs, we have
> + * to use the spin_lock_irqsave/spin_unlock_irqrestore versions of outer
> + * spinlocks for any lock that may be taken while injecting an interrupt.
>   */
>  
>  /*
> @@ -261,7 +265,8 @@ static bool vgic_validate_injection(struct vgic_irq *irq, bool level, void *owne
>   * Needs to be entered with the IRQ lock already held, but will return
>   * with all locks dropped.
>   */
> -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> +			   unsigned long flags)
>  {
>  	struct kvm_vcpu *vcpu;
>  
> @@ -279,7 +284,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  		 * not need to be inserted into an ap_list and there is also
>  		 * no more work for us to do.
>  		 */
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  
>  		/*
>  		 * We have to kick the VCPU here, because we could be
> @@ -301,11 +306,11 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	 * We must unlock the irq lock to take the ap_list_lock where
>  	 * we are going to insert this new pending interrupt.
>  	 */
> -	spin_unlock(&irq->irq_lock);
> +	spin_unlock_irqrestore(&irq->irq_lock, flags);
>  
>  	/* someone can do stuff here, which we re-check below */
>  
> -	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +	spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
>  	spin_lock(&irq->irq_lock);
>  
>  	/*
> @@ -322,9 +327,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  
>  	if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) {
>  		spin_unlock(&irq->irq_lock);
> -		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +		spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
>  
> -		spin_lock(&irq->irq_lock);
> +		spin_lock_irqsave(&irq->irq_lock, flags);
>  		goto retry;
>  	}
>  
> @@ -337,7 +342,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	irq->vcpu = vcpu;
>  
>  	spin_unlock(&irq->irq_lock);
> -	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +	spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
>  
>  	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
> @@ -367,6 +372,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  {
>  	struct kvm_vcpu *vcpu;
>  	struct vgic_irq *irq;
> +	unsigned long flags;
>  	int ret;
>  
>  	trace_vgic_update_irq_pending(cpuid, intid, level);
> @@ -383,11 +389,11 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  	if (!irq)
>  		return -EINVAL;
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	if (!vgic_validate_injection(irq, level, owner)) {
>  		/* Nothing to see here, move along... */
> -		spin_unlock(&irq->irq_lock);
> +		spin_unlock_irqrestore(&irq->irq_lock, flags);
>  		vgic_put_irq(kvm, irq);
>  		return 0;
>  	}
> @@ -397,7 +403,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  	else
>  		irq->pending_latch = true;
>  
> -	vgic_queue_irq_unlock(kvm, irq);
> +	vgic_queue_irq_unlock(kvm, irq, flags);
>  	vgic_put_irq(kvm, irq);
>  
>  	return 0;
> @@ -406,15 +412,16 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
>  {
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> +	unsigned long flags;
>  
>  	BUG_ON(!irq);
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	irq->hw = true;
>  	irq->hwintid = phys_irq;
>  
> -	spin_unlock(&irq->irq_lock);
> +	spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
> @@ -423,6 +430,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
>  	struct vgic_irq *irq;
> +	unsigned long flags;
>  
>  	if (!vgic_initialized(vcpu->kvm))
>  		return -EAGAIN;
> @@ -430,12 +438,12 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>  	BUG_ON(!irq);
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);
>  
>  	irq->hw = false;
>  	irq->hwintid = 0;
>  
> -	spin_unlock(&irq->irq_lock);
> +	spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return 0;
> @@ -486,9 +494,10 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  {
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_irq *irq, *tmp;
> +	unsigned long flags;
>  
>  retry:
> -	spin_lock(&vgic_cpu->ap_list_lock);
> +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
>  
>  	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
>  		struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB;
> @@ -528,7 +537,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  		/* This interrupt looks like it has to be migrated. */
>  
>  		spin_unlock(&irq->irq_lock);
> -		spin_unlock(&vgic_cpu->ap_list_lock);
> +		spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
>  
>  		/*
>  		 * Ensure locking order by always locking the smallest
> @@ -542,7 +551,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  			vcpuB = vcpu;
>  		}
>  
> -		spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> +		spin_lock_irqsave(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
>  		spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock,
>  				 SINGLE_DEPTH_NESTING);
>  		spin_lock(&irq->irq_lock);
> @@ -566,11 +575,11 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
>  
>  		spin_unlock(&irq->irq_lock);
>  		spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock);
> -		spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> +		spin_unlock_irqrestore(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
>  		goto retry;
>  	}
>  
> -	spin_unlock(&vgic_cpu->ap_list_lock);
> +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
>  }
>  
>  static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
> @@ -703,6 +712,8 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
>  		return;
>  
> +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> +
>  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  	vgic_flush_lr_state(vcpu);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> @@ -735,11 +746,12 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_irq *irq;
>  	bool pending = false;
> +	unsigned long flags;
>  
>  	if (!vcpu->kvm->arch.vgic.enabled)
>  		return false;
>  
> -	spin_lock(&vgic_cpu->ap_list_lock);
> +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
>  
>  	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
>  		spin_lock(&irq->irq_lock);
> @@ -750,7 +762,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
>  			break;
>  	}
>  
> -	spin_unlock(&vgic_cpu->ap_list_lock);
> +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
>  
>  	return pending;
>  }
> @@ -776,13 +788,15 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>  {
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>  	bool map_is_active;
> +	unsigned long flags;
>  
>  	if (!vgic_initialized(vcpu->kvm))
>  		return false;
> +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
>  
> -	spin_lock(&irq->irq_lock);
> +	spin_lock_irqsave(&irq->irq_lock, flags);

I'm a bit puzzled by this sequence: Either interrupts are disabled and
we don't need the irqsave version, or they aren't and the BUG_ON will
fire. kvm_vgic_map_is_active is called (indirectly) from
kvm_timer_flush_hwstate. And at this stage of the patches, we definitely
call this function with interrupts enabled.

Is it just a patch splitting snafu? Or something more serious? Same goes
for the DEBUG_SPINLOCK_BUG_ON in kvm_vgic_flush_hwstate.
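
To make the concern concrete, the combination as it stands at this point
in the series reads like this (illustrative sketch only, not a suggested
change):

	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());  /* asserts IRQs are already off */
	spin_lock_irqsave(&irq->irq_lock, flags); /* yet saves/restores the IRQ state */

If the assertion holds, a plain spin_lock() would be sufficient; if it
doesn't, the assertion fires before the irqsave variant buys us anything.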

>  	map_is_active = irq->hw && irq->active;
> -	spin_unlock(&irq->irq_lock);
> +	spin_unlock_irqrestore(&irq->irq_lock, flags);
>  	vgic_put_irq(vcpu->kvm, irq);
>  
>  	return map_is_active;
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index bf9ceab..4f8aecb 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -140,7 +140,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
>  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
>  			      u32 intid);
>  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
> +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> +			   unsigned long flags);
>  void vgic_kick_vcpus(struct kvm *kvm);
>  
>  int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
> 

Otherwise looks good to me.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 16:47     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 16:47 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, kvm

On 23/09/17 01:41, Christoffer Dall wrote:
> Some systems without proper firmware and/or hardware description data
> don't support the split EOI and deactivate operation.
> 
> On such systems, we cannot leave the physical interrupt active after the
> timer handler on the host has run, so we cannot support KVM with an
> in-kernel GIC with the timer changes we are about to introduce.
> 
> This patch makes sure that trying to initialize the KVM GIC code will
> fail on such systems.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  drivers/irqchip/irq-gic.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index f641e8e..ab12bf4 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -1420,7 +1420,8 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
>  	if (ret)
>  		return;
>  
> -	gic_set_kvm_info(&gic_v2_kvm_info);
> +	if (static_key_true(&supports_deactivate))
> +		gic_set_kvm_info(&gic_v2_kvm_info);
>  }
>  
>  int __init
> 

Should we add the same level of checking on the ACPI path, just for the
sake of symmetry?
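
Something along these lines at the end of the ACPI probe path, assuming
gic_acpi_setup_kvm_info() is the right place for it (untested sketch):

	/* mirror the DT-path check before advertising the vGIC to KVM */
	if (static_key_true(&supports_deactivate))
		gic_set_kvm_info(&gic_v2_kvm_info);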

Also, do we need to add the same thing for GICv3?

Otherwise looks OK to me.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:05     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:05 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> We are about to add an additional soft timer to the arch timer state for
> a VCPU and would like to be able to reuse the functions to program and
> cancel a timer, so we make them slightly more generic and rename to make
> it more clear that these functions work on soft timers and not the
> hardware resource that this code is managing.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 33 ++++++++++++++++-----------------
>  1 file changed, 16 insertions(+), 17 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 8e89d63..871d8ae 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -56,26 +56,22 @@ u64 kvm_phys_timer_read(void)
>  	return timecounter->cc->read(timecounter->cc);
>  }
>  
> -static bool timer_is_armed(struct arch_timer_cpu *timer)
> +static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
>  {
>  	return timer->armed;
>  }
>  
> -/* timer_arm: as in "arm the timer", not as in ARM the company */
> -static void timer_arm(struct arch_timer_cpu *timer, u64 ns)
> +static void soft_timer_start(struct hrtimer *hrt, u64 ns)
>  {
> -	timer->armed = true;
> -	hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns),
> +	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
>  		      HRTIMER_MODE_ABS);
>  }
>  
> -static void timer_disarm(struct arch_timer_cpu *timer)
> +static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>  {
> -	if (timer_is_armed(timer)) {
> -		hrtimer_cancel(&timer->timer);
> -		cancel_work_sync(&timer->expired);
> -		timer->armed = false;
> -	}
> +	hrtimer_cancel(hrt);
> +	if (work)

When can this happen? Something in a following patch?
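
If the intention is that a later patch passes a NULL work pointer for a
timer that has no work item attached (the new phys timer hrtimer, say),
then a call like the following is presumably what this prepares for
(hypothetical example, the field name is made up):

	soft_timer_cancel(&timer->phys_timer, NULL);	/* cancel the hrtimer only */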

> +		cancel_work_sync(work);
>  }
>  
>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> @@ -271,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
>  		return;
>  
>  	/*  The timer has not yet expired, schedule a background timer */
> -	timer_arm(timer, kvm_timer_compute_delta(timer_ctx));
> +	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
>  }
>  
>  /*
> @@ -285,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> -	BUG_ON(timer_is_armed(timer));
> +	BUG_ON(soft_timer_is_armed(timer));
>  
>  	/*
>  	 * No need to schedule a background timer if any guest timer has
> @@ -306,13 +302,16 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	 * The guest timers have not yet expired, schedule a background timer.
>  	 * Set the earliest expiration time among the guest timers.
>  	 */
> -	timer_arm(timer, kvm_timer_earliest_exp(vcpu));
> +	timer->armed = true;
> +	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
>  }
>  
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	timer_disarm(timer);
> +
> +	soft_timer_cancel(&timer->timer, &timer->expired);
> +	timer->armed = false;
>  }
>  
>  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> @@ -448,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  	 * This is to cancel the background timer for the physical timer
>  	 * emulation if it is set.
>  	 */
> -	timer_disarm(timer);
> +	soft_timer_cancel(&timer->timer, &timer->expired);

timer_disarm() used to set timer->armed to false, but that's not the
case any more. Don't we risk hitting the BUG_ON() in kvm_timer_schedule
if we hit WFI?
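
For reference, the sequence I have in mind (sketch of how I read the code
at this point in the series):

	kvm_timer_sync_hwstate(vcpu);
		soft_timer_cancel(&timer->timer, &timer->expired);
		/* timer->armed is no longer cleared here */

	/* guest executes WFI, the vcpu blocks */

	kvm_timer_schedule(vcpu);
		BUG_ON(soft_timer_is_armed(timer));	/* can this now trigger? */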

>  
>  	/*
>  	 * The guest could have modified the timer registers or the timer
> @@ -615,7 +614,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
> -	timer_disarm(timer);
> +	soft_timer_cancel(&timer->timer, &timer->expired);
>  	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
>  }
>  
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 08/20] KVM: arm/arm64: Rename soft timer to bg_timer
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:06     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:06 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> As we are about to introduce a separate hrtimer for the physical timer,
> call this timer bg_timer, because we refer to this timer as the
> background timer in the code and comments elsewhere.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 09/20] KVM: arm/arm64: Use separate timer for phys timer emulation
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:23     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:23 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> We were using the same hrtimer for emulating the physical timer and for
> making sure a blocking VCPU thread would be eventually woken up.  That
> worked fine in the previous arch timer design, but as we are about to
> actually use the soft timer expire function for the physical timer
> emulation, change the logic to use a dedicated hrtimer.
> 
> This has the added benefit of not having to cancel any work in the sync
> path, which in turn allows us to run the flush and sync with IRQs
> disabled.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  include/kvm/arm_arch_timer.h |  3 +++
>  virt/kvm/arm/arch_timer.c    | 18 ++++++++++++++----
>  2 files changed, 17 insertions(+), 4 deletions(-)
> 
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index dcbb2e1..16887c0 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -47,6 +47,9 @@ struct arch_timer_cpu {
>  	/* Work queued with the above timer expires */
>  	struct work_struct		expired;
>  
> +	/* Physical timer emulation */
> +	struct hrtimer			phys_timer;
> +
>  	/* Background timer active */
>  	bool				armed;
>  
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index c2e8326..7f87099 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -178,6 +178,12 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
>  	return HRTIMER_NORESTART;
>  }
>  
> +static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
> +{
> +	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> +	return HRTIMER_NORESTART;
> +}
> +

So what prevents this handler from actually firing? Is it that we cancel
the hrtimer while interrupts are still disabled, hence the timer never
fires? If that's the intention, then this patch is slightly out of
place, as we haven't moved the timer sync within the irq_disable() section.

Or am I missing something obvious?
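
For reference, the ordering I would expect to make the WARN unreachable
is roughly (just a sketch of the intent, not the actual code):

	local_irq_disable();
	kvm_timer_flush_hwstate(vcpu);	/* may arm the phys_timer hrtimer */
	/* ... enter and exit the guest ... */
	kvm_timer_sync_hwstate(vcpu);	/* cancels the phys_timer hrtimer */
	local_irq_enable();

i.e. the hrtimer is always cancelled before interrupts are re-enabled on
this CPU, so its expiry function never gets a chance to run.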

>  bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  {
>  	u64 cval, now;
> @@ -255,7 +261,7 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  }
>  
>  /* Schedule the background timer for the emulated timer. */
> -static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> +static void phys_timer_emulate(struct kvm_vcpu *vcpu,
>  			      struct arch_timer_context *timer_ctx)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> @@ -267,7 +273,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
>  		return;
>  
>  	/*  The timer has not yet expired, schedule a background timer */
> -	soft_timer_start(&timer->bg_timer, kvm_timer_compute_delta(timer_ctx));
> +	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
>  }
>  
>  /*
> @@ -424,7 +430,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	kvm_timer_update_state(vcpu);
>  
>  	/* Set the background timer for the physical timer emulation. */
> -	kvm_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> +	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
>  
>  	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
>  		kvm_timer_flush_hwstate_user(vcpu);
> @@ -447,7 +453,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  	 * This is to cancel the background timer for the physical timer
>  	 * emulation if it is set.
>  	 */
> -	soft_timer_cancel(&timer->bg_timer, &timer->expired);
> +	soft_timer_cancel(&timer->phys_timer, NULL);

Right, that now explains the "work" test in one of the previous patches.

>  
>  	/*
>  	 * The guest could have modified the timer registers or the timer
> @@ -507,6 +513,9 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
>  	timer->bg_timer.function = kvm_bg_timer_expire;
>  
> +	hrtimer_init(&timer->phys_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +	timer->phys_timer.function = kvm_phys_timer_expire;
> +
>  	vtimer->irq.irq = default_vtimer_irq.irq;
>  	ptimer->irq.irq = default_ptimer_irq.irq;
>  }
> @@ -615,6 +624,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	soft_timer_cancel(&timer->bg_timer, &timer->expired);
> +	soft_timer_cancel(&timer->phys_timer, NULL);
>  	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
>  }
>  
> 

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 10/20] KVM: arm/arm64: Move timer/vgic flush/sync under disabled irq
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:34     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:34 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:41, Christoffer Dall wrote:
> As we are about to play tricks with the timer to be more lazy in saving
> and restoring state, we need to move the timer sync and flush functions
> under a disabled irq section and since we have to flush the vgic state
> after the timer and PMU state, we do the whole flush/sync sequence with
> disabled irqs.
> 
> The only downside is a slightly longer delay before being able to
> process hardware interrupts and run softirqs.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/arm.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index b9f68e4..27db222 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -654,11 +654,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		kvm_pmu_flush_hwstate(vcpu);
>  
> +		local_irq_disable();
> +
>  		kvm_timer_flush_hwstate(vcpu);
>  		kvm_vgic_flush_hwstate(vcpu);
>  
> -		local_irq_disable();
> -
>  		/*
>  		 * If we have a signal pending, or need to notify a userspace
>  		 * irqchip about timer or PMU level changes, then we exit (and
> @@ -683,10 +683,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>  		    kvm_request_pending(vcpu)) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
> -			local_irq_enable();
>  			kvm_pmu_sync_hwstate(vcpu);
>  			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
> +			local_irq_enable();
>  			preempt_enable();
>  			continue;
>  		}
> @@ -710,6 +710,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_arm_clear_debug(vcpu);
>  
>  		/*
> +		 * We must sync the PMU and timer state before the vgic state so
> +		 * that the vgic can properly sample the updated state of the
> +		 * interrupt line.
> +		 */
> +		kvm_pmu_sync_hwstate(vcpu);
> +		kvm_timer_sync_hwstate(vcpu);
> +
> +		kvm_vgic_sync_hwstate(vcpu);
> +
> +		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
>  		 * pending, as we haven't serviced it yet!
> @@ -732,16 +742,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		guest_exit();
>  		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>  
> -		/*
> -		 * We must sync the PMU and timer state before the vgic state so
> -		 * that the vgic can properly sample the updated state of the
> -		 * interrupt line.
> -		 */
> -		kvm_pmu_sync_hwstate(vcpu);
> -		kvm_timer_sync_hwstate(vcpu);
> -
> -		kvm_vgic_sync_hwstate(vcpu);
> -
>  		preempt_enable();
>  
>  		ret = handle_exit(vcpu, run, ret);
> 

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 11/20] KVM: arm/arm64: Move timer save/restore out of the hyp code
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:47     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:47 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, kvm

On 23/09/17 01:41, Christoffer Dall wrote:
> As we are about to be lazy with saving and restoring the timer
> registers, we prepare by moving all possible timer configuration logic
> out of the hyp code.  All virtual timer registers can be programmed from
> EL1 and since the arch timer is always a level triggered interrupt we
> can safely do this with interrupts disabled in the host kernel on the
> way to the guest without taking vtimer interrupts in the host kernel
> (yet).
> 
> The downside is that the cntvoff register can only be programmed from
> hyp mode, so we jump into hyp mode and back to program it.  This is also
> safe, because the host kernel doesn't use the virtual timer in the KVM
> code.  It may add a little performance penalty, but only
> until following commits where we move this operation to vcpu load/put.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_asm.h   |  2 ++
>  arch/arm/include/asm/kvm_hyp.h   |  4 +--
>  arch/arm/kvm/hyp/switch.c        |  7 ++--
>  arch/arm64/include/asm/kvm_asm.h |  2 ++
>  arch/arm64/include/asm/kvm_hyp.h |  4 +--
>  arch/arm64/kvm/hyp/switch.c      |  6 ++--
>  virt/kvm/arm/arch_timer.c        | 40 ++++++++++++++++++++++
>  virt/kvm/arm/hyp/timer-sr.c      | 74 +++++++++++++++++-----------------------
>  8 files changed, 87 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 14d68a4..36dd296 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -68,6 +68,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
> +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> +
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
>  extern void __init_stage2_translation(void);
> diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> index 14b5903..ab20ffa 100644
> --- a/arch/arm/include/asm/kvm_hyp.h
> +++ b/arch/arm/include/asm/kvm_hyp.h
> @@ -98,8 +98,8 @@
>  #define cntvoff_el2			CNTVOFF
>  #define cnthctl_el2			CNTHCTL
>  
> -void __timer_save_state(struct kvm_vcpu *vcpu);
> -void __timer_restore_state(struct kvm_vcpu *vcpu);
> +void __timer_enable_traps(struct kvm_vcpu *vcpu);
> +void __timer_disable_traps(struct kvm_vcpu *vcpu);
>  
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> index ebd2dd4..330c9ce 100644
> --- a/arch/arm/kvm/hyp/switch.c
> +++ b/arch/arm/kvm/hyp/switch.c
> @@ -174,7 +174,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  	__activate_vm(vcpu);
>  
>  	__vgic_restore_state(vcpu);
> -	__timer_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
>  
>  	__sysreg_restore_state(guest_ctxt);
>  	__banked_restore_state(guest_ctxt);
> @@ -191,7 +191,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	__banked_save_state(guest_ctxt);
>  	__sysreg_save_state(guest_ctxt);
> -	__timer_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
> +
>  	__vgic_save_state(vcpu);
>  
>  	__deactivate_traps(vcpu);
> @@ -237,7 +238,7 @@ void __hyp_text __noreturn __hyp_panic(int cause)
>  
>  		vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
>  		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -		__timer_save_state(vcpu);
> +		__timer_disable_traps(vcpu);
>  		__deactivate_traps(vcpu);
>  		__deactivate_vm(vcpu);
>  		__banked_restore_state(host_ctxt);
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 26a64d0..ab4d0a9 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -55,6 +55,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
>  
> +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> +
>  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
>  
>  extern u64 __vgic_v3_get_ich_vtr_el2(void);
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b..08d3bb6 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -129,8 +129,8 @@ void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
>  
> -void __timer_save_state(struct kvm_vcpu *vcpu);
> -void __timer_restore_state(struct kvm_vcpu *vcpu);
> +void __timer_enable_traps(struct kvm_vcpu *vcpu);
> +void __timer_disable_traps(struct kvm_vcpu *vcpu);
>  
>  void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..4994f4b 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -298,7 +298,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  	__activate_vm(vcpu);
>  
>  	__vgic_restore_state(vcpu);
> -	__timer_restore_state(vcpu);
> +	__timer_enable_traps(vcpu);
>  
>  	/*
>  	 * We must restore the 32-bit state before the sysregs, thanks
> @@ -368,7 +368,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>  
>  	__sysreg_save_guest_state(guest_ctxt);
>  	__sysreg32_save_state(vcpu);
> -	__timer_save_state(vcpu);
> +	__timer_disable_traps(vcpu);
>  	__vgic_save_state(vcpu);
>  
>  	__deactivate_traps(vcpu);
> @@ -436,7 +436,7 @@ void __hyp_text __noreturn __hyp_panic(void)
>  
>  		vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
>  		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> -		__timer_save_state(vcpu);
> +		__timer_disable_traps(vcpu);
>  		__deactivate_traps(vcpu);
>  		__deactivate_vm(vcpu);
>  		__sysreg_restore_host_state(host_ctxt);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7f87099..4254f88 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -276,6 +276,20 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
>  	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
>  }
>  
> +static void timer_save_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +
> +	if (timer->enabled) {
> +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> +		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> +	}
> +
> +	/* Disable the virtual timer */
> +	write_sysreg_el0(0, cntv_ctl);
> +}
> +
>  /*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
> @@ -312,6 +326,18 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
>  }
>  
> +static void timer_restore_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +
> +	if (timer->enabled) {
> +		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> +		isb();
> +		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> +	}
> +}
> +
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> @@ -320,6 +346,13 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  	timer->armed = false;
>  }
>  
> +static void set_cntvoff(u64 cntvoff)
> +{
> +	u32 low = cntvoff & GENMASK(31, 0);
> +	u32 high = (cntvoff >> 32) & GENMASK(31, 0);

upper_32_bits/lower_32_bits?

> +	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);

Maybe a comment as to why we need to split the 64bit value in two 32bit
words (32bit ARM PCS is getting in the way).
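
Something along these lines, maybe (untested, just to illustrate both
points; wording of the comment is entirely up to you):

	/*
	 * The offset is split in two 32bit words because kvm_call_hyp()
	 * on 32bit ARM only passes 32bit arguments in registers.
	 */
	kvm_call_hyp(__kvm_timer_set_cntvoff,
		     lower_32_bits(cntvoff), upper_32_bits(cntvoff));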

> +}
> +
>  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> @@ -423,6 +456,7 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	if (unlikely(!timer->enabled))
>  		return;
> @@ -436,6 +470,9 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  		kvm_timer_flush_hwstate_user(vcpu);
>  	else
>  		kvm_timer_flush_hwstate_vgic(vcpu);
> +
> +	set_cntvoff(vtimer->cntvoff);
> +	timer_restore_state(vcpu);
>  }
>  
>  /**
> @@ -455,6 +492,9 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  	 */
>  	soft_timer_cancel(&timer->phys_timer, NULL);
>  
> +	timer_save_state(vcpu);
> +	set_cntvoff(0);
> +
>  	/*
>  	 * The guest could have modified the timer registers or the timer
>  	 * could have expired, update the timer state.
> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> index 4734915..a6c3b10 100644
> --- a/virt/kvm/arm/hyp/timer-sr.c
> +++ b/virt/kvm/arm/hyp/timer-sr.c
> @@ -21,58 +21,48 @@
>  
>  #include <asm/kvm_hyp.h>
>  
> -/* vcpu is already in the HYP VA space */
> -void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> +void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> +{
> +	u64 cntvoff = (u64)cntvoff_high << 32 | cntvoff_low;
> +	write_sysreg(cntvoff, cntvoff_el2);
> +}
> +
> +void __hyp_text enable_phys_timer(void)
>  {
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	u64 val;
>  
> -	if (timer->enabled) {
> -		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> -		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> -	}
> +	/* Allow physical timer/counter access for the host */
> +	val = read_sysreg(cnthctl_el2);
> +	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> +	write_sysreg(val, cnthctl_el2);
> +}
>  
> -	/* Disable the virtual timer */
> -	write_sysreg_el0(0, cntv_ctl);
> +void __hyp_text disable_phys_timer(void)
> +{
> +	u64 val;
>  
>  	/*
> +	 * Disallow physical timer access for the guest
> +	 * Physical counter access is allowed
> +	 */
> +	val = read_sysreg(cnthctl_el2);
> +	val &= ~CNTHCTL_EL1PCEN;
> +	val |= CNTHCTL_EL1PCTEN;
> +	write_sysreg(val, cnthctl_el2);
> +}
> +
> +void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
> +{
> +	/*
>  	 * We don't need to do this for VHE since the host kernel runs in EL2
>  	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
>  	 */
> -	if (!has_vhe()) {
> -		/* Allow physical timer/counter access for the host */
> -		val = read_sysreg(cnthctl_el2);
> -		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> -
> -	/* Clear cntvoff for the host */
> -	write_sysreg(0, cntvoff_el2);
> +	if (!has_vhe())
> +		enable_phys_timer();
>  }
>  
> -void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
> +void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
>  {
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> -	u64 val;
> -
> -	/* Those bits are already configured at boot on VHE-system */
> -	if (!has_vhe()) {
> -		/*
> -		 * Disallow physical timer access for the guest
> -		 * Physical counter access is allowed
> -		 */
> -		val = read_sysreg(cnthctl_el2);
> -		val &= ~CNTHCTL_EL1PCEN;
> -		val |= CNTHCTL_EL1PCTEN;
> -		write_sysreg(val, cnthctl_el2);
> -	}
> -
> -	if (timer->enabled) {
> -		write_sysreg(vtimer->cntvoff, cntvoff_el2);
> -		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> -		isb();
> -		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> -	}
> +	if (!has_vhe())
> +		disable_phys_timer();
>  }
> 

Otherwise:

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 12/20] genirq: Document vcpu_info usage for percpu_devid interrupts
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-09 17:48     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:48 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas, Thomas Gleixner

On 23/09/17 01:41, Christoffer Dall wrote:
> It is currently unclear how to set the VCPU affinity for a percpu_devid
> interrupt, since the Linux irq_data structure describes the state for
> multiple interrupts, one for each physical CPU on the system.  Since
> each such interrupt can be associated with different VCPUs or none at
> all, associating a single VCPU state with such an interrupt does not
> capture the necessary semantics.
> 
> The implementers of irq_set_vcpu_affinity are the Intel and AMD IOMMUs, and
> the ARM GIC irqchip.  The Intel and AMD callers do not appear to use
> percpu_devid interrupts, and the ARM GIC implementation only checks the
> pointer against NULL vs. non-NULL.
> 
> Therefore, simply update the function documentation to explain the
> expected use in the context of percpu_devid interrupts, allowing future
> changes or additions to irqchip implementers to do the right thing.
> 
> This allows us to set the VCPU affinity for the virtual timer interrupt
> in KVM/ARM, which is a percpu_devid (PPI) interrupt.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  kernel/irq/manage.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 573dc52..2b2c94f 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -381,7 +381,8 @@ int irq_select_affinity_usr(unsigned int irq)
>  /**
>   *	irq_set_vcpu_affinity - Set vcpu affinity for the interrupt
>   *	@irq: interrupt number to set affinity
> - *	@vcpu_info: vCPU specific data
> + *	@vcpu_info: vCPU specific data or pointer to a percpu array of vCPU
> + *	            specific data for percpu_devid interrupts
>   *
>   *	This function uses the vCPU specific data to set the vCPU
>   *	affinity for an irq. The vCPU specific data is passed from
> 

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 13/20] KVM: arm/arm64: Set VCPU affinity for virt timer irq
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-09 17:52     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-09 17:52 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel
  Cc: kvm, Will Deacon, Catalin Marinas

On 23/09/17 01:42, Christoffer Dall wrote:
> As we are about to take physical interrupts for the virtual timer on the
> host but want to leave those active while running the VM (and let the VM
> deactivate them), we need to set the vtimer PPI affinity accordingly.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 4254f88..4275f8f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -650,11 +650,20 @@ int kvm_timer_hyp_init(void)
>  		return err;
>  	}
>  
> +	err = irq_set_vcpu_affinity(host_vtimer_irq, kvm_get_running_vcpus());
> +	if (err) {
> +		kvm_err("kvm_arch_timer: error setting vcpu affinity\n");
> +		goto out_free_irq;
> +	}
> +
>  	kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
>  
>  	cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
>  			  "kvm/arm/timer:starting", kvm_timer_starting_cpu,
>  			  kvm_timer_dying_cpu);
> +	return 0;
> +out_free_irq:
> +	free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
>  	return err;
>  }
>  
> 

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  8:47     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  8:47 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Sat, Sep 23 2017 at  2:42:01 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> We don't need to save and restore the hardware timer state and examine
> if it generates interrupts on on every entry/exit to the guest.  The
> timer hardware is perfectly capable of telling us when it has expired
> by signaling interrupts.
>
> When taking a vtimer interrupt in the host, we don't want to mess with
> the timer configuration, we just want to forward the physical interrupt
> to the guest as a virtual interrupt.  We can use the split priority drop
> and deactivate feature of the GIC to do this, which leaves an EOI'ed
> interrupt active on the physical distributor, making sure we don't keep
> taking timer interrupts which would prevent the guest from running.  We
> can then forward the physical interrupt to the VM using the HW bit in
> the LR of the GIC VE, like we do already, which lets the guest directly

VE?

> deactivate both the physical and virtual timer simultaneously, allowing
> the timer hardware to exit the VM and generate a new physical interrupt
> when the timer output is again asserted later on.
>
> We do need to capture this state when migrating VCPUs between physical
> CPUs, however, which we use the vcpu put/load functions for, which are
> called through preempt notifiers whenever the thread is scheduled away
> from the CPU or called directly if we return from the ioctl to
> userspace.
>
> One caveat is that we cannot restore the timer state during
> kvm_timer_vcpu_load, because the flow of sleeping a VCPU is:
>
>   1. kvm_vcpu_block
>   2. kvm_timer_schedule
>   3. schedule
>   4. kvm_timer_vcpu_put (preempt notifier)
>   5. schedule (vcpu thread gets scheduled back)
>   6. kvm_timer_vcpu_load
>         <---- We restore the hardware state here, but the bg_timer
> 	      hrtimer may have scheduled a work function that also
> 	      changes the timer state here.
>   7. kvm_timer_unschedule
>         <---- We can restore the state here instead
>
> So, while we do need to restore the timer state in step (6) in all other
> cases than when we called kvm_vcpu_block(), we have to defer the restore
> to step (7) when coming back after kvm_vcpu_block().  Note that we
> cannot simply call cancel_work_sync() in step (6), because vcpu_load can
> be called from a preempt notifier.
>
> An added benefit beyond not having to read and write the timer sysregs
> on every entry and exit is that we no longer have to actively write the
> active state to the physical distributor, because we set the affinity of

I don't understand this thing about the affinity of the timer. It is a
PPI, so it cannot go anywhere else.

> the vtimer interrupt when loading the timer state, so that the interrupt
> automatically stays active after firing.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  include/kvm/arm_arch_timer.h |   9 +-
>  virt/kvm/arm/arch_timer.c    | 238 +++++++++++++++++++++++++++----------------
>  virt/kvm/arm/arm.c           |  19 +++-
>  virt/kvm/arm/hyp/timer-sr.c  |   8 +-
>  4 files changed, 174 insertions(+), 100 deletions(-)
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 16887c0..8e5ed54 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -31,8 +31,8 @@ struct arch_timer_context {
>  	/* Timer IRQ */
>  	struct kvm_irq_level		irq;
>  
> -	/* Active IRQ state caching */
> -	bool				active_cleared_last;
> +	/* Is the timer state loaded on the hardware timer */
> +	bool			loaded;

I think this little guy is pretty crucial to understanding the flow, as
there are now two points where we save/restore the timer:
vcpu_load/vcpu_put and timer_schedule/timer_unschedule. Both can be
executed on the blocking path, and this is the predicate to find out if
there is actually something to do.

Would you mind adding a small comment to that effect?
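
Something like this, perhaps (only a sketch, feel free to reword):

	/*
	 * True when the hardware timer registers currently hold this
	 * context's state, i.e. set on restore (vcpu_load/unschedule)
	 * and cleared on save (schedule/vcpu_put).
	 */
	bool			loaded;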

>  
>  	/* Virtual offset */
>  	u64			cntvoff;
> @@ -80,10 +80,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>  
>  u64 kvm_phys_timer_read(void);
>  
> +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
>  
>  void kvm_timer_init_vhe(void);
>  
>  #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.vtimer)
>  #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.ptimer)
> +
> +void enable_el1_phys_timer_access(void);
> +void disable_el1_phys_timer_access(void);
> +
>  #endif
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 4275f8f..70110ea 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -46,10 +46,9 @@ static const struct kvm_irq_level default_vtimer_irq = {
>  	.level	= 1,
>  };
>  
> -void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> -{
> -	vcpu_vtimer(vcpu)->active_cleared_last = false;
> -}
> +static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> +				 struct arch_timer_context *timer_ctx);
>  
>  u64 kvm_phys_timer_read(void)
>  {
> @@ -74,17 +73,37 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>  		cancel_work_sync(work);
>  }
>  
> -static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> +static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	/*
> -	 * We disable the timer in the world switch and let it be
> -	 * handled by kvm_timer_sync_hwstate(). Getting a timer
> -	 * interrupt at this point is a sure sign of some major
> -	 * breakage.
> +	 * To prevent continuously exiting from the guest, we mask the
> +	 * physical interrupt when the virtual level is high, such that the
> +	 * guest can make forward progress.  Once we detect the output level
> +	 * being deasserted, we unmask the interrupt again so that we exit
> +	 * from the guest when the timer fires.

Maybe an additional comment indicating that this only makes sense when
we don't have an in-kernel GIC? I know this wasn't in the original code,
but I started asking myself all kinds of questions until I realised what
this was for...
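
Something like the following on top of the existing text would do (just
a suggested wording):

	/*
	 * This is only used when running with a userspace irqchip
	 * (!irqchip_in_kernel()); with an in-kernel GIC the host
	 * interrupt is left enabled and we drive the active state on
	 * the physical distributor instead.
	 */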

>  	 */
> -	pr_warn("Unexpected interrupt %d on vcpu %p\n", irq, vcpu);
> +	if (vtimer->irq.level)
> +		disable_percpu_irq(host_vtimer_irq);
> +	else
> +		enable_percpu_irq(host_vtimer_irq, 0);
> +}
> +
> +static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> +{
> +	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +
> +	if (!vtimer->irq.level) {
> +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> +		if (kvm_timer_irq_can_fire(vtimer))
> +			kvm_timer_update_irq(vcpu, true, vtimer);
> +	}
> +
> +	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> +		kvm_vtimer_update_mask_user(vcpu);
> +
>  	return IRQ_HANDLED;
>  }
>  
> @@ -220,7 +239,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  {
>  	int ret;
>  
> -	timer_ctx->active_cleared_last = false;
>  	timer_ctx->irq.level = new_level;
>  	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
>  				   timer_ctx->irq.level);
> @@ -276,10 +294,16 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
>  	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
>  }
>  
> -static void timer_save_state(struct kvm_vcpu *vcpu)
> +static void vtimer_save_state(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +	unsigned long flags;
> +
> +	local_irq_save(flags);

Is that to avoid racing against the timer when doing a
vcpu_put/timer/schedule?
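
If so, a comment to that effect would help the next reader; assuming
that is indeed the intent, something along the lines of:

	/*
	 * Disable interrupts so the vtimer ISR cannot run and change
	 * the timer state under our feet while we snapshot and disable
	 * the hardware timer.
	 */
	local_irq_save(flags);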

> +
> +	if (!vtimer->loaded)
> +		goto out;
>  
>  	if (timer->enabled) {
>  		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> @@ -288,6 +312,10 @@ static void timer_save_state(struct kvm_vcpu *vcpu)
>  
>  	/* Disable the virtual timer */
>  	write_sysreg_el0(0, cntv_ctl);
> +
> +	vtimer->loaded = false;
> +out:
> +	local_irq_restore(flags);
>  }
>  
>  /*
> @@ -303,6 +331,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  
>  	BUG_ON(bg_timer_is_armed(timer));
>  
> +	vtimer_save_state(vcpu);
> +
>  	/*
>  	 * No need to schedule a background timer if any guest timer has
>  	 * already expired, because kvm_vcpu_block will return before putting
> @@ -326,16 +356,26 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
>  }
>  
> -static void timer_restore_state(struct kvm_vcpu *vcpu)
> +static void vtimer_restore_state(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +
> +	if (vtimer->loaded)
> +		goto out;
>  
>  	if (timer->enabled) {
>  		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
>  		isb();
>  		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
>  	}
> +
> +	vtimer->loaded = true;
> +out:
> +	local_irq_restore(flags);
>  }
>  
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> @@ -344,6 +384,8 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  
>  	soft_timer_cancel(&timer->bg_timer, &timer->expired);
>  	timer->armed = false;
> +
> +	vtimer_restore_state(vcpu);
>  }
>  
>  static void set_cntvoff(u64 cntvoff)
> @@ -353,61 +395,56 @@ static void set_cntvoff(u64 cntvoff)
>  	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
>  }
>  
> -static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> +static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	bool phys_active;
>  	int ret;
>  
> -	/*
> -	* If we enter the guest with the virtual input level to the VGIC
> -	* asserted, then we have already told the VGIC what we need to, and
> -	* we don't need to exit from the guest until the guest deactivates
> -	* the already injected interrupt, so therefore we should set the
> -	* hardware active state to prevent unnecessary exits from the guest.
> -	*
> -	* Also, if we enter the guest with the virtual timer interrupt active,
> -	* then it must be active on the physical distributor, because we set
> -	* the HW bit and the guest must be able to deactivate the virtual and
> -	* physical interrupt at the same time.
> -	*
> -	* Conversely, if the virtual input level is deasserted and the virtual
> -	* interrupt is not active, then always clear the hardware active state
> -	* to ensure that hardware interrupts from the timer triggers a guest
> -	* exit.
> -	*/
> -	phys_active = vtimer->irq.level ||
> -			kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
> -
> -	/*
> -	 * We want to avoid hitting the (re)distributor as much as
> -	 * possible, as this is a potentially expensive MMIO access
> -	 * (not to mention locks in the irq layer), and a solution for
> -	 * this is to cache the "active" state in memory.
> -	 *
> -	 * Things to consider: we cannot cache an "active set" state,
> -	 * because the HW can change this behind our back (it becomes
> -	 * "clear" in the HW). We must then restrict the caching to
> -	 * the "clear" state.
> -	 *
> -	 * The cache is invalidated on:
> -	 * - vcpu put, indicating that the HW cannot be trusted to be
> -	 *   in a sane state on the next vcpu load,
> -	 * - any change in the interrupt state
> -	 *
> -	 * Usage conditions:
> -	 * - cached value is "active clear"
> -	 * - value to be programmed is "active clear"
> -	 */
> -	if (vtimer->active_cleared_last && !phys_active)
> -		return;
> -
> +	if (vtimer->irq.level || kvm_vgic_map_is_active(vcpu, vtimer->irq.irq))
> +		phys_active = true;
> +	else
> +		phys_active = false;

nit: this can be written as:

     phys_active = (vtimer->irq.level ||
     		    kvm_vgic_map_is_active(vcpu, vtimer->irq.irq));

Not that it matters in the slightest...

>  	ret = irq_set_irqchip_state(host_vtimer_irq,
>  				    IRQCHIP_STATE_ACTIVE,
>  				    phys_active);
>  	WARN_ON(ret);
> +}
> +
> +static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
> +{
> +	kvm_vtimer_update_mask_user(vcpu);
> +}
> +
> +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +
> +	if (unlikely(!timer->enabled))
> +		return;
> +
> +	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> +		kvm_timer_vcpu_load_user(vcpu);
> +	else
> +		kvm_timer_vcpu_load_vgic(vcpu);
>  
> -	vtimer->active_cleared_last = !phys_active;
> +	set_cntvoff(vtimer->cntvoff);
> +
> +	/*
> +	 * If we armed a soft timer and potentially queued work, we have to
> +	 * cancel this, but cannot do it here, because canceling work can
> +	 * sleep and we can be in the middle of a preempt notifier call.
> +	 * Instead, when the timer has been armed, we know the return path
> +	 * from kvm_vcpu_block will call kvm_timer_unschedule, so we can defer
> +	 * restoring the state and canceling any soft timers and work items
> +	 * until then.
> +	 */
> +	if (!bg_timer_is_armed(timer))
> +		vtimer_restore_state(vcpu);
> +
> +	if (has_vhe())
> +		disable_el1_phys_timer_access();
>  }
>  
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> @@ -427,23 +464,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
>  	       ptimer->irq.level != plevel;
>  }
>  
> -static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> -{
> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> -
> -	/*
> -	 * To prevent continuously exiting from the guest, we mask the
> -	 * physical interrupt such that the guest can make forward progress.
> -	 * Once we detect the output level being deasserted, we unmask the
> -	 * interrupt again so that we exit from the guest when the timer
> -	 * fires.
> -	*/
> -	if (vtimer->irq.level)
> -		disable_percpu_irq(host_vtimer_irq);
> -	else
> -		enable_percpu_irq(host_vtimer_irq, 0);
> -}
> -
>  /**
>   * kvm_timer_flush_hwstate - prepare timers before running the vcpu
>   * @vcpu: The vcpu pointer
> @@ -456,23 +476,55 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
>  	if (unlikely(!timer->enabled))
>  		return;
>  
> -	kvm_timer_update_state(vcpu);
> +	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> +		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
>  
>  	/* Set the background timer for the physical timer emulation. */
>  	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> +}
>  
> -	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> -		kvm_timer_flush_hwstate_user(vcpu);
> -	else
> -		kvm_timer_flush_hwstate_vgic(vcpu);
> +void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
> -	set_cntvoff(vtimer->cntvoff);
> -	timer_restore_state(vcpu);
> +	if (unlikely(!timer->enabled))
> +		return;
> +
> +	if (has_vhe())
> +		enable_el1_phys_timer_access();
> +
> +	vtimer_save_state(vcpu);
> +
> +	set_cntvoff(0);

Can this be moved into vtimer_save_state()? And come to think of it, why
don't we reset cntvoff in kvm_timer_schedule() as well?
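
Folding it in would look roughly like this (untested sketch of the
suggestion, with the unchanged start of the function elided):

	static void vtimer_save_state(struct kvm_vcpu *vcpu)
	{
		[...]
		/* Disable the virtual timer */
		write_sysreg_el0(0, cntv_ctl);

		/* Let the host see the raw physical counter again */
		set_cntvoff(0);

		vtimer->loaded = false;
	out:
		local_irq_restore(flags);
	}

and since kvm_timer_schedule() calls vtimer_save_state(), that would
also take care of the second question for free.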

> +}
> +
> +static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +
> +	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
> +		kvm_vtimer_update_mask_user(vcpu);
> +		return;
> +	}
> +
> +	/*
> +	 * If the guest disabled the timer without acking the interrupt, then
> +	 * we must make sure the physical and virtual active states are in
> +	 * sync by deactivating the physical interrupt, because otherwise we
> +	 * wouldn't see the next timer interrupt in the host.
> +	 */
> +	if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
> +		int ret;
> +		ret = irq_set_irqchip_state(host_vtimer_irq,
> +					    IRQCHIP_STATE_ACTIVE,
> +					    false);
> +		WARN_ON(ret);
> +	}
>  }
>  
>  /**
> @@ -485,6 +537,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	/*
>  	 * This is to cancel the background timer for the physical timer
> @@ -492,14 +545,19 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  	 */
>  	soft_timer_cancel(&timer->phys_timer, NULL);
>  
> -	timer_save_state(vcpu);
> -	set_cntvoff(0);
> -
>  	/*
> -	 * The guest could have modified the timer registers or the timer
> -	 * could have expired, update the timer state.
> +	 * If we entered the guest with the vtimer output asserted we have to
> +	 * check if the guest has modified the timer so that we should lower
> +	 * the line at this point.
>  	 */
> -	kvm_timer_update_state(vcpu);
> +	if (vtimer->irq.level) {
> +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> +		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> +		if (!kvm_timer_should_fire(vtimer)) {
> +			kvm_timer_update_irq(vcpu, false, vtimer);
> +			unmask_vtimer_irq(vcpu);
> +		}
> +	}
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 27db222..132d39a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -354,18 +354,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
>  
>  	kvm_arm_set_running_vcpu(vcpu);
> -
>  	kvm_vgic_load(vcpu);
> +	kvm_timer_vcpu_load(vcpu);
>  }
>  
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
> +	kvm_timer_vcpu_put(vcpu);
>  	kvm_vgic_put(vcpu);
>  
>  	vcpu->cpu = -1;
>  
>  	kvm_arm_set_running_vcpu(NULL);
> -	kvm_timer_vcpu_put(vcpu);
>  }
>  
>  static void vcpu_power_off(struct kvm_vcpu *vcpu)
> @@ -710,16 +710,27 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_arm_clear_debug(vcpu);
>  
>  		/*
> -		 * We must sync the PMU and timer state before the vgic state so
> +		 * We must sync the PMU state before the vgic state so
>  		 * that the vgic can properly sample the updated state of the
>  		 * interrupt line.
>  		 */
>  		kvm_pmu_sync_hwstate(vcpu);
> -		kvm_timer_sync_hwstate(vcpu);
>  
> +		/*
> +		 * Sync the vgic state before syncing the timer state because
> +		 * the timer code needs to know if the virtual timer
> +		 * interrupts are active.
> +		 */
>  		kvm_vgic_sync_hwstate(vcpu);
>  
>  		/*
> +		 * Sync the timer hardware state before enabling interrupts as
> +		 * we don't want vtimer interrupts to race with syncing the
> +		 * timer virtual interrupt state.
> +		 */
> +		kvm_timer_sync_hwstate(vcpu);
> +
> +		/*
>  		 * We may have taken a host interrupt in HYP mode (ie
>  		 * while executing the guest). This interrupt is still
>  		 * pending, as we haven't serviced it yet!
> diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> index a6c3b10..f398616 100644
> --- a/virt/kvm/arm/hyp/timer-sr.c
> +++ b/virt/kvm/arm/hyp/timer-sr.c
> @@ -27,7 +27,7 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
>  	write_sysreg(cntvoff, cntvoff_el2);
>  }
>  
> -void __hyp_text enable_phys_timer(void)
> +void __hyp_text enable_el1_phys_timer_access(void)
>  {
>  	u64 val;
>  
> @@ -37,7 +37,7 @@ void __hyp_text enable_phys_timer(void)
>  	write_sysreg(val, cnthctl_el2);
>  }
>  
> -void __hyp_text disable_phys_timer(void)
> +void __hyp_text disable_el1_phys_timer_access(void)
>  {
>  	u64 val;
>  
> @@ -58,11 +58,11 @@ void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
>  	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
>  	 */
>  	if (!has_vhe())
> -		enable_phys_timer();
> +		enable_el1_phys_timer_access();
>  }
>  
>  void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
>  {
>  	if (!has_vhe())
> -		disable_phys_timer();
> +		disable_el1_phys_timer_access();
>  }

It'd be nice to move this renaming to the patch that introduces these two
functions.

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 15/20] KVM: arm/arm64: Support EL1 phys timer register access in set/get reg
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:10     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:10 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Sat, Sep 23 2017 at  2:42:02 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> Add support for the physical timer registers in kvm_arm_timer_set_reg and
> kvm_arm_timer_get_reg so that these functions can be reused to interact
> with the rest of the system.
>
> Note that this paves part of the way for the physical timer state
> save/restore, but we still need to add those registers to
> KVM_GET_REG_LIST before we support migrating the physical timer state.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm/include/uapi/asm/kvm.h   |  6 ++++++
>  arch/arm64/include/uapi/asm/kvm.h |  6 ++++++
>  virt/kvm/arm/arch_timer.c         | 33 +++++++++++++++++++++++++++++++--
>  3 files changed, 43 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> index 5db2d4c..665c454 100644
> --- a/arch/arm/include/uapi/asm/kvm.h
> +++ b/arch/arm/include/uapi/asm/kvm.h
> @@ -151,6 +151,12 @@ struct kvm_arch_memory_slot {
>  	(__ARM_CP15_REG(op1, 0, crm, 0) | KVM_REG_SIZE_U64)
>  #define ARM_CP15_REG64(...) __ARM_CP15_REG64(__VA_ARGS__)
>  
> +/* PL1 Physical Timer Registers */
> +#define KVM_REG_ARM_PTIMER_CTL		ARM_CP15_REG32(0, 14, 2, 1)
> +#define KVM_REG_ARM_PTIMER_CNT		ARM_CP15_REG64(0, 14)
> +#define KVM_REG_ARM_PTIMER_CVAL		ARM_CP15_REG64(2, 14)
> +
> +/* Virtual Timer Registers */
>  #define KVM_REG_ARM_TIMER_CTL		ARM_CP15_REG32(0, 14, 3, 1)
>  #define KVM_REG_ARM_TIMER_CNT		ARM_CP15_REG64(1, 14)
>  #define KVM_REG_ARM_TIMER_CVAL		ARM_CP15_REG64(3, 14)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 9f3ca24..07be6e2 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -195,6 +195,12 @@ struct kvm_arch_memory_slot {
>  
>  #define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_SIZE_U64)
>  
> +/* EL1 Physical Timer Registers */

These are EL0 registers, even if we tend to restrict them to EL1. Even
the 32-bit version is not strictly a PL1 register, since PL1 can delegate
it to userspace (but the ARMv7 ARM still carries this PL1 thing...).

> +#define KVM_REG_ARM_PTIMER_CTL		ARM64_SYS_REG(3, 3, 14, 2, 1)
> +#define KVM_REG_ARM_PTIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 2, 2)
> +#define KVM_REG_ARM_PTIMER_CNT		ARM64_SYS_REG(3, 3, 14, 0, 1)
> +
> +/* EL0 Virtual Timer Registers */
>  #define KVM_REG_ARM_TIMER_CTL		ARM64_SYS_REG(3, 3, 14, 3, 1)
>  #define KVM_REG_ARM_TIMER_CNT		ARM64_SYS_REG(3, 3, 14, 3, 2)
>  #define KVM_REG_ARM_TIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 0, 2)
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 70110ea..d5b632d 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -626,10 +626,11 @@ static void kvm_timer_init_interrupt(void *info)
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  {
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
>  	switch (regid) {
>  	case KVM_REG_ARM_TIMER_CTL:
> -		vtimer->cnt_ctl = value;
> +		vtimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;

Ah, interesting. Does this change anything in the userspace behaviour?

>  		break;
>  	case KVM_REG_ARM_TIMER_CNT:
>  		update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
> @@ -637,6 +638,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	case KVM_REG_ARM_TIMER_CVAL:
>  		vtimer->cnt_cval = value;
>  		break;
> +	case KVM_REG_ARM_PTIMER_CTL:
> +		ptimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
> +		break;
> +	case KVM_REG_ARM_PTIMER_CVAL:
> +		ptimer->cnt_cval = value;
> +		break;
> +
>  	default:
>  		return -1;
>  	}
> @@ -645,17 +653,38 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	return 0;
>  }
>  
> +static u64 read_timer_ctl(struct arch_timer_context *timer)
> +{
> +	/*
> +	 * Set ISTATUS bit if it's expired.
> +	 * Note that according to ARMv8 ARM Issue A.k, ISTATUS bit is
> +	 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
> +	 * regardless of ENABLE bit for our implementation convenience.
> +	 */
> +	if (!kvm_timer_compute_delta(timer))
> +		return timer->cnt_ctl | ARCH_TIMER_CTRL_IT_STAT;
> +	else
> +		return timer->cnt_ctl;

Can't we end up with a stale IT_STAT bit here if the timer has been
snapshotted with an interrupt pending, and then CVAL updated to expire
later?
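
To spell out the scenario I have in mind (my reading of it, so I may
well be missing something):

  1. The vtimer expires while loaded, so vtimer_save_state() snapshots
     cnt_ctl with ISTATUS set.
  2. Userspace then writes KVM_REG_ARM_TIMER_CVAL with a value in the
     future, which updates cnt_cval but not cnt_ctl.
  3. kvm_timer_compute_delta() is now non-zero, so we return the
     snapshotted cnt_ctl, which still carries the stale ISTATUS bit.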

> +}
> +
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 regid)
>  {
> +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	switch (regid) {
>  	case KVM_REG_ARM_TIMER_CTL:
> -		return vtimer->cnt_ctl;
> +		return read_timer_ctl(vtimer);
>  	case KVM_REG_ARM_TIMER_CNT:
>  		return kvm_phys_timer_read() - vtimer->cntvoff;
>  	case KVM_REG_ARM_TIMER_CVAL:
>  		return vtimer->cnt_cval;
> +	case KVM_REG_ARM_PTIMER_CTL:
> +		return read_timer_ctl(ptimer);
> +	case KVM_REG_ARM_PTIMER_CVAL:
> +		return ptimer->cnt_cval;
> +	case KVM_REG_ARM_PTIMER_CNT:
> +		return kvm_phys_timer_read();
>  	}
>  	return (u64)-1;
>  }

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 16/20] KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:12     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:12 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Sat, Sep 23 2017 at  2:42:03 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> When trapping on a guest access to one of the timer registers, we were
> messing with the internals of the timer state from the sysregs handling
> code, and that logic was about to receive more added complexity when
> optimizing the timer handling code.
>
> Therefore, since we already have timer register access functions (to
> access registers from userspace), reuse those for the timer register
> traps from a VM and let the timer code maintain its own consistency.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm64/kvm/sys_regs.c | 41 ++++++++++++++---------------------------
>  1 file changed, 14 insertions(+), 27 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2e070d3..bb0e41b 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -841,13 +841,16 @@ static bool access_cntp_tval(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> -	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  	u64 now = kvm_phys_timer_read();
> +	u64 cval;
>  
> -	if (p->is_write)
> -		ptimer->cnt_cval = p->regval + now;
> -	else
> -		p->regval = ptimer->cnt_cval - now;
> +	if (p->is_write) {
> +		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL,
> +				      p->regval + now);
> +	} else {
> +		cval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
> +		p->regval = cval - now;
> +	}
>  
>  	return true;
>  }
> @@ -856,24 +859,10 @@ static bool access_cntp_ctl(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> -	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> -
> -	if (p->is_write) {
> -		/* ISTATUS bit is read-only */
> -		ptimer->cnt_ctl = p->regval & ~ARCH_TIMER_CTRL_IT_STAT;
> -	} else {
> -		u64 now = kvm_phys_timer_read();
> -
> -		p->regval = ptimer->cnt_ctl;
> -		/*
> -		 * Set ISTATUS bit if it's expired.
> -		 * Note that according to ARMv8 ARM Issue A.k, ISTATUS bit is
> -		 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
> -		 * regardless of ENABLE bit for our implementation convenience.
> -		 */
> -		if (ptimer->cnt_cval <= now)
> -			p->regval |= ARCH_TIMER_CTRL_IT_STAT;
> -	}
> +	if (p->is_write)
> +		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CTL, p->regval);
> +	else
> +		p->regval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CTL);
>  
>  	return true;
>  }
> @@ -882,12 +871,10 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>  		struct sys_reg_params *p,
>  		const struct sys_reg_desc *r)
>  {
> -	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> -
>  	if (p->is_write)
> -		ptimer->cnt_cval = p->regval;
> +		kvm_arm_timer_set_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL, p->regval);
>  	else
> -		p->regval = ptimer->cnt_cval;
> +		p->regval = kvm_arm_timer_get_reg(vcpu, KVM_REG_ARM_PTIMER_CVAL);
>  
>  	return true;
>  }

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 17/20] KVM: arm/arm64: Move phys_timer_emulate function
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:21     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:21 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Sat, Sep 23 2017 at  2:42:04 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> We are about to call phys_timer_emulate() from kvm_timer_update_state()
> and modify phys_timer_emulate() at the same time.  Moving the function
> and modifying it in a single patch makes the diff hard to read, so do
> this separately first.
>
> No functional change.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index d5b632d..1f82c21 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -252,6 +252,22 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  	}
>  }
>  
> +/* Schedule the background timer for the emulated timer. */
> +static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> +			      struct arch_timer_context *timer_ctx)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	if (kvm_timer_should_fire(timer_ctx))
> +		return;
> +
> +	if (!kvm_timer_irq_can_fire(timer_ctx))
> +		return;
> +
> +	/*  The timer has not yet expired, schedule a background timer */
> +	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> +}
> +
>  /*
>   * Check if there was a change in the timer state (should we raise or lower
>   * the line level to the GIC).
> @@ -278,22 +294,6 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
>  }
>  
> -/* Schedule the background timer for the emulated timer. */
> -static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> -			      struct arch_timer_context *timer_ctx)
> -{
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -
> -	if (kvm_timer_should_fire(timer_ctx))
> -		return;
> -
> -	if (!kvm_timer_irq_can_fire(timer_ctx))
> -		return;
> -
> -	/*  The timer has not yet expired, schedule a background timer */
> -	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> -}
> -
>  static void vtimer_save_state(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 18/20] KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:45     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:45 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Sat, Sep 23 2017 at  2:42:05 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> There is no need to schedule and cancel a hrtimer when entering and
> exiting the guest, because we know when the physical timer is going to
> fire when the guest programs it, and we can simply program the hrtimer
> at that point.
>
> Now when the register modifications from the guest go through the
> kvm_arm_timer_set/get_reg functions, which always call
> kvm_timer_update_state(), we can simply consider the timer state in this
> function and schedule and cancel the timers as needed.
>
> This avoids looking at the physical timer emulation state when entering
> and exiting the VCPU, allowing for faster servicing of the VM when
> needed.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 75 ++++++++++++++++++++++++++++++++---------------
>  1 file changed, 51 insertions(+), 24 deletions(-)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 1f82c21..aa18a5d 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -199,7 +199,27 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
>  
>  static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
>  {
> -	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> +	struct arch_timer_context *ptimer;
> +	struct arch_timer_cpu *timer;
> +	struct kvm_vcpu *vcpu;
> +	u64 ns;
> +
> +	timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
> +	vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
> +	ptimer = vcpu_ptimer(vcpu);
> +
> +	/*
> +	 * Check that the timer has really expired from the guest's
> +	 * PoV (NTP on the host may have forced it to expire
> +	 * early). If not ready, schedule for a later time.
> +	 */
> +	ns = kvm_timer_compute_delta(ptimer);
> +	if (unlikely(ns)) {
> +		hrtimer_forward_now(hrt, ns_to_ktime(ns));
> +		return HRTIMER_RESTART;
> +	}

Don't we already have similar logic for the background timer (I must
admit I've lost track of how we changed things in this series)? If so,
can we make this common code?
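
(For illustration only: a shared helper along these lines could hold the
"has it really expired from the guest's PoV" check for both hrtimer
handlers. The helper name below is made up for the example and is not
part of this series.)

static enum hrtimer_restart kvm_hrtimer_expired(struct hrtimer *hrt,
						struct arch_timer_context *ctx)
{
	/* NTP on the host may have made the hrtimer fire early */
	u64 ns = kvm_timer_compute_delta(ctx);

	if (unlikely(ns)) {
		hrtimer_forward_now(hrt, ns_to_ktime(ns));
		return HRTIMER_RESTART;
	}

	return HRTIMER_NORESTART;
}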

> +
> +	kvm_timer_update_irq(vcpu, true, ptimer);
>  	return HRTIMER_NORESTART;
>  }
>  
> @@ -253,24 +273,28 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  }
>  
>  /* Schedule the background timer for the emulated timer. */
> -static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> -			      struct arch_timer_context *timer_ctx)
> +static void phys_timer_emulate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> -	if (kvm_timer_should_fire(timer_ctx))
> -		return;
> -
> -	if (!kvm_timer_irq_can_fire(timer_ctx))
> +	/*
> +	 * If the timer can fire now we have just raised the IRQ line and we
> +	 * don't need to have a soft timer scheduled for the future.  If the
> +	 * timer cannot fire at all, then we also don't need a soft timer.
> +	 */
> +	if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
> +		soft_timer_cancel(&timer->phys_timer, NULL);
>  		return;
> +	}
>  
> -	/*  The timer has not yet expired, schedule a background timer */
> -	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> +	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(ptimer));
>  }
>  
>  /*
> - * Check if there was a change in the timer state (should we raise or lower
> - * the line level to the GIC).
> + * Check if there was a change in the timer state, so that we should either
> + * raise or lower the line level to the GIC or schedule a background timer to
> + * emulate the physical timer.
>   */
>  static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  {
> @@ -292,6 +316,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  
>  	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
>  		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> +
> +	phys_timer_emulate(vcpu);
>  }
>  
>  static void vtimer_save_state(struct kvm_vcpu *vcpu)
> @@ -445,6 +471,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
>  
>  	if (has_vhe())
>  		disable_el1_phys_timer_access();
> +
> +	/* Set the background timer for the physical timer emulation. */
> +	phys_timer_emulate(vcpu);
>  }
>  
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> @@ -480,12 +509,6 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  
>  	if (unlikely(!timer->enabled))
>  		return;
> -
> -	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> -		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> -
> -	/* Set the background timer for the physical timer emulation. */
> -	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
>  }
>  
>  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> @@ -500,6 +523,17 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  
>  	vtimer_save_state(vcpu);
>  
> +	/*
> +	 * Cancel the physical timer emulation, because the only case where we
> +	 * need it after a vcpu_put is in the context of a sleeping VCPU, and
> +	 * in that case we already factor in the deadline for the physical
> +	 * timer when scheduling the bg_timer.
> +	 *
> +	 * In any case, we re-schedule the hrtimer for the physical timer when
> +	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
> +	 */
> +	soft_timer_cancel(&timer->phys_timer, NULL);
> +
>  	set_cntvoff(0);
>  }
>  
> @@ -536,16 +570,9 @@ static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
>   */
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  
>  	/*
> -	 * This is to cancel the background timer for the physical timer
> -	 * emulation if it is set.
> -	 */
> -	soft_timer_cancel(&timer->phys_timer, NULL);
> -
> -	/*
>  	 * If we entered the guest with the vtimer output asserted we have to
>  	 * check if the guest has modified the timer so that we should lower
>  	 * the line at this point.

Otherwise:

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 19/20] KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:46     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:46 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Sat, Sep 23 2017 at  2:42:06 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> Now that both the vtimer and the ptimer, whether using the in-kernel
> vgic emulation or a userspace IRQ chip, are driven by the timer signals
> and at the vcpu load/put boundaries, instead of recomputing the timer
> state on every entry/exit to/from the guest, we can get rid of the
> flush hwstate function entirely.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  include/kvm/arm_arch_timer.h |  1 -
>  virt/kvm/arm/arch_timer.c    | 24 ------------------------
>  virt/kvm/arm/arm.c           |  1 -
>  3 files changed, 26 deletions(-)
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 8e5ed54..af29563 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -61,7 +61,6 @@ int kvm_timer_hyp_init(void);
>  int kvm_timer_enable(struct kvm_vcpu *vcpu);
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu);
>  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
>  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu);
>  void kvm_timer_update_run(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index aa18a5d..f92459a 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -302,12 +302,6 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> -	/*
> -	 * If userspace modified the timer registers via SET_ONE_REG before
> -	 * the vgic was initialized, we mustn't set the vtimer->irq.level value
> -	 * because the guest would never see the interrupt.  Instead wait
> -	 * until we call this function from kvm_timer_flush_hwstate.
> -	 */
>  	if (unlikely(!timer->enabled))
>  		return;
>  
> @@ -493,24 +487,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
>  	       ptimer->irq.level != plevel;
>  }
>  
> -/**
> - * kvm_timer_flush_hwstate - prepare timers before running the vcpu
> - * @vcpu: The vcpu pointer
> - *
> - * Check if the virtual timer has expired while we were running in the host,
> - * and inject an interrupt if that was the case, making sure the timer is
> - * masked or disabled on the host so that we keep executing.  Also schedule a
> - * software timer for the physical timer if it is enabled.
> - */
> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> -{
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> -
> -	if (unlikely(!timer->enabled))
> -		return;
> -}
> -
>  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 132d39a..14c50d1 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -656,7 +656,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		local_irq_disable();
>  
> -		kvm_timer_flush_hwstate(vcpu);
>  		kvm_vgic_flush_hwstate(vcpu);
>  
>  		/*

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 20/20] KVM: arm/arm64: Rework kvm_timer_should_fire
  2017-09-23  0:42   ` Christoffer Dall
@ 2017-10-10  9:59     ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-10  9:59 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Sat, Sep 23 2017 at  2:42:07 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> kvm_timer_should_fire() can be called in two different situations from
> kvm_vcpu_block().
>
> The first case is before calling kvm_timer_schedule(), used for wait
> polling, and in this case the VCPU thread is running and the timer state
> is loaded onto the hardware so all we have to do is check if the virtual
> interrupt lines are asserted, because the timer interrupt handler
> functions will raise those lines as appropriate.
>
> The second case is inside the wait loop of kvm_vcpu_block(), where we
> have already called kvm_timer_schedule() and therefore the hardware will
> be disabled and the software view of the timer state is up to date
> (timer->loaded is false), and so we can simply check if the timer should
> fire by looking at the software state.
>
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  include/kvm/arm_arch_timer.h |  3 ++-
>  virt/kvm/arm/arch_timer.c    | 22 +++++++++++++++++++++-
>  virt/kvm/arm/arm.c           |  3 +--
>  3 files changed, 24 insertions(+), 4 deletions(-)
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index af29563..250db34 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -73,7 +73,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>  int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>  int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
>  
> -bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
> +bool kvm_timer_is_pending(struct kvm_vcpu *vcpu);
> +
>  void kvm_timer_schedule(struct kvm_vcpu *vcpu);
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>  
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index f92459a..1d0cd3a 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -49,6 +49,7 @@ static const struct kvm_irq_level default_vtimer_irq = {
>  static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
>  static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
>  				 struct arch_timer_context *timer_ctx);
> +static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx);
>  
>  u64 kvm_phys_timer_read(void)
>  {
> @@ -223,7 +224,7 @@ static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
>  	return HRTIMER_NORESTART;
>  }
>  
> -bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
> +static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  {
>  	u64 cval, now;
>  
> @@ -236,6 +237,25 @@ bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
>  	return cval <= now;
>  }
>  
> +bool kvm_timer_is_pending(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +
> +	if (vtimer->irq.level || ptimer->irq.level)
> +		return true;
> +
> +	/*
> +	 * When this is called from within the wait loop of kvm_vcpu_block(),
> +	 * the software view of the timer state is up to date (timer->loaded
> +	 * is false), and so we can simply check if the timer should fire now.
> +	 */
> +	if (!vtimer->loaded && kvm_timer_should_fire(vtimer))
> +		return true;
> +
> +	return kvm_timer_should_fire(ptimer);
> +}
> +
>  /*
>   * Reflect the timer output level into the kvm_run structure
>   */
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 14c50d1..bc126fb 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -307,8 +307,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  {
> -	return kvm_timer_should_fire(vcpu_vtimer(vcpu)) ||
> -	       kvm_timer_should_fire(vcpu_ptimer(vcpu));
> +	return kvm_timer_is_pending(vcpu);
>  }
>  
>  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads
  2017-09-23  0:41   ` Christoffer Dall
@ 2017-10-17 15:33     ` Will Deacon
  -1 siblings, 0 replies; 110+ messages in thread
From: Will Deacon @ 2017-10-17 15:33 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Marc Zyngier, Catalin Marinas, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,

On Sat, Sep 23, 2017 at 02:41:49AM +0200, Christoffer Dall wrote:
> Using the physical counter allows KVM to retain the offset between the
> virtual and physical counter as long as it is actively running a VCPU.
> 
> As soon as a VCPU is released, another thread is scheduled or we start
> running userspace applications, we reset the offset to 0, so that
> userspace accessing the virtual timer can still read the virtual counter
> and get the same view of time as the kernel.
> 
> This opens up potential improvements for KVM performance.
> 
> VHE kernels or kernels continuing to use the virtual timer are
> unaffected.
> 
> Signed-off-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm64/include/asm/arch_timer.h  | 9 ++++-----
>  drivers/clocksource/arm_arch_timer.c | 3 +--
>  2 files changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index a652ce0..1859a1c 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -148,11 +148,10 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  
>  static inline u64 arch_counter_get_cntpct(void)
>  {
> -	/*
> -	 * AArch64 kernel and user space mandate the use of CNTVCT.
> -	 */
> -	BUG();
> -	return 0;
> +	u64 cval;
> +	isb();
> +	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> +	return cval;
>  }
>  
>  static inline u64 arch_counter_get_cntvct(void)
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index fd4b7f6..9b3322a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -890,8 +890,7 @@ static void __init arch_counter_register(unsigned type)
>  
>  	/* Register the CP15 based counter if we have one */
>  	if (type & ARCH_TIMER_TYPE_CP15) {
> -		if (IS_ENABLED(CONFIG_ARM64) ||
> -		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)

Please can you add an is_hyp_mode_available() check here, as you suggested
last time?

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521542.html

Without it, I worry that the kernel timekeeper will be out of sync with the
vDSO (which uses the virtual counter) on systems where CNTVOFF is
initialised to a consistent non-zero offset and Linux was loaded at EL1.
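
(A minimal sketch of what that check could look like -- this is only an
illustration of the suggestion, not necessarily the final patch:

	/* Register the CP15 based counter if we have one */
	if (type & ARCH_TIMER_TYPE_CP15) {
		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
			arch_timer_read_counter = arch_counter_get_cntvct;
		else
			arch_timer_read_counter = arch_counter_get_cntpct;

so that an arm64 kernel entered at EL1 keeps using CNTVCT for its
timekeeping, matching the vDSO.)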

Thanks,

Will

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads
  2017-10-17 15:33     ` Will Deacon
@ 2017-10-18 10:00       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 10:00 UTC (permalink / raw)
  To: Will Deacon; +Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier, Catalin Marinas

On Tue, Oct 17, 2017 at 04:33:05PM +0100, Will Deacon wrote:
> Hi Christoffer,
> 
> On Sat, Sep 23, 2017 at 02:41:49AM +0200, Christoffer Dall wrote:
> > Using the physical counter allows KVM to retain the offset between the
> > virtual and physical counter as long as it is actively running a VCPU.
> > 
> > As soon as a VCPU is released, another thread is scheduled or we start
> > running userspace applications, we reset the offset to 0, so that
> > userspace accessing the virtual timer can still read the virtual counter
> > and get the same view of time as the kernel.
> > 
> > This opens up potential improvements for KVM performance.
> > 
> > VHE kernels or kernels continuing to use the virtual timer are
> > unaffected.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  arch/arm64/include/asm/arch_timer.h  | 9 ++++-----
> >  drivers/clocksource/arm_arch_timer.c | 3 +--
> >  2 files changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> > index a652ce0..1859a1c 100644
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -148,11 +148,10 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
> >  
> >  static inline u64 arch_counter_get_cntpct(void)
> >  {
> > -	/*
> > -	 * AArch64 kernel and user space mandate the use of CNTVCT.
> > -	 */
> > -	BUG();
> > -	return 0;
> > +	u64 cval;
> > +	isb();
> > +	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> > +	return cval;
> >  }
> >  
> >  static inline u64 arch_counter_get_cntvct(void)
> > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > index fd4b7f6..9b3322a 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -890,8 +890,7 @@ static void __init arch_counter_register(unsigned type)
> >  
> >  	/* Register the CP15 based counter if we have one */
> >  	if (type & ARCH_TIMER_TYPE_CP15) {
> > -		if (IS_ENABLED(CONFIG_ARM64) ||
> > -		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> > +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> 
> Please can you add an is_hyp_mode_available() check here, as you suggested
> last time?
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2017-July/521542.html
> 
> Without it, I worry that the kernel timekeeper will be out of sync with the
> vDSO (which uses the virtual counter) on systems where CNTVOFF is
> initialised to a consistent non-zero offset and Linux was loaded at EL1.
> 

Yes, will do.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles
  2017-10-09 16:21     ` Marc Zyngier
@ 2017-10-18 11:34       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 11:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas,
	Mark Rutland

On Mon, Oct 09, 2017 at 05:21:24PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
> > arm64, but as we move to using the physical timer for the in-kernel
> > time-keeping, we need to make that more flexible.
> > 
> > First, we need to make sure the physical counter can be read on equal
> > terms to the virtual counter, which includes adding physical counter
> > read functions for timers that require errata.
> > 
> > Second, we need to make a choice between reading the physical vs virtual
> > counter, depending on which timer is used for time keeping in the kernel
> > otherwise.  We can do this using a static key to avoid a performance
> > penalty during runtime when reading the counter.
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> 
> Right. I should have read patch #3. I'm an idiot.
> 
> > ---
> >  arch/arm64/include/asm/arch_timer.h  | 15 ++++++++++++---
> >  arch/arm64/include/asm/timex.h       |  2 +-
> >  drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
> >  3 files changed, 43 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> > index 1859a1c..c56d8cd 100644
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -30,6 +30,8 @@
> >  
> >  #include <clocksource/arm_arch_timer.h>
> >  
> > +extern struct static_key_false arch_timer_phys_counter_available;
> > +
> >  #if IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND)
> >  extern struct static_key_false arch_timer_read_ool_enabled;
> >  #define needs_unstable_timer_counter_workaround() \
> > @@ -52,6 +54,7 @@ struct arch_timer_erratum_workaround {
> >  	const char *desc;
> >  	u32 (*read_cntp_tval_el0)(void);
> >  	u32 (*read_cntv_tval_el0)(void);
> > +	u64 (*read_cntpct_el0)(void);
> >  	u64 (*read_cntvct_el0)(void);
> >  	int (*set_next_event_phys)(unsigned long, struct clock_event_device *);
> >  	int (*set_next_event_virt)(unsigned long, struct clock_event_device *);
> > @@ -148,10 +151,8 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
> >  
> >  static inline u64 arch_counter_get_cntpct(void)
> >  {
> > -	u64 cval;
> >  	isb();
> > -	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> > -	return cval;
> > +	return arch_timer_reg_read_stable(cntpct_el0);
> >  }
> >  
> >  static inline u64 arch_counter_get_cntvct(void)
> > @@ -160,6 +161,14 @@ static inline u64 arch_counter_get_cntvct(void)
> >  	return arch_timer_reg_read_stable(cntvct_el0);
> >  }
> >  
> > +static inline u64 arch_counter_get_cycles(void)
> > +{
> > +	if (static_branch_unlikely(&arch_timer_phys_counter_available))
> > +	    return arch_counter_get_cntpct();
> > +	else
> > +	    return arch_counter_get_cntvct();
> > +}
> > +
> >  static inline int arch_timer_arch_init(void)
> >  {
> >  	return 0;
> > diff --git a/arch/arm64/include/asm/timex.h b/arch/arm64/include/asm/timex.h
> > index 81a076e..c0d214c 100644
> > --- a/arch/arm64/include/asm/timex.h
> > +++ b/arch/arm64/include/asm/timex.h
> > @@ -22,7 +22,7 @@
> >   * Use the current timer as a cycle counter since this is what we use for
> >   * the delay loop.
> >   */
> > -#define get_cycles()	arch_counter_get_cntvct()
> > +#define get_cycles()	arch_counter_get_cycles()
> 
> Why can't this be arch_timer_read_counter() instead? Is there any 
> measurable advantage in using a static key compared to a memory 
> indirection?
> 

No reason.  I think I thought there was an include dependency issue that
led me to do it the other way, but I must have confused myself, because
using arch_timer_read_counter seems to work perfectly well.
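
i.e. the definition in asm/timex.h would then just become something like
(sketch only):

#define get_cycles()	arch_timer_read_counter()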

> >  
> >  #include <asm-generic/timex.h>
> >  
> > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > index 9b3322a..f35da20 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -77,6 +77,9 @@ static bool arch_timer_mem_use_virtual;
> >  static bool arch_counter_suspend_stop;
> >  static bool vdso_default = true;
> >  
> > +DEFINE_STATIC_KEY_FALSE(arch_timer_phys_counter_available);
> > +EXPORT_SYMBOL_GPL(arch_timer_phys_counter_available);
> > +
> >  static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
> >  
> >  static int __init early_evtstrm_cfg(char *buf)
> > @@ -217,6 +220,11 @@ static u32 notrace fsl_a008585_read_cntv_tval_el0(void)
> >  	return __fsl_a008585_read_reg(cntv_tval_el0);
> >  }
> >  
> > +static u64 notrace fsl_a008585_read_cntpct_el0(void)
> > +{
> > +	return __fsl_a008585_read_reg(cntpct_el0);
> > +}
> > +
> >  static u64 notrace fsl_a008585_read_cntvct_el0(void)
> >  {
> >  	return __fsl_a008585_read_reg(cntvct_el0);
> > @@ -258,6 +266,11 @@ static u32 notrace hisi_161010101_read_cntv_tval_el0(void)
> >  	return __hisi_161010101_read_reg(cntv_tval_el0);
> >  }
> >  
> > +static u64 notrace hisi_161010101_read_cntpct_el0(void)
> > +{
> > +	return __hisi_161010101_read_reg(cntpct_el0);
> > +}
> > +
> >  static u64 notrace hisi_161010101_read_cntvct_el0(void)
> >  {
> >  	return __hisi_161010101_read_reg(cntvct_el0);
> > @@ -288,6 +301,15 @@ static struct ate_acpi_oem_info hisi_161010101_oem_info[] = {
> >  #endif
> >  
> >  #ifdef CONFIG_ARM64_ERRATUM_858921
> > +static u64 notrace arm64_858921_read_cntpct_el0(void)
> > +{
> > +	u64 old, new;
> > +
> > +	old = read_sysreg(cntpct_el0);
> > +	new = read_sysreg(cntpct_el0);
> > +	return (((old ^ new) >> 32) & 1) ? old : new;
> > +}
> > +
> >  static u64 notrace arm64_858921_read_cntvct_el0(void)
> >  {
> >  	u64 old, new;
> > @@ -346,6 +368,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "Freescale erratum a005858",
> >  		.read_cntp_tval_el0 = fsl_a008585_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = fsl_a008585_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = fsl_a008585_read_cntpct_el0,
> >  		.read_cntvct_el0 = fsl_a008585_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -358,6 +381,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "HiSilicon erratum 161010101",
> >  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
> >  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -368,6 +392,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "HiSilicon erratum 161010101",
> >  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
> >  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -378,6 +403,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.match_type = ate_match_local_cap_id,
> >  		.id = (void *)ARM64_WORKAROUND_858921,
> >  		.desc = "ARM erratum 858921",
> > +		.read_cntpct_el0 = arm64_858921_read_cntpct_el0,
> >  		.read_cntvct_el0 = arm64_858921_read_cntvct_el0,
> >  	},
> >  #endif
> > @@ -890,10 +916,12 @@ static void __init arch_counter_register(unsigned type)
> >  
> >  	/* Register the CP15 based counter if we have one */
> >  	if (type & ARCH_TIMER_TYPE_CP15) {
> > -		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> > +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
> >  			arch_timer_read_counter = arch_counter_get_cntvct;
> > -		else
> > +		} else {
> >  			arch_timer_read_counter = arch_counter_get_cntpct;
> > +			static_branch_enable(&arch_timer_phys_counter_available);
> > +		}
> >  
> >  		clocksource_counter.archdata.vdso_direct = vdso_default;
> >  	} else {
> > 
> 
> In my reply to patch #2, I had the following hunk:
> 
> @@ -310,7 +329,7 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
>  						struct clock_event_device *clk)
>  {
>  	unsigned long ctrl;
> -	u64 cval = evt + arch_counter_get_cntvct();
> +	u64 cval = evt + arch_timer_read_counter();
>  
>  	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
>  	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> 
> Once we start using a different timer, this could well have an effect...
> 

Right, but wouldn't the following be a more correct way to go about it then:

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a7b359..07f19db 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -329,16 +329,19 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
 						struct clock_event_device *clk)
 {
 	unsigned long ctrl;
-	u64 cval = evt + arch_timer_read_counter();
+	u64 cval;
 
 	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
 	ctrl |= ARCH_TIMER_CTRL_ENABLE;
 	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
 
-	if (access == ARCH_TIMER_PHYS_ACCESS)
+	if (access == ARCH_TIMER_PHYS_ACCESS) {
+		cval = evt + arch_counter_get_cntpct();
 		write_sysreg(cval, cntp_cval_el0);
-	else
+	} else {
+		cval = evt + arch_counter_get_cntvct();
 		write_sysreg(cval, cntv_cval_el0);
+	}
 
 	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
 }


Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles
@ 2017-10-18 11:34       ` Christoffer Dall
  0 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 11:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 09, 2017 at 05:21:24PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
> > arm64, but as we move to using the physical timer for the in-kernel
> > time-keeping, we need to make that more flexible.
> > 
> > First, we need to make sure the physical counter can be read on equal
> > terms to the virtual counter, which includes adding physical counter
> > read functions for timers that require errata.
> > 
> > Second, we need to make a choice between reading the physical vs virtual
> > counter, depending on which timer is used for time keeping in the kernel
> > otherwise.  We can do this using a static key to avoid a performance
> > penalty during runtime when reading the counter.
> > 
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> 
> Right. I should have read patch #3. I'm an idiot.
> 
> > ---
> >  arch/arm64/include/asm/arch_timer.h  | 15 ++++++++++++---
> >  arch/arm64/include/asm/timex.h       |  2 +-
> >  drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++++++++--
> >  3 files changed, 43 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> > index 1859a1c..c56d8cd 100644
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -30,6 +30,8 @@
> >  
> >  #include <clocksource/arm_arch_timer.h>
> >  
> > +extern struct static_key_false arch_timer_phys_counter_available;
> > +
> >  #if IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND)
> >  extern struct static_key_false arch_timer_read_ool_enabled;
> >  #define needs_unstable_timer_counter_workaround() \
> > @@ -52,6 +54,7 @@ struct arch_timer_erratum_workaround {
> >  	const char *desc;
> >  	u32 (*read_cntp_tval_el0)(void);
> >  	u32 (*read_cntv_tval_el0)(void);
> > +	u64 (*read_cntpct_el0)(void);
> >  	u64 (*read_cntvct_el0)(void);
> >  	int (*set_next_event_phys)(unsigned long, struct clock_event_device *);
> >  	int (*set_next_event_virt)(unsigned long, struct clock_event_device *);
> > @@ -148,10 +151,8 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
> >  
> >  static inline u64 arch_counter_get_cntpct(void)
> >  {
> > -	u64 cval;
> >  	isb();
> > -	asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> > -	return cval;
> > +	return arch_timer_reg_read_stable(cntpct_el0);
> >  }
> >  
> >  static inline u64 arch_counter_get_cntvct(void)
> > @@ -160,6 +161,14 @@ static inline u64 arch_counter_get_cntvct(void)
> >  	return arch_timer_reg_read_stable(cntvct_el0);
> >  }
> >  
> > +static inline u64 arch_counter_get_cycles(void)
> > +{
> > +	if (static_branch_unlikely(&arch_timer_phys_counter_available))
> > +	    return arch_counter_get_cntpct();
> > +	else
> > +	    return arch_counter_get_cntvct();
> > +}
> > +
> >  static inline int arch_timer_arch_init(void)
> >  {
> >  	return 0;
> > diff --git a/arch/arm64/include/asm/timex.h b/arch/arm64/include/asm/timex.h
> > index 81a076e..c0d214c 100644
> > --- a/arch/arm64/include/asm/timex.h
> > +++ b/arch/arm64/include/asm/timex.h
> > @@ -22,7 +22,7 @@
> >   * Use the current timer as a cycle counter since this is what we use for
> >   * the delay loop.
> >   */
> > -#define get_cycles()	arch_counter_get_cntvct()
> > +#define get_cycles()	arch_counter_get_cycles()
> 
> Why can't this be arch_timer_read_counter() instead? Is there any 
> measurable advantage in using a static key compared to a memory 
> indirection?
> 

No reason.  I think I thought there was an include dependency issue that
led me to do it the other way, but I must have confused myself, because
using arch_timer_read_counter seems to work perfectly well.
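
For reference, a minimal sketch of that variant (just a sketch, assuming
get_cycles() simply forwards to the existing arch_timer_read_counter
function pointer that arch_counter_register() already sets up):

/* arch/arm64/include/asm/timex.h -- sketch of the function-pointer variant */
#define get_cycles()	arch_timer_read_counter()

That would also let us drop arch_counter_get_cycles() and the exported
static key entirely.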

> >  
> >  #include <asm-generic/timex.h>
> >  
> > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> > index 9b3322a..f35da20 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -77,6 +77,9 @@ static bool arch_timer_mem_use_virtual;
> >  static bool arch_counter_suspend_stop;
> >  static bool vdso_default = true;
> >  
> > +DEFINE_STATIC_KEY_FALSE(arch_timer_phys_counter_available);
> > +EXPORT_SYMBOL_GPL(arch_timer_phys_counter_available);
> > +
> >  static bool evtstrm_enable = IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM);
> >  
> >  static int __init early_evtstrm_cfg(char *buf)
> > @@ -217,6 +220,11 @@ static u32 notrace fsl_a008585_read_cntv_tval_el0(void)
> >  	return __fsl_a008585_read_reg(cntv_tval_el0);
> >  }
> >  
> > +static u64 notrace fsl_a008585_read_cntpct_el0(void)
> > +{
> > +	return __fsl_a008585_read_reg(cntpct_el0);
> > +}
> > +
> >  static u64 notrace fsl_a008585_read_cntvct_el0(void)
> >  {
> >  	return __fsl_a008585_read_reg(cntvct_el0);
> > @@ -258,6 +266,11 @@ static u32 notrace hisi_161010101_read_cntv_tval_el0(void)
> >  	return __hisi_161010101_read_reg(cntv_tval_el0);
> >  }
> >  
> > +static u64 notrace hisi_161010101_read_cntpct_el0(void)
> > +{
> > +	return __hisi_161010101_read_reg(cntpct_el0);
> > +}
> > +
> >  static u64 notrace hisi_161010101_read_cntvct_el0(void)
> >  {
> >  	return __hisi_161010101_read_reg(cntvct_el0);
> > @@ -288,6 +301,15 @@ static struct ate_acpi_oem_info hisi_161010101_oem_info[] = {
> >  #endif
> >  
> >  #ifdef CONFIG_ARM64_ERRATUM_858921
> > +static u64 notrace arm64_858921_read_cntpct_el0(void)
> > +{
> > +	u64 old, new;
> > +
> > +	old = read_sysreg(cntpct_el0);
> > +	new = read_sysreg(cntpct_el0);
> > +	return (((old ^ new) >> 32) & 1) ? old : new;
> > +}
> > +
> >  static u64 notrace arm64_858921_read_cntvct_el0(void)
> >  {
> >  	u64 old, new;
> > @@ -346,6 +368,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "Freescale erratum a005858",
> >  		.read_cntp_tval_el0 = fsl_a008585_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = fsl_a008585_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = fsl_a008585_read_cntpct_el0,
> >  		.read_cntvct_el0 = fsl_a008585_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -358,6 +381,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "HiSilicon erratum 161010101",
> >  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
> >  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -368,6 +392,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.desc = "HiSilicon erratum 161010101",
> >  		.read_cntp_tval_el0 = hisi_161010101_read_cntp_tval_el0,
> >  		.read_cntv_tval_el0 = hisi_161010101_read_cntv_tval_el0,
> > +		.read_cntpct_el0 = hisi_161010101_read_cntpct_el0,
> >  		.read_cntvct_el0 = hisi_161010101_read_cntvct_el0,
> >  		.set_next_event_phys = erratum_set_next_event_tval_phys,
> >  		.set_next_event_virt = erratum_set_next_event_tval_virt,
> > @@ -378,6 +403,7 @@ static const struct arch_timer_erratum_workaround ool_workarounds[] = {
> >  		.match_type = ate_match_local_cap_id,
> >  		.id = (void *)ARM64_WORKAROUND_858921,
> >  		.desc = "ARM erratum 858921",
> > +		.read_cntpct_el0 = arm64_858921_read_cntpct_el0,
> >  		.read_cntvct_el0 = arm64_858921_read_cntvct_el0,
> >  	},
> >  #endif
> > @@ -890,10 +916,12 @@ static void __init arch_counter_register(unsigned type)
> >  
> >  	/* Register the CP15 based counter if we have one */
> >  	if (type & ARCH_TIMER_TYPE_CP15) {
> > -		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI)
> > +		if (arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
> >  			arch_timer_read_counter = arch_counter_get_cntvct;
> > -		else
> > +		} else {
> >  			arch_timer_read_counter = arch_counter_get_cntpct;
> > +			static_branch_enable(&arch_timer_phys_counter_available);
> > +		}
> >  
> >  		clocksource_counter.archdata.vdso_direct = vdso_default;
> >  	} else {
> > 
> 
> In my reply to patch #2, I had the following hunk:
> 
> @@ -310,7 +329,7 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
>  						struct clock_event_device *clk)
>  {
>  	unsigned long ctrl;
> -	u64 cval = evt + arch_counter_get_cntvct();
> +	u64 cval = evt + arch_timer_read_counter();
>  
>  	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
>  	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> 
> Once we start using a different timer, this could well have an effect...
> 

Right, but wouldn't the following be a more correct way to go about it then:

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 9a7b359..07f19db 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -329,16 +329,19 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
 						struct clock_event_device *clk)
 {
 	unsigned long ctrl;
-	u64 cval = evt + arch_timer_read_counter();
+	u64 cval;
 
 	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
 	ctrl |= ARCH_TIMER_CTRL_ENABLE;
 	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
 
-	if (access == ARCH_TIMER_PHYS_ACCESS)
+	if (access == ARCH_TIMER_PHYS_ACCESS) {
+		cval = evt + arch_counter_get_cntpct();
 		write_sysreg(cval, cntp_cval_el0);
-	else
+	} else {
+		cval = evt + arch_counter_get_cntvct();
 		write_sysreg(cval, cntv_cval_el0);
+	}
 
 	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
 }
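
To spell out the concern with a quick sketch (the relation below is the
architectural one, not something taken from the patch):

/*
 * CNTVCT = CNTPCT - CNTVOFF
 *
 * If cval is built from the physical counter but written to cntv_cval_el0,
 * the virtual timer fires CNTVOFF ticks later than intended whenever
 * CNTVOFF != 0.  Reading the counter that matches the access type avoids
 * that skew.
 */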


Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 05/20] KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context
  2017-10-09 16:37     ` Marc Zyngier
@ 2017-10-18 11:54       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 11:54 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Mon, Oct 09, 2017 at 05:37:43PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > We are about to optimize our timer handling logic which involves
> > injecting irqs to the vgic directly from the irq handler.
> > 
> > Unfortunately, the injection path can take any AP list lock and irq lock
> > and we must therefore make sure to use spin_lock_irqsave wherever
> > interrupts are enabled and we are taking any of those locks, to avoid
> > deadlocking between process context and the ISR.
> > 
> > This changes a lot of the VGIC code, but the good news is that the
> > changes are mostly mechanical.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic/vgic-its.c     | 17 +++++++-----
> >  virt/kvm/arm/vgic/vgic-mmio-v2.c | 22 +++++++++------
> >  virt/kvm/arm/vgic/vgic-mmio-v3.c | 17 +++++++-----
> >  virt/kvm/arm/vgic/vgic-mmio.c    | 44 +++++++++++++++++------------
> >  virt/kvm/arm/vgic/vgic-v2.c      |  5 ++--
> >  virt/kvm/arm/vgic/vgic-v3.c      | 12 ++++----
> >  virt/kvm/arm/vgic/vgic.c         | 60 +++++++++++++++++++++++++---------------
> >  virt/kvm/arm/vgic/vgic.h         |  3 +-
> >  8 files changed, 108 insertions(+), 72 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> > index f51c1e1..9f5e347 100644
> > --- a/virt/kvm/arm/vgic/vgic-its.c
> > +++ b/virt/kvm/arm/vgic/vgic-its.c
> > @@ -278,6 +278,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
> >  	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
> >  	u8 prop;
> >  	int ret;
> > +	unsigned long flags;
> >  
> >  	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
> >  			     &prop, 1);
> > @@ -285,15 +286,15 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
> >  	if (ret)
> >  		return ret;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
> >  		irq->priority = LPI_PROP_PRIORITY(prop);
> >  		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
> >  
> > -		vgic_queue_irq_unlock(kvm, irq);
> > +		vgic_queue_irq_unlock(kvm, irq, flags);
> >  	} else {
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	}
> >  
> >  	return 0;
> > @@ -393,6 +394,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
> >  	int ret = 0;
> >  	u32 *intids;
> >  	int nr_irqs, i;
> > +	unsigned long flags;
> >  
> >  	nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
> >  	if (nr_irqs < 0)
> > @@ -420,9 +422,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
> >  		}
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = pendmask & (1U << bit_nr);
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > @@ -515,6 +517,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  	struct its_ite *ite;
> > +	unsigned long flags;
> >  
> >  	if (!its->enabled)
> >  		return -EBUSY;
> > @@ -530,9 +533,9 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
> >  	if (!vcpu->arch.vgic_cpu.lpis_enabled)
> >  		return -EBUSY;
> >  
> > -	spin_lock(&ite->irq->irq_lock);
> > +	spin_lock_irqsave(&ite->irq->irq_lock, flags);
> >  	ite->irq->pending_latch = true;
> > -	vgic_queue_irq_unlock(kvm, ite->irq);
> > +	vgic_queue_irq_unlock(kvm, ite->irq, flags);
> >  
> >  	return 0;
> >  }
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > index b3d4a10..e21e2f4 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > @@ -74,6 +74,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
> >  	int mode = (val >> 24) & 0x03;
> >  	int c;
> >  	struct kvm_vcpu *vcpu;
> > +	unsigned long flags;
> >  
> >  	switch (mode) {
> >  	case 0x0:		/* as specified by targets */
> > @@ -97,11 +98,11 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
> >  
> >  		irq = vgic_get_irq(source_vcpu->kvm, vcpu, intid);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  		irq->source |= 1U << source_vcpu->vcpu_id;
> >  
> > -		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(source_vcpu->kvm, irq, flags);
> >  		vgic_put_irq(source_vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -131,6 +132,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
> >  	u8 cpu_mask = GENMASK(atomic_read(&vcpu->kvm->online_vcpus) - 1, 0);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	/* GICD_ITARGETSR[0-7] are read-only */
> >  	if (intid < VGIC_NR_PRIVATE_IRQS)
> > @@ -140,13 +142,13 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid + i);
> >  		int target;
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->targets = (val >> (i * 8)) & cpu_mask;
> >  		target = irq->targets ? __ffs(irq->targets) : 0;
> >  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -174,17 +176,18 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = addr & 0x0f;
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->source &= ~((val >> (i * 8)) & 0xff);
> >  		if (!irq->source)
> >  			irq->pending_latch = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -195,19 +198,20 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = addr & 0x0f;
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->source |= (val >> (i * 8)) & 0xff;
> >  
> >  		if (irq->source) {
> >  			irq->pending_latch = true;
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		} else {
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		}
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > index 408ef06..8378610 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > @@ -129,6 +129,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >  {
> >  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  
> >  	/* The upper word is WI for us since we don't implement Aff3. */
> >  	if (addr & 4)
> > @@ -139,13 +140,13 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >  	if (!irq)
> >  		return;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
> >  	irq->mpidr = val & GENMASK(23, 0);
> >  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  }
> >  
> > @@ -241,11 +242,12 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len * 8; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		if (test_bit(i, &val)) {
> >  			/*
> >  			 * pending_latch is set irrespective of irq type
> > @@ -253,10 +255,10 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> >  			 * restore irq config before pending info.
> >  			 */
> >  			irq->pending_latch = true;
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		} else {
> >  			irq->pending_latch = false;
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		}
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> > @@ -799,6 +801,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
> >  	int sgi, c;
> >  	int vcpu_id = vcpu->vcpu_id;
> >  	bool broadcast;
> > +	unsigned long flags;
> >  
> >  	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
> >  	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> > @@ -837,10 +840,10 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > index c1e4bdd..deb51ee 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > @@ -69,13 +69,14 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->enabled = true;
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -87,15 +88,16 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->enabled = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -126,14 +128,15 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -144,15 +147,16 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->pending_latch = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -181,7 +185,8 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >  				    bool new_active_state)
> >  {
> >  	struct kvm_vcpu *requester_vcpu;
> > -	spin_lock(&irq->irq_lock);
> > +	unsigned long flags;
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	/*
> >  	 * The vcpu parameter here can mean multiple things depending on how
> > @@ -216,9 +221,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >  
> >  	irq->active = new_active_state;
> >  	if (new_active_state)
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  	else
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  }
> >  
> >  /*
> > @@ -352,14 +357,15 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		/* Narrow the priority range to what we actually support */
> >  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -390,6 +396,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 2);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len * 4; i++) {
> >  		struct vgic_irq *irq;
> > @@ -404,14 +411,14 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
> >  			continue;
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		if (test_bit(i * 2 + 1, &val))
> >  			irq->config = VGIC_CONFIG_EDGE;
> >  		else
> >  			irq->config = VGIC_CONFIG_LEVEL;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -443,6 +450,7 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
> >  {
> >  	int i;
> >  	int nr_irqs = vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < 32; i++) {
> >  		struct vgic_irq *irq;
> > @@ -459,12 +467,12 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
> >  		 * restore irq config before line level.
> >  		 */
> >  		new_level = !!(val & (1U << i));
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->line_level = new_level;
> >  		if (new_level)
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		else
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> > index e4187e5..8089710 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -62,6 +62,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
> >  	int lr;
> > +	unsigned long flags;
> >  
> >  	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
> >  
> > @@ -77,7 +78,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		/* Always preserve the active bit */
> >  		irq->active = !!(val & GICH_LR_ACTIVE_BIT);
> > @@ -104,7 +105,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> > index 96ea597..863351c 100644
> > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > @@ -44,6 +44,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
> >  	u32 model = vcpu->kvm->arch.vgic.vgic_model;
> >  	int lr;
> > +	unsigned long flags;
> >  
> >  	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
> >  
> > @@ -66,7 +67,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  		if (!irq)	/* An LPI could have been unmapped. */
> >  			continue;
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		/* Always preserve the active bit */
> >  		irq->active = !!(val & ICH_LR_ACTIVE_BIT);
> > @@ -94,7 +95,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > @@ -278,6 +279,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
> >  	bool status;
> >  	u8 val;
> >  	int ret;
> > +	unsigned long flags;
> >  
> >  retry:
> >  	vcpu = irq->target_vcpu;
> > @@ -296,13 +298,13 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
> >  
> >  	status = val & (1 << bit_nr);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  	if (irq->target_vcpu != vcpu) {
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		goto retry;
> >  	}
> >  	irq->pending_latch = status;
> > -	vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +	vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  
> >  	if (status) {
> >  		/* clear consumed data */
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index e1f7dbc..b1bd238 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -53,6 +53,10 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
> >   *   vcpuX->vcpu_id < vcpuY->vcpu_id:
> >   *     spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock);
> >   *     spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock);
> > + *
> > + * Since the VGIC must support injecting virtual interrupts from ISRs, we have
> > + * to use the spin_lock_irqsave/spin_unlock_irqrestore versions of outer
> > + * spinlocks for any lock that may be taken while injecting an interrupt.
> >   */
> >  
> >  /*
> > @@ -261,7 +265,8 @@ static bool vgic_validate_injection(struct vgic_irq *irq, bool level, void *owne
> >   * Needs to be entered with the IRQ lock already held, but will return
> >   * with all locks dropped.
> >   */
> > -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> > +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> > +			   unsigned long flags)
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  
> > @@ -279,7 +284,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  		 * not need to be inserted into an ap_list and there is also
> >  		 * no more work for us to do.
> >  		 */
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		/*
> >  		 * We have to kick the VCPU here, because we could be
> > @@ -301,11 +306,11 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	 * We must unlock the irq lock to take the ap_list_lock where
> >  	 * we are going to insert this new pending interrupt.
> >  	 */
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  	/* someone can do stuff here, which we re-check below */
> >  
> > -	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +	spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  	spin_lock(&irq->irq_lock);
> >  
> >  	/*
> > @@ -322,9 +327,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  
> >  	if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) {
> >  		spin_unlock(&irq->irq_lock);
> > -		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +		spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		goto retry;
> >  	}
> >  
> > @@ -337,7 +342,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	irq->vcpu = vcpu;
> >  
> >  	spin_unlock(&irq->irq_lock);
> > -	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +	spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  
> >  	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> > @@ -367,6 +372,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  	int ret;
> >  
> >  	trace_vgic_update_irq_pending(cpuid, intid, level);
> > @@ -383,11 +389,11 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  	if (!irq)
> >  		return -EINVAL;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	if (!vgic_validate_injection(irq, level, owner)) {
> >  		/* Nothing to see here, move along... */
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(kvm, irq);
> >  		return 0;
> >  	}
> > @@ -397,7 +403,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  	else
> >  		irq->pending_latch = true;
> >  
> > -	vgic_queue_irq_unlock(kvm, irq);
> > +	vgic_queue_irq_unlock(kvm, irq, flags);
> >  	vgic_put_irq(kvm, irq);
> >  
> >  	return 0;
> > @@ -406,15 +412,16 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
> >  {
> >  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> > +	unsigned long flags;
> >  
> >  	BUG_ON(!irq);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	irq->hw = true;
> >  	irq->hwintid = phys_irq;
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return 0;
> > @@ -423,6 +430,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
> >  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  {
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  
> >  	if (!vgic_initialized(vcpu->kvm))
> >  		return -EAGAIN;
> > @@ -430,12 +438,12 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> >  	BUG_ON(!irq);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	irq->hw = false;
> >  	irq->hwintid = 0;
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return 0;
> > @@ -486,9 +494,10 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_irq *irq, *tmp;
> > +	unsigned long flags;
> >  
> >  retry:
> > -	spin_lock(&vgic_cpu->ap_list_lock);
> > +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
> >  		struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB;
> > @@ -528,7 +537,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  		/* This interrupt looks like it has to be migrated. */
> >  
> >  		spin_unlock(&irq->irq_lock);
> > -		spin_unlock(&vgic_cpu->ap_list_lock);
> > +		spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  
> >  		/*
> >  		 * Ensure locking order by always locking the smallest
> > @@ -542,7 +551,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  			vcpuB = vcpu;
> >  		}
> >  
> > -		spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> > +		spin_lock_irqsave(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
> >  		spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock,
> >  				 SINGLE_DEPTH_NESTING);
> >  		spin_lock(&irq->irq_lock);
> > @@ -566,11 +575,11 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  
> >  		spin_unlock(&irq->irq_lock);
> >  		spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock);
> > -		spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> > +		spin_unlock_irqrestore(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
> >  		goto retry;
> >  	}
> >  
> > -	spin_unlock(&vgic_cpu->ap_list_lock);
> > +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  }
> >  
> >  static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
> > @@ -703,6 +712,8 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
> >  		return;
> >  
> > +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> > +
> >  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  	vgic_flush_lr_state(vcpu);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > @@ -735,11 +746,12 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_irq *irq;
> >  	bool pending = false;
> > +	unsigned long flags;
> >  
> >  	if (!vcpu->kvm->arch.vgic.enabled)
> >  		return false;
> >  
> > -	spin_lock(&vgic_cpu->ap_list_lock);
> > +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> >  		spin_lock(&irq->irq_lock);
> > @@ -750,7 +762,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> >  			break;
> >  	}
> >  
> > -	spin_unlock(&vgic_cpu->ap_list_lock);
> > +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	return pending;
> >  }
> > @@ -776,13 +788,15 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  {
> >  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> >  	bool map_is_active;
> > +	unsigned long flags;
> >  
> >  	if (!vgic_initialized(vcpu->kvm))
> >  		return false;
> > +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> 
> I'm a bit puzzled by this sequence: Either interrupts are disabled and
> we don't need the irqsave version, or they aren't and the BUG_ON will
> fire. kvm_vgic_map_is_active is called (indirectly) from
> kvm_timer_flush_hwstate. And at this stage of the patches, we definitely
> call this function with interrupts enabled.
> 
> Is it just a patch splitting snafu? Or something more serious? Same goes
> for the DEBUG_SPINLOCK_BUG_ON in kvm_vgic_flush_hwstate.

It's a leftover from before I realized that this also needs to be
called from kvm_timer_vcpu_load_vgic, which runs with interrupts
enabled.  I changed the simple spin_lock/spin_unlock to the
irqsave/irqrestore versions, but apparently forgot to take out the
assert.  (And apparently didn't run this with spinlock debugging
enabled.)
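
For completeness, a sketch of what the function would look like with the
stale assert dropped (keeping the irqsave locking from this patch):

bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
{
	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
	bool map_is_active;
	unsigned long flags;

	if (!vgic_initialized(vcpu->kvm))
		return false;

	spin_lock_irqsave(&irq->irq_lock, flags);
	map_is_active = irq->hw && irq->active;
	spin_unlock_irqrestore(&irq->irq_lock, flags);
	vgic_put_irq(vcpu->kvm, irq);

	return map_is_active;
}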

Thanks for spotting it.
-Christoffer

> >  	map_is_active = irq->hw && irq->active;
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return map_is_active;
> > diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> > index bf9ceab..4f8aecb 100644
> > --- a/virt/kvm/arm/vgic/vgic.h
> > +++ b/virt/kvm/arm/vgic/vgic.h
> > @@ -140,7 +140,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
> >  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> >  			      u32 intid);
> >  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> > -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
> > +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> > +			   unsigned long flags);
> >  void vgic_kick_vcpus(struct kvm *kvm);
> >  
> >  int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
> > 
> 
> Otherwise looks good to me.
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH v3 05/20] KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context
@ 2017-10-18 11:54       ` Christoffer Dall
  0 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 11:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 09, 2017 at 05:37:43PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > We are about to optimize our timer handling logic which involves
> > injecting irqs to the vgic directly from the irq handler.
> > 
> > Unfortunately, the injection path can take any AP list lock and irq lock
> > and we must therefore make sure to use spin_lock_irqsave wherever
> > interrupts are enabled and we are taking any of those locks, to avoid
> > deadlocking between process context and the ISR.
> > 
> > This changes a lot of the VGIC code, but the good news is that the
> > changes are mostly mechanical.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic/vgic-its.c     | 17 +++++++-----
> >  virt/kvm/arm/vgic/vgic-mmio-v2.c | 22 +++++++++------
> >  virt/kvm/arm/vgic/vgic-mmio-v3.c | 17 +++++++-----
> >  virt/kvm/arm/vgic/vgic-mmio.c    | 44 +++++++++++++++++------------
> >  virt/kvm/arm/vgic/vgic-v2.c      |  5 ++--
> >  virt/kvm/arm/vgic/vgic-v3.c      | 12 ++++----
> >  virt/kvm/arm/vgic/vgic.c         | 60 +++++++++++++++++++++++++---------------
> >  virt/kvm/arm/vgic/vgic.h         |  3 +-
> >  8 files changed, 108 insertions(+), 72 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> > index f51c1e1..9f5e347 100644
> > --- a/virt/kvm/arm/vgic/vgic-its.c
> > +++ b/virt/kvm/arm/vgic/vgic-its.c
> > @@ -278,6 +278,7 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
> >  	u64 propbase = GICR_PROPBASER_ADDRESS(kvm->arch.vgic.propbaser);
> >  	u8 prop;
> >  	int ret;
> > +	unsigned long flags;
> >  
> >  	ret = kvm_read_guest(kvm, propbase + irq->intid - GIC_LPI_OFFSET,
> >  			     &prop, 1);
> > @@ -285,15 +286,15 @@ static int update_lpi_config(struct kvm *kvm, struct vgic_irq *irq,
> >  	if (ret)
> >  		return ret;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	if (!filter_vcpu || filter_vcpu == irq->target_vcpu) {
> >  		irq->priority = LPI_PROP_PRIORITY(prop);
> >  		irq->enabled = LPI_PROP_ENABLE_BIT(prop);
> >  
> > -		vgic_queue_irq_unlock(kvm, irq);
> > +		vgic_queue_irq_unlock(kvm, irq, flags);
> >  	} else {
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	}
> >  
> >  	return 0;
> > @@ -393,6 +394,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
> >  	int ret = 0;
> >  	u32 *intids;
> >  	int nr_irqs, i;
> > +	unsigned long flags;
> >  
> >  	nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
> >  	if (nr_irqs < 0)
> > @@ -420,9 +422,9 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
> >  		}
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, NULL, intids[i]);
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = pendmask & (1U << bit_nr);
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > @@ -515,6 +517,7 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  	struct its_ite *ite;
> > +	unsigned long flags;
> >  
> >  	if (!its->enabled)
> >  		return -EBUSY;
> > @@ -530,9 +533,9 @@ static int vgic_its_trigger_msi(struct kvm *kvm, struct vgic_its *its,
> >  	if (!vcpu->arch.vgic_cpu.lpis_enabled)
> >  		return -EBUSY;
> >  
> > -	spin_lock(&ite->irq->irq_lock);
> > +	spin_lock_irqsave(&ite->irq->irq_lock, flags);
> >  	ite->irq->pending_latch = true;
> > -	vgic_queue_irq_unlock(kvm, ite->irq);
> > +	vgic_queue_irq_unlock(kvm, ite->irq, flags);
> >  
> >  	return 0;
> >  }
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio-v2.c b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > index b3d4a10..e21e2f4 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio-v2.c
> > @@ -74,6 +74,7 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
> >  	int mode = (val >> 24) & 0x03;
> >  	int c;
> >  	struct kvm_vcpu *vcpu;
> > +	unsigned long flags;
> >  
> >  	switch (mode) {
> >  	case 0x0:		/* as specified by targets */
> > @@ -97,11 +98,11 @@ static void vgic_mmio_write_sgir(struct kvm_vcpu *source_vcpu,
> >  
> >  		irq = vgic_get_irq(source_vcpu->kvm, vcpu, intid);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  		irq->source |= 1U << source_vcpu->vcpu_id;
> >  
> > -		vgic_queue_irq_unlock(source_vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(source_vcpu->kvm, irq, flags);
> >  		vgic_put_irq(source_vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -131,6 +132,7 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
> >  	u8 cpu_mask = GENMASK(atomic_read(&vcpu->kvm->online_vcpus) - 1, 0);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	/* GICD_ITARGETSR[0-7] are read-only */
> >  	if (intid < VGIC_NR_PRIVATE_IRQS)
> > @@ -140,13 +142,13 @@ static void vgic_mmio_write_target(struct kvm_vcpu *vcpu,
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid + i);
> >  		int target;
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->targets = (val >> (i * 8)) & cpu_mask;
> >  		target = irq->targets ? __ffs(irq->targets) : 0;
> >  		irq->target_vcpu = kvm_get_vcpu(vcpu->kvm, target);
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -174,17 +176,18 @@ static void vgic_mmio_write_sgipendc(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = addr & 0x0f;
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->source &= ~((val >> (i * 8)) & 0xff);
> >  		if (!irq->source)
> >  			irq->pending_latch = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -195,19 +198,20 @@ static void vgic_mmio_write_sgipends(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = addr & 0x0f;
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->source |= (val >> (i * 8)) & 0xff;
> >  
> >  		if (irq->source) {
> >  			irq->pending_latch = true;
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		} else {
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		}
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio-v3.c b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > index 408ef06..8378610 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio-v3.c
> > @@ -129,6 +129,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >  {
> >  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  
> >  	/* The upper word is WI for us since we don't implement Aff3. */
> >  	if (addr & 4)
> > @@ -139,13 +140,13 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >  	if (!irq)
> >  		return;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	/* We only care about and preserve Aff0, Aff1 and Aff2. */
> >  	irq->mpidr = val & GENMASK(23, 0);
> >  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  }
> >  
> > @@ -241,11 +242,12 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len * 8; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		if (test_bit(i, &val)) {
> >  			/*
> >  			 * pending_latch is set irrespective of irq type
> > @@ -253,10 +255,10 @@ static void vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> >  			 * restore irq config before pending info.
> >  			 */
> >  			irq->pending_latch = true;
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		} else {
> >  			irq->pending_latch = false;
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		}
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> > @@ -799,6 +801,7 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
> >  	int sgi, c;
> >  	int vcpu_id = vcpu->vcpu_id;
> >  	bool broadcast;
> > +	unsigned long flags;
> >  
> >  	sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
> >  	broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
> > @@ -837,10 +840,10 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, c_vcpu, sgi);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > index c1e4bdd..deb51ee 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > @@ -69,13 +69,14 @@ void vgic_mmio_write_senable(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->enabled = true;
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -87,15 +88,16 @@ void vgic_mmio_write_cenable(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->enabled = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -126,14 +128,15 @@ void vgic_mmio_write_spending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->pending_latch = true;
> >  
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -144,15 +147,16 @@ void vgic_mmio_write_cpending(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		irq->pending_latch = false;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -181,7 +185,8 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >  				    bool new_active_state)
> >  {
> >  	struct kvm_vcpu *requester_vcpu;
> > -	spin_lock(&irq->irq_lock);
> > +	unsigned long flags;
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	/*
> >  	 * The vcpu parameter here can mean multiple things depending on how
> > @@ -216,9 +221,9 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >  
> >  	irq->active = new_active_state;
> >  	if (new_active_state)
> > -		vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +		vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  	else
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  }
> >  
> >  /*
> > @@ -352,14 +357,15 @@ void vgic_mmio_write_priority(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 8);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len; i++) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		/* Narrow the priority range to what we actually support */
> >  		irq->priority = (val >> (i * 8)) & GENMASK(7, 8 - VGIC_PRI_BITS);
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > @@ -390,6 +396,7 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
> >  {
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 2);
> >  	int i;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < len * 4; i++) {
> >  		struct vgic_irq *irq;
> > @@ -404,14 +411,14 @@ void vgic_mmio_write_config(struct kvm_vcpu *vcpu,
> >  			continue;
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		if (test_bit(i * 2 + 1, &val))
> >  			irq->config = VGIC_CONFIG_EDGE;
> >  		else
> >  			irq->config = VGIC_CONFIG_LEVEL;
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  }
> > @@ -443,6 +450,7 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
> >  {
> >  	int i;
> >  	int nr_irqs = vcpu->kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
> > +	unsigned long flags;
> >  
> >  	for (i = 0; i < 32; i++) {
> >  		struct vgic_irq *irq;
> > @@ -459,12 +467,12 @@ void vgic_write_irq_line_level_info(struct kvm_vcpu *vcpu, u32 intid,
> >  		 * restore irq config before line level.
> >  		 */
> >  		new_level = !!(val & (1U << i));
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		irq->line_level = new_level;
> >  		if (new_level)
> > -			vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +			vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  		else
> > -			spin_unlock(&irq->irq_lock);
> > +			spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> > diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
> > index e4187e5..8089710 100644
> > --- a/virt/kvm/arm/vgic/vgic-v2.c
> > +++ b/virt/kvm/arm/vgic/vgic-v2.c
> > @@ -62,6 +62,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_v2_cpu_if *cpuif = &vgic_cpu->vgic_v2;
> >  	int lr;
> > +	unsigned long flags;
> >  
> >  	cpuif->vgic_hcr &= ~GICH_HCR_UIE;
> >  
> > @@ -77,7 +78,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  
> >  		irq = vgic_get_irq(vcpu->kvm, vcpu, intid);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		/* Always preserve the active bit */
> >  		irq->active = !!(val & GICH_LR_ACTIVE_BIT);
> > @@ -104,7 +105,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
> > index 96ea597..863351c 100644
> > --- a/virt/kvm/arm/vgic/vgic-v3.c
> > +++ b/virt/kvm/arm/vgic/vgic-v3.c
> > @@ -44,6 +44,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  	struct vgic_v3_cpu_if *cpuif = &vgic_cpu->vgic_v3;
> >  	u32 model = vcpu->kvm->arch.vgic.vgic_model;
> >  	int lr;
> > +	unsigned long flags;
> >  
> >  	cpuif->vgic_hcr &= ~ICH_HCR_UIE;
> >  
> > @@ -66,7 +67,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  		if (!irq)	/* An LPI could have been unmapped. */
> >  			continue;
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  		/* Always preserve the active bit */
> >  		irq->active = !!(val & ICH_LR_ACTIVE_BIT);
> > @@ -94,7 +95,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
> >  				irq->pending_latch = false;
> >  		}
> >  
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  
> > @@ -278,6 +279,7 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
> >  	bool status;
> >  	u8 val;
> >  	int ret;
> > +	unsigned long flags;
> >  
> >  retry:
> >  	vcpu = irq->target_vcpu;
> > @@ -296,13 +298,13 @@ int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
> >  
> >  	status = val & (1 << bit_nr);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  	if (irq->target_vcpu != vcpu) {
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		goto retry;
> >  	}
> >  	irq->pending_latch = status;
> > -	vgic_queue_irq_unlock(vcpu->kvm, irq);
> > +	vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> >  
> >  	if (status) {
> >  		/* clear consumed data */
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index e1f7dbc..b1bd238 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -53,6 +53,10 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
> >   *   vcpuX->vcpu_id < vcpuY->vcpu_id:
> >   *     spin_lock(vcpuX->arch.vgic_cpu.ap_list_lock);
> >   *     spin_lock(vcpuY->arch.vgic_cpu.ap_list_lock);
> > + *
> > + * Since the VGIC must support injecting virtual interrupts from ISRs, we have
> > + * to use the spin_lock_irqsave/spin_unlock_irqrestore versions of outer
> > + * spinlocks for any lock that may be taken while injecting an interrupt.
> >   */
> >  
> >  /*
> > @@ -261,7 +265,8 @@ static bool vgic_validate_injection(struct vgic_irq *irq, bool level, void *owne
> >   * Needs to be entered with the IRQ lock already held, but will return
> >   * with all locks dropped.
> >   */
> > -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> > +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> > +			   unsigned long flags)
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  
> > @@ -279,7 +284,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  		 * not need to be inserted into an ap_list and there is also
> >  		 * no more work for us to do.
> >  		 */
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  		/*
> >  		 * We have to kick the VCPU here, because we could be
> > @@ -301,11 +306,11 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	 * We must unlock the irq lock to take the ap_list_lock where
> >  	 * we are going to insert this new pending interrupt.
> >  	 */
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  
> >  	/* someone can do stuff here, which we re-check below */
> >  
> > -	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +	spin_lock_irqsave(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  	spin_lock(&irq->irq_lock);
> >  
> >  	/*
> > @@ -322,9 +327,9 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  
> >  	if (unlikely(irq->vcpu || vcpu != vgic_target_oracle(irq))) {
> >  		spin_unlock(&irq->irq_lock);
> > -		spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +		spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  
> > -		spin_lock(&irq->irq_lock);
> > +		spin_lock_irqsave(&irq->irq_lock, flags);
> >  		goto retry;
> >  	}
> >  
> > @@ -337,7 +342,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	irq->vcpu = vcpu;
> >  
> >  	spin_unlock(&irq->irq_lock);
> > -	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +	spin_unlock_irqrestore(&vcpu->arch.vgic_cpu.ap_list_lock, flags);
> >  
> >  	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> > @@ -367,6 +372,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  {
> >  	struct kvm_vcpu *vcpu;
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  	int ret;
> >  
> >  	trace_vgic_update_irq_pending(cpuid, intid, level);
> > @@ -383,11 +389,11 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  	if (!irq)
> >  		return -EINVAL;
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	if (!vgic_validate_injection(irq, level, owner)) {
> >  		/* Nothing to see here, move along... */
> > -		spin_unlock(&irq->irq_lock);
> > +		spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  		vgic_put_irq(kvm, irq);
> >  		return 0;
> >  	}
> > @@ -397,7 +403,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  	else
> >  		irq->pending_latch = true;
> >  
> > -	vgic_queue_irq_unlock(kvm, irq);
> > +	vgic_queue_irq_unlock(kvm, irq, flags);
> >  	vgic_put_irq(kvm, irq);
> >  
> >  	return 0;
> > @@ -406,15 +412,16 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> >  int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
> >  {
> >  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> > +	unsigned long flags;
> >  
> >  	BUG_ON(!irq);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	irq->hw = true;
> >  	irq->hwintid = phys_irq;
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return 0;
> > @@ -423,6 +430,7 @@ int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
> >  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  {
> >  	struct vgic_irq *irq;
> > +	unsigned long flags;
> >  
> >  	if (!vgic_initialized(vcpu->kvm))
> >  		return -EAGAIN;
> > @@ -430,12 +438,12 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  	irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> >  	BUG_ON(!irq);
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> >  
> >  	irq->hw = false;
> >  	irq->hwintid = 0;
> >  
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return 0;
> > @@ -486,9 +494,10 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  {
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_irq *irq, *tmp;
> > +	unsigned long flags;
> >  
> >  retry:
> > -	spin_lock(&vgic_cpu->ap_list_lock);
> > +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	list_for_each_entry_safe(irq, tmp, &vgic_cpu->ap_list_head, ap_list) {
> >  		struct kvm_vcpu *target_vcpu, *vcpuA, *vcpuB;
> > @@ -528,7 +537,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  		/* This interrupt looks like it has to be migrated. */
> >  
> >  		spin_unlock(&irq->irq_lock);
> > -		spin_unlock(&vgic_cpu->ap_list_lock);
> > +		spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  
> >  		/*
> >  		 * Ensure locking order by always locking the smallest
> > @@ -542,7 +551,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  			vcpuB = vcpu;
> >  		}
> >  
> > -		spin_lock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> > +		spin_lock_irqsave(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
> >  		spin_lock_nested(&vcpuB->arch.vgic_cpu.ap_list_lock,
> >  				 SINGLE_DEPTH_NESTING);
> >  		spin_lock(&irq->irq_lock);
> > @@ -566,11 +575,11 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
> >  
> >  		spin_unlock(&irq->irq_lock);
> >  		spin_unlock(&vcpuB->arch.vgic_cpu.ap_list_lock);
> > -		spin_unlock(&vcpuA->arch.vgic_cpu.ap_list_lock);
> > +		spin_unlock_irqrestore(&vcpuA->arch.vgic_cpu.ap_list_lock, flags);
> >  		goto retry;
> >  	}
> >  
> > -	spin_unlock(&vgic_cpu->ap_list_lock);
> > +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  }
> >  
> >  static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
> > @@ -703,6 +712,8 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
> >  		return;
> >  
> > +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> > +
> >  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  	vgic_flush_lr_state(vcpu);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > @@ -735,11 +746,12 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_irq *irq;
> >  	bool pending = false;
> > +	unsigned long flags;
> >  
> >  	if (!vcpu->kvm->arch.vgic.enabled)
> >  		return false;
> >  
> > -	spin_lock(&vgic_cpu->ap_list_lock);
> > +	spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	list_for_each_entry(irq, &vgic_cpu->ap_list_head, ap_list) {
> >  		spin_lock(&irq->irq_lock);
> > @@ -750,7 +762,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> >  			break;
> >  	}
> >  
> > -	spin_unlock(&vgic_cpu->ap_list_lock);
> > +	spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
> >  
> >  	return pending;
> >  }
> > @@ -776,13 +788,15 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> >  {
> >  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> >  	bool map_is_active;
> > +	unsigned long flags;
> >  
> >  	if (!vgic_initialized(vcpu->kvm))
> >  		return false;
> > +	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
> >  
> > -	spin_lock(&irq->irq_lock);
> > +	spin_lock_irqsave(&irq->irq_lock, flags);
> 
> I'm a bit puzzled by this sequence: Either interrupts are disabled and
> we don't need the irqsave version, or they aren't and the BUG_ON will
> fire. kvm_vgic_map_is_active is called (indirectly) from
> kvm_timer_flush_hwstate. And at this stage of the patches, we definitely
> call this function with interrupts enabled.
> 
> Is it just a patch splitting snafu? Or something more serious? Same goes
> for the DEBUG_SPINLOCK_BUG_ON in kvm_vgic_flush_hwstate.

It's a leftover from before I realized that this also needs to be
called from kvm_timer_vcpu_load_vgic, which runs with interrupts
enabled, so I changed the simple spin_lock/spin_unlock to the
irqsave/irqrestore versions but apparently forgot to take out the
assert (and apparently didn't run this with spinlock debugging
enabled).
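
I.e., something like this on top (untested, hunk offsets approximate),
simply dropping the stale assert and keeping the irqsave variant:

diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -791,7 +791,6 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
 	unsigned long flags;
 
 	if (!vgic_initialized(vcpu->kvm))
 		return false;
-	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
 	spin_lock_irqsave(&irq->irq_lock, flags);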

Thanks for spotting it.
-Christoffer

> >  	map_is_active = irq->hw && irq->active;
> > -	spin_unlock(&irq->irq_lock);
> > +	spin_unlock_irqrestore(&irq->irq_lock, flags);
> >  	vgic_put_irq(vcpu->kvm, irq);
> >  
> >  	return map_is_active;
> > diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> > index bf9ceab..4f8aecb 100644
> > --- a/virt/kvm/arm/vgic/vgic.h
> > +++ b/virt/kvm/arm/vgic/vgic.h
> > @@ -140,7 +140,8 @@ vgic_get_mmio_region(struct kvm_vcpu *vcpu, struct vgic_io_device *iodev,
> >  struct vgic_irq *vgic_get_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
> >  			      u32 intid);
> >  void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq);
> > -bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq);
> > +bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
> > +			   unsigned long flags);
> >  void vgic_kick_vcpus(struct kvm *kvm);
> >  
> >  int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
> > 
> 
> Otherwise looks good to me.
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate
  2017-10-09 16:47     ` Marc Zyngier
@ 2017-10-18 13:41       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 13:41 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Mon, Oct 09, 2017 at 05:47:18PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > Some systems without proper firmware and/or hardware description data
> > don't support the split EOI and deactivate operation.
> > 
> > On such systems, we cannot leave the physical interrupt active after the
> > timer handler on the host has run, so we cannot support KVM with an
> > in-kernel GIC with the timer changes we are about to introduce.
> > 
> > This patch makes sure that trying to initialize the KVM GIC code will
> > fail on such systems.
> > 
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  drivers/irqchip/irq-gic.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > index f641e8e..ab12bf4 100644
> > --- a/drivers/irqchip/irq-gic.c
> > +++ b/drivers/irqchip/irq-gic.c
> > @@ -1420,7 +1420,8 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
> >  	if (ret)
> >  		return;
> >  
> > -	gic_set_kvm_info(&gic_v2_kvm_info);
> > +	if (static_key_true(&supports_deactivate))
> > +		gic_set_kvm_info(&gic_v2_kvm_info);
> >  }
> >  
> >  int __init
> > 
> 
> Should we add the same level of checking on the ACPI path, just for the
> sake of symmetry?

Yes, we should, if anyone is crazy enough to use ACPI :)

> 
> Also, do we need to add the same thing for GICv3?
> 

Why would split EOI/deactivate not be available on GICv3, actually?  It
looks like this is not supported unless you have EL2, but I can't seem
to find anything in the spec for this, and KVM should support
EOI/deactivate for GICv3 guests I think.  Am I missing something?

Assuming I'm wrong about GICv3, which I probably am, how does this look
(on top of the posted patch):

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 519149e..aed524c 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1228,7 +1228,9 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
 		goto out_unmap_rdist;
 
 	gic_populate_ppi_partitions(node);
-	gic_of_setup_kvm_info(node);
+
+	if (static_key_true(&supports_deactivate))
+		gic_of_setup_kvm_info(node);
 	return 0;
 
 out_unmap_rdist:
@@ -1517,7 +1519,9 @@ gic_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
 		goto out_fwhandle_free;
 
 	acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
-	gic_acpi_setup_kvm_info();
+
+	if (static_key_true(&supports_deactivate))
+		gic_acpi_setup_kvm_info();
 
 	return 0;
 
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ab12bf4..121af5c 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1653,7 +1653,8 @@ static int __init gic_v2_acpi_init(struct acpi_subtable_header *header,
 	if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
 		gicv2m_init(NULL, gic_data[0].domain);
 
-	gic_acpi_setup_kvm_info();
+	if (static_key_true(&supports_deactivate))
+		gic_acpi_setup_kvm_info();
 
 	return 0;
 }

 Thanks,
 -Christoffer

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles
  2017-10-18 11:34       ` Christoffer Dall
@ 2017-10-18 15:52         ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-18 15:52 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas,
	Mark Rutland

On Wed, Oct 18 2017 at  1:34:05 pm BST, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Oct 09, 2017 at 05:21:24PM +0100, Marc Zyngier wrote:
>> On 23/09/17 01:41, Christoffer Dall wrote:
>> > Currently get_cycles() is hardwired to arch_counter_get_cntvct() on
>> > arm64, but as we move to using the physical timer for the in-kernel
>> > time-keeping, we need to make that more flexible.
>> > 
>> > First, we need to make sure the physical counter can be read on equal
>> > terms to the virtual counter, which includes adding physical counter
>> > read functions for timers that require errata.
>> > 
>> > Second, we need to make a choice between reading the physical vs virtual
>> > counter, depending on which timer is used for time keeping in the kernel
>> > otherwise.  We can do this using a static key to avoid a performance
>> > penalty during runtime when reading the counter.
>> > 
>> > Cc: Catalin Marinas <catalin.marinas@arm.com>
>> > Cc: Will Deacon <will.deacon@arm.com>
>> > Cc: Mark Rutland <mark.rutland@arm.com>
>> > Cc: Marc Zyngier <marc.zyngier@arm.com>
>> > Signed-off-by: Christoffer Dall <cdall@linaro.org>

[...]

>> In my reply to patch #2, I had the following hunk:
>> 
>> @@ -310,7 +329,7 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
>>  						struct clock_event_device *clk)
>>  {
>>  	unsigned long ctrl;
>> -	u64 cval = evt + arch_counter_get_cntvct();
>> +	u64 cval = evt + arch_timer_read_counter();
>>  
>>  	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
>>  	ctrl |= ARCH_TIMER_CTRL_ENABLE;
>> 
>> Once we start using a different timer, this could well have an effect...
>> 
>
> Right, but wouldn't the following be a more correct way to go about it then:
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 9a7b359..07f19db 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -329,16 +329,19 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
>  						struct clock_event_device *clk)
>  {
>  	unsigned long ctrl;
> -	u64 cval = evt + arch_timer_read_counter();
> +	u64 cval;
>  
>  	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
>  	ctrl |= ARCH_TIMER_CTRL_ENABLE;
>  	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
>  
> -	if (access == ARCH_TIMER_PHYS_ACCESS)
> +	if (access == ARCH_TIMER_PHYS_ACCESS) {
> +		cval = evt + arch_counter_get_cntpct();
>  		write_sysreg(cval, cntp_cval_el0);
> -	else
> +	} else {
> +		cval = evt + arch_counter_get_cntvct();
>  		write_sysreg(cval, cntv_cval_el0);
> +	}
>  
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }

Yup, that's much better.

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate
  2017-10-18 13:41       ` Christoffer Dall
@ 2017-10-18 16:03         ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-18 16:03 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Wed, Oct 18 2017 at  3:41:45 pm BST, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Oct 09, 2017 at 05:47:18PM +0100, Marc Zyngier wrote:
>> On 23/09/17 01:41, Christoffer Dall wrote:
>> > Some systems without proper firmware and/or hardware description data
>> > don't support the split EOI and deactivate operation.
>> > 
>> > On such systems, we cannot leave the physical interrupt active after the
>> > timer handler on the host has run, so we cannot support KVM with an
>> > in-kernel GIC with the timer changes we are about to introduce.
>> > 
>> > This patch makes sure that trying to initialize the KVM GIC code will
>> > fail on such systems.
>> > 
>> > Cc: Marc Zyngier <marc.zyngier@arm.com>
>> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
>> > ---
>> >  drivers/irqchip/irq-gic.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>> > index f641e8e..ab12bf4 100644
>> > --- a/drivers/irqchip/irq-gic.c
>> > +++ b/drivers/irqchip/irq-gic.c
>> > @@ -1420,7 +1420,8 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
>> >  	if (ret)
>> >  		return;
>> >  
>> > -	gic_set_kvm_info(&gic_v2_kvm_info);
>> > +	if (static_key_true(&supports_deactivate))
>> > +		gic_set_kvm_info(&gic_v2_kvm_info);
>> >  }
>> >  
>> >  int __init
>> > 
>> 
>> Should we add the same level of checking on the ACPI path, just for the
>> sake of symmetry?
>
> Yes, we should, if anyone is crazy enough to use ACPI :)

Sadly, the madness is becoming commonplace.

>> 
>> Also, do we need to add the same thing for GICv3?
>> 
>
> Why would split EOI/deactivate not be available on GICv3, actually?  It
> looks like this is not supported unless you have EL2, but I can't seem
> to find anything in the spec for this, and KVM should support
> EOI/deactivate for GICv3 guests I think.  Am I missing something?

No, you're not. This is just a Linux choice (or rather mine) not to use
EOImode=1 in guests (or anything booted at EL1), as we don't really need
the two-stage deactivate in that situation (it is pure overhead).

I'm just worried about potentially broken HW, and would like to make sure
that when we force EOImode=0 on these systems, we truly tell KVM about
it.
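
Just to illustrate what I mean by the two-stage deactivate (purely
illustrative pseudo-code, not the actual driver flow):

	if (supports_deactivate) {
		/* EOImode == 1: EOI is only a priority drop... */
		gic_write_eoir(irqnr);
		/*
		 * ... and the interrupt stays active until someone
		 * explicitly deactivates it, which can happen much
		 * later (e.g. from KVM once the guest has EOIed its
		 * virtual interrupt).
		 */
		gic_write_dir(irqnr);
	} else {
		/* EOImode == 0: EOI drops priority and deactivates */
		gic_write_eoir(irqnr);
	}

So if the HW/firmware forces us into EOImode == 0, KVM must know about
it, as it can no longer rely on the physical line staying active after
the host handler has run.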

> Assuming I'm wrong about GICv3, which I probably am, how does this look
> (on top of the posted patch):
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 519149e..aed524c 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -1228,7 +1228,9 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
>  		goto out_unmap_rdist;
>  
>  	gic_populate_ppi_partitions(node);
> -	gic_of_setup_kvm_info(node);
> +
> +	if (static_key_true(&supports_deactivate))
> +		gic_of_setup_kvm_info(node);
>  	return 0;
>  
>  out_unmap_rdist:
> @@ -1517,7 +1519,9 @@ gic_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
>  		goto out_fwhandle_free;
>  
>  	acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
> -	gic_acpi_setup_kvm_info();
> +
> +	if (static_key_true(&supports_deactivate))
> +		gic_acpi_setup_kvm_info();
>  
>  	return 0;
>  
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index ab12bf4..121af5c 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -1653,7 +1653,8 @@ static int __init gic_v2_acpi_init(struct acpi_subtable_header *header,
>  	if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
>  		gicv2m_init(NULL, gic_data[0].domain);
>  
> -	gic_acpi_setup_kvm_info();
> +	if (static_key_true(&supports_deactivate))
> +		gic_acpi_setup_kvm_info();
>  
>  	return 0;
>  }

Yup, looks good to me!

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
  2017-10-09 17:05     ` Marc Zyngier
@ 2017-10-18 16:47       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 16:47 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Mon, Oct 09, 2017 at 06:05:04PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > We are about to add an additional soft timer to the arch timer state for
> > a VCPU and would like to be able to reuse the functions to program and
> > cancel a timer, so we make them slightly more generic and rename to make
> > it more clear that these functions work on soft timers and not the
> > hardware resource that this code is managing.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/arch_timer.c | 33 ++++++++++++++++-----------------
> >  1 file changed, 16 insertions(+), 17 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 8e89d63..871d8ae 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -56,26 +56,22 @@ u64 kvm_phys_timer_read(void)
> >  	return timecounter->cc->read(timecounter->cc);
> >  }
> >  
> > -static bool timer_is_armed(struct arch_timer_cpu *timer)
> > +static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
> >  {
> >  	return timer->armed;
> >  }
> >  
> > -/* timer_arm: as in "arm the timer", not as in ARM the company */
> > -static void timer_arm(struct arch_timer_cpu *timer, u64 ns)
> > +static void soft_timer_start(struct hrtimer *hrt, u64 ns)
> >  {
> > -	timer->armed = true;
> > -	hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns),
> > +	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
> >  		      HRTIMER_MODE_ABS);
> >  }
> >  
> > -static void timer_disarm(struct arch_timer_cpu *timer)
> > +static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
> >  {
> > -	if (timer_is_armed(timer)) {
> > -		hrtimer_cancel(&timer->timer);
> > -		cancel_work_sync(&timer->expired);
> > -		timer->armed = false;
> > -	}
> > +	hrtimer_cancel(hrt);
> > +	if (work)
> 
> When can this happen? Something in a following patch?
> 

Yeah, sorry about that.  I will point this out in the commit message.
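
(For context, a later patch in this series adds a second hrtimer for
the emulated physical timer which has no work item to flush, so the
cancel there ends up looking roughly like this -- take the field name
as a sketch:

	/* sketch -- field name assumed from the later patch */
	soft_timer_cancel(&timer->phys_timer, NULL);

hence the NULL check.)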

> > +		cancel_work_sync(work);
> >  }
> >  
> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > @@ -271,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> >  		return;
> >  
> >  	/*  The timer has not yet expired, schedule a background timer */
> > -	timer_arm(timer, kvm_timer_compute_delta(timer_ctx));
> > +	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
> >  }
> >  
> >  /*
> > @@ -285,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> > -	BUG_ON(timer_is_armed(timer));
> > +	BUG_ON(soft_timer_is_armed(timer));
> >  
> >  	/*
> >  	 * No need to schedule a background timer if any guest timer has
> > @@ -306,13 +302,16 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  	 * The guest timers have not yet expired, schedule a background timer.
> >  	 * Set the earliest expiration time among the guest timers.
> >  	 */
> > -	timer_arm(timer, kvm_timer_earliest_exp(vcpu));
> > +	timer->armed = true;
> > +	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
> >  }
> >  
> >  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	timer_disarm(timer);
> > +
> > +	soft_timer_cancel(&timer->timer, &timer->expired);
> > +	timer->armed = false;
> >  }
> >  
> >  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> > @@ -448,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  	 * This is to cancel the background timer for the physical timer
> >  	 * emulation if it is set.
> >  	 */
> > -	timer_disarm(timer);
> > +	soft_timer_cancel(&timer->timer, &timer->expired);
> 
> timer_disarm() used to set timer->armed to false, but that's not the
> case any more. Don't we risk hitting the BUG_ON() in kvm_timer_schedule
> if we hit WFI?
> 

We do, and I just didn't hit that because this goes away at the end of
the series, and I didn't vigorously test every single patch in the
series (just a compile test).

We actually only use the armed flag for the BUG_ON(), and I don't think
we need that check really.  So I suggest simply merging this logic into
this patch:

diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index f0053f884b4a..d0beae98f755 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -48,9 +48,6 @@ struct arch_timer_cpu {
 	/* Work queued with the above timer expires */
 	struct work_struct		expired;
 
-	/* Background timer active */
-	bool				armed;
-
 	/* Is the timer enabled */
 	bool			enabled;
 };
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 871d8ae52f9b..98643bc696a9 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -56,11 +56,6 @@ u64 kvm_phys_timer_read(void)
 	return timecounter->cc->read(timecounter->cc);
 }
 
-static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
-{
-	return timer->armed;
-}
-
 static void soft_timer_start(struct hrtimer *hrt, u64 ns)
 {
 	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
@@ -281,8 +276,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	BUG_ON(soft_timer_is_armed(timer));
-
 	/*
 	 * No need to schedule a background timer if any guest timer has
 	 * already expired, because kvm_vcpu_block will return before putting
@@ -302,7 +295,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 	 * The guest timers have not yet expired, schedule a background timer.
 	 * Set the earliest expiration time among the guest timers.
 	 */
-	timer->armed = true;
 	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
 }
 
@@ -311,7 +303,6 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	soft_timer_cancel(&timer->timer, &timer->expired);
-	timer->armed = false;
 }
 
 static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)


Thanks,
-Christoffer

^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
  2017-10-18 16:47       ` Christoffer Dall
@ 2017-10-18 16:53         ` Marc Zyngier
  -1 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-18 16:53 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Wed, Oct 18 2017 at  6:47:50 pm BST, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Oct 09, 2017 at 06:05:04PM +0100, Marc Zyngier wrote:
>> On 23/09/17 01:41, Christoffer Dall wrote:
>> > We are about to add an additional soft timer to the arch timer state for
>> > a VCPU and would like to be able to reuse the functions to program and
>> > cancel a timer, so we make them slightly more generic and rename to make
>> > it more clear that these functions work on soft timers and not the
>> > hardware resource that this code is managing.
>> > 
>> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
>> > ---
>> >  virt/kvm/arm/arch_timer.c | 33 ++++++++++++++++-----------------
>> >  1 file changed, 16 insertions(+), 17 deletions(-)
>> > 
>> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> > index 8e89d63..871d8ae 100644
>> > --- a/virt/kvm/arm/arch_timer.c
>> > +++ b/virt/kvm/arm/arch_timer.c
>> > @@ -56,26 +56,22 @@ u64 kvm_phys_timer_read(void)
>> >  	return timecounter->cc->read(timecounter->cc);
>> >  }
>> >  
>> > -static bool timer_is_armed(struct arch_timer_cpu *timer)
>> > +static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
>> >  {
>> >  	return timer->armed;
>> >  }
>> >  
>> > -/* timer_arm: as in "arm the timer", not as in ARM the company */
>> > -static void timer_arm(struct arch_timer_cpu *timer, u64 ns)
>> > +static void soft_timer_start(struct hrtimer *hrt, u64 ns)
>> >  {
>> > -	timer->armed = true;
>> > -	hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns),
>> > +	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
>> >  		      HRTIMER_MODE_ABS);
>> >  }
>> >  
>> > -static void timer_disarm(struct arch_timer_cpu *timer)
>> > +static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>> >  {
>> > -	if (timer_is_armed(timer)) {
>> > -		hrtimer_cancel(&timer->timer);
>> > -		cancel_work_sync(&timer->expired);
>> > -		timer->armed = false;
>> > -	}
>> > +	hrtimer_cancel(hrt);
>> > +	if (work)
>> 
>> When can this happen? Something in a following patch?
>> 
>
> Yeah, sorry about that.  I will point this out in the commit message.
>
>> > +		cancel_work_sync(work);
>> >  }
>> >  
>> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>> > @@ -271,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
>> >  		return;
>> >  
>> >  	/*  The timer has not yet expired, schedule a background timer */
>> > -	timer_arm(timer, kvm_timer_compute_delta(timer_ctx));
>> > +	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
>> >  }
>> >  
>> >  /*
>> > @@ -285,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>> >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>> >  
>> > -	BUG_ON(timer_is_armed(timer));
>> > +	BUG_ON(soft_timer_is_armed(timer));
>> >  
>> >  	/*
>> >  	 * No need to schedule a background timer if any guest timer has
>> > @@ -306,13 +302,16 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>> >  	 * The guest timers have not yet expired, schedule a background timer.
>> >  	 * Set the earliest expiration time among the guest timers.
>> >  	 */
>> > -	timer_arm(timer, kvm_timer_earliest_exp(vcpu));
>> > +	timer->armed = true;
>> > +	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
>> >  }
>> >  
>> >  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>> >  {
>> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>> > -	timer_disarm(timer);
>> > +
>> > +	soft_timer_cancel(&timer->timer, &timer->expired);
>> > +	timer->armed = false;
>> >  }
>> >  
>> >  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
>> > @@ -448,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>> >  	 * This is to cancel the background timer for the physical timer
>> >  	 * emulation if it is set.
>> >  	 */
>> > -	timer_disarm(timer);
>> > +	soft_timer_cancel(&timer->timer, &timer->expired);
>> 
>> timer_disarm() used to set timer->armed to false, but that's not the
>> case any more. Don't we risk hitting the BUG_ON() in kvm_timer_schedule
>> if we hit WFI?
>> 
>
> We do, and I just didn't hit that because this goes away at the end of
> the series, and I didn't vigorously test every single patch in the
> series (just a compile test).
>
> We actually only use the armed flag for the BUG_ON(), and I don't think
> we need that check really.  So I suggest simply merging this logic into
> this patch:
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index f0053f884b4a..d0beae98f755 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -48,9 +48,6 @@ struct arch_timer_cpu {
>  	/* Work queued with the above timer expires */
>  	struct work_struct		expired;
>  
> -	/* Background timer active */
> -	bool				armed;
> -
>  	/* Is the timer enabled */
>  	bool			enabled;
>  };
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 871d8ae52f9b..98643bc696a9 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -56,11 +56,6 @@ u64 kvm_phys_timer_read(void)
>  	return timecounter->cc->read(timecounter->cc);
>  }
>  
> -static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
> -{
> -	return timer->armed;
> -}
> -
>  static void soft_timer_start(struct hrtimer *hrt, u64 ns)
>  {
>  	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
> @@ -281,8 +276,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> -	BUG_ON(soft_timer_is_armed(timer));
> -
>  	/*
>  	 * No need to schedule a background timer if any guest timer has
>  	 * already expired, because kvm_vcpu_block will return before putting
> @@ -302,7 +295,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	 * The guest timers have not yet expired, schedule a background timer.
>  	 * Set the earliest expiration time among the guest timers.
>  	 */
> -	timer->armed = true;
>  	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
>  }
>  
> @@ -311,7 +303,6 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	soft_timer_cancel(&timer->timer, &timer->expired);
> -	timer->armed = false;
>  }
>  
>  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)

Yes, this seems like a sensible thing to do.

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 110+ messages in thread

* [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic
@ 2017-10-18 16:53         ` Marc Zyngier
  0 siblings, 0 replies; 110+ messages in thread
From: Marc Zyngier @ 2017-10-18 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Oct 18 2017 at  6:47:50 pm BST, Christoffer Dall <cdall@linaro.org> wrote:
> On Mon, Oct 09, 2017 at 06:05:04PM +0100, Marc Zyngier wrote:
>> On 23/09/17 01:41, Christoffer Dall wrote:
>> > We are about to add an additional soft timer to the arch timer state for
>> > a VCPU and would like to be able to reuse the functions to program and
>> > cancel a timer, so we make them slightly more generic and rename to make
>> > it more clear that these functions work on soft timers and not the
>> > hardware resource that this code is managing.
>> > 
>> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
>> > ---
>> >  virt/kvm/arm/arch_timer.c | 33 ++++++++++++++++-----------------
>> >  1 file changed, 16 insertions(+), 17 deletions(-)
>> > 
>> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> > index 8e89d63..871d8ae 100644
>> > --- a/virt/kvm/arm/arch_timer.c
>> > +++ b/virt/kvm/arm/arch_timer.c
>> > @@ -56,26 +56,22 @@ u64 kvm_phys_timer_read(void)
>> >  	return timecounter->cc->read(timecounter->cc);
>> >  }
>> >  
>> > -static bool timer_is_armed(struct arch_timer_cpu *timer)
>> > +static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
>> >  {
>> >  	return timer->armed;
>> >  }
>> >  
>> > -/* timer_arm: as in "arm the timer", not as in ARM the company */
>> > -static void timer_arm(struct arch_timer_cpu *timer, u64 ns)
>> > +static void soft_timer_start(struct hrtimer *hrt, u64 ns)
>> >  {
>> > -	timer->armed = true;
>> > -	hrtimer_start(&timer->timer, ktime_add_ns(ktime_get(), ns),
>> > +	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
>> >  		      HRTIMER_MODE_ABS);
>> >  }
>> >  
>> > -static void timer_disarm(struct arch_timer_cpu *timer)
>> > +static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
>> >  {
>> > -	if (timer_is_armed(timer)) {
>> > -		hrtimer_cancel(&timer->timer);
>> > -		cancel_work_sync(&timer->expired);
>> > -		timer->armed = false;
>> > -	}
>> > +	hrtimer_cancel(hrt);
>> > +	if (work)
>> 
>> When can this happen? Something in a following patch?
>> 
>
> Yeah, sorry about that.  I will point this out in the commit message.
>
>> > +		cancel_work_sync(work);
>> >  }
>> >  
>> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>> > @@ -271,7 +267,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
>> >  		return;
>> >  
>> >  	/*  The timer has not yet expired, schedule a background timer */
>> > -	timer_arm(timer, kvm_timer_compute_delta(timer_ctx));
>> > +	soft_timer_start(&timer->timer, kvm_timer_compute_delta(timer_ctx));
>> >  }
>> >  
>> >  /*
>> > @@ -285,7 +281,7 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>> >  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>> >  
>> > -	BUG_ON(timer_is_armed(timer));
>> > +	BUG_ON(soft_timer_is_armed(timer));
>> >  
>> >  	/*
>> >  	 * No need to schedule a background timer if any guest timer has
>> > @@ -306,13 +302,16 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>> >  	 * The guest timers have not yet expired, schedule a background timer.
>> >  	 * Set the earliest expiration time among the guest timers.
>> >  	 */
>> > -	timer_arm(timer, kvm_timer_earliest_exp(vcpu));
>> > +	timer->armed = true;
>> > +	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
>> >  }
>> >  
>> >  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>> >  {
>> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>> > -	timer_disarm(timer);
>> > +
>> > +	soft_timer_cancel(&timer->timer, &timer->expired);
>> > +	timer->armed = false;
>> >  }
>> >  
>> >  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
>> > @@ -448,7 +447,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>> >  	 * This is to cancel the background timer for the physical timer
>> >  	 * emulation if it is set.
>> >  	 */
>> > -	timer_disarm(timer);
>> > +	soft_timer_cancel(&timer->timer, &timer->expired);
>> 
>> timer_disarm() used to set timer->armed to false, but that's not the
>> case any more. Don't we risk hitting the BUG_ON() in kvm_timer_schedule
>> if we hit WFI?
>> 
>
> We do, and I just didn't hit that because this goes away at the end of
> the series, and I didn't vigurously test every single patch in the
> series (just a compile test).
>
> We actually only use the armed flag for the BUG_ON(), and I don't think
> we need that check really.  So I suggest simply merging this logic into
> this patch:
>
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index f0053f884b4a..d0beae98f755 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -48,9 +48,6 @@ struct arch_timer_cpu {
>  	/* Work queued with the above timer expires */
>  	struct work_struct		expired;
>  
> -	/* Background timer active */
> -	bool				armed;
> -
>  	/* Is the timer enabled */
>  	bool			enabled;
>  };
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 871d8ae52f9b..98643bc696a9 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -56,11 +56,6 @@ u64 kvm_phys_timer_read(void)
>  	return timecounter->cc->read(timecounter->cc);
>  }
>  
> -static bool soft_timer_is_armed(struct arch_timer_cpu *timer)
> -{
> -	return timer->armed;
> -}
> -
>  static void soft_timer_start(struct hrtimer *hrt, u64 ns)
>  {
>  	hrtimer_start(hrt, ktime_add_ns(ktime_get(), ns),
> @@ -281,8 +276,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>  	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> -	BUG_ON(soft_timer_is_armed(timer));
> -
>  	/*
>  	 * No need to schedule a background timer if any guest timer has
>  	 * already expired, because kvm_vcpu_block will return before putting
> @@ -302,7 +295,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  	 * The guest timers have not yet expired, schedule a background timer.
>  	 * Set the earliest expiration time among the guest timers.
>  	 */
> -	timer->armed = true;
>  	soft_timer_start(&timer->timer, kvm_timer_earliest_exp(vcpu));
>  }
>  
> @@ -311,7 +303,6 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	soft_timer_cancel(&timer->timer, &timer->expired);
> -	timer->armed = false;
>  }
>  
>  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)

Yes, this seems like a sensible thing to do.

Thanks,

	M.
-- 
Jazz is not dead, it just smell funny.

* Re: [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate
  2017-10-18 16:03         ` Marc Zyngier
@ 2017-10-18 19:16           ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-18 19:16 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Wed, Oct 18, 2017 at 05:03:40PM +0100, Marc Zyngier wrote:
> On Wed, Oct 18 2017 at  3:41:45 pm BST, Christoffer Dall <cdall@linaro.org> wrote:
> > On Mon, Oct 09, 2017 at 05:47:18PM +0100, Marc Zyngier wrote:
> >> On 23/09/17 01:41, Christoffer Dall wrote:
> >> > Some systems without proper firmware and/or hardware description data
> >> > don't support the split EOI and deactivate operation.
> >> > 
> >> > On such systems, we cannot leave the physical interrupt active after the
> >> > timer handler on the host has run, so we cannot support KVM with an
> >> > in-kernel GIC with the timer changes we are about to introduce.
> >> > 
> >> > This patch makes sure that trying to initialize the KVM GIC code will
> >> > fail on such systems.
> >> > 
> >> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> >> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> >> > ---
> >> >  drivers/irqchip/irq-gic.c | 3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> >> > index f641e8e..ab12bf4 100644
> >> > --- a/drivers/irqchip/irq-gic.c
> >> > +++ b/drivers/irqchip/irq-gic.c
> >> > @@ -1420,7 +1420,8 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
> >> >  	if (ret)
> >> >  		return;
> >> >  
> >> > -	gic_set_kvm_info(&gic_v2_kvm_info);
> >> > +	if (static_key_true(&supports_deactivate))
> >> > +		gic_set_kvm_info(&gic_v2_kvm_info);
> >> >  }
> >> >  
> >> >  int __init
> >> > 
> >> 
> >> Should we add the same level of checking on the ACPI path, just for the
> >> sake of symmetry?
> >
> > Yes, we should, if anyone is crazy enough to use ACPI :)
> 
> Sadly, the madness is becoming commonplace.
> 
> >> 
> >> Also, do we need to add the same thing for GICv3?
> >> 
> >
> > Why would split EOI/deactivate not be available on GICv3, actually?  It
> > looks like this is not supported unless you have EL2, but I can't seem
> > to find anything in the spec for this, and KVM should support
> > EOI/deactivate for GICv3 guests I think.  Am I missing something?
> 
> No, you're not. This is just a Linux choice (or rather mine) not to use
> EOImode=1 in guests (or anything booted at EL1), as we don't really need
> the two-stage deactivate in that situation (it is pure overhead).
> 
> I'm just worried of potentially broken HW, and would like to make sure
> that when we force EOImode=0 on these systems, we truly tell KVM about
> it.
> 

Yes, makes sense, it's also more consistent that way.

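As an aside, here is a rough sketch (not from this series; the helper
name is made up) of what the split priority-drop/deactivate completion
looks like at a GICv2 CPU interface.  The offsets are the architectural
GICC_EOIR/GICC_DIR offsets; note that GICC_DIR lives in the second 4K
page of the CPU interface, which is exactly the extra page firmware has
to describe for any of this to work:

#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical helper: complete a host-handled IRQ with EOImode == 1. */
static void example_eoimode1_completion(void __iomem *gicc_base, u32 intid)
{
	/* Priority drop only; the interrupt stays active in the distributor. */
	writel_relaxed(intid, gicc_base + 0x10);	/* GICC_EOIR */

	/* ... normally the guest deactivates it via the HW bit in an LR ... */

	/* Explicit deactivation, only when software must finish it itself. */
	writel_relaxed(intid, gicc_base + 0x1000);	/* GICC_DIR */
}
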
> > Assuming I'm wrong about GICv3, which I probably am, how does this look
> > (on top of the posted patch):
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 519149e..aed524c 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -1228,7 +1228,9 @@ static int __init gic_of_init(struct device_node *node, struct device_node *pare
> >  		goto out_unmap_rdist;
> >  
> >  	gic_populate_ppi_partitions(node);
> > -	gic_of_setup_kvm_info(node);
> > +
> > +	if (static_key_true(&supports_deactivate))
> > +		gic_of_setup_kvm_info(node);
> >  	return 0;
> >  
> >  out_unmap_rdist:
> > @@ -1517,7 +1519,9 @@ gic_acpi_init(struct acpi_subtable_header *header, const unsigned long end)
> >  		goto out_fwhandle_free;
> >  
> >  	acpi_set_irq_model(ACPI_IRQ_MODEL_GIC, domain_handle);
> > -	gic_acpi_setup_kvm_info();
> > +
> > +	if (static_key_true(&supports_deactivate))
> > +		gic_acpi_setup_kvm_info();
> >  
> >  	return 0;
> >  
> > diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> > index ab12bf4..121af5c 100644
> > --- a/drivers/irqchip/irq-gic.c
> > +++ b/drivers/irqchip/irq-gic.c
> > @@ -1653,7 +1653,8 @@ static int __init gic_v2_acpi_init(struct acpi_subtable_header *header,
> >  	if (IS_ENABLED(CONFIG_ARM_GIC_V2M))
> >  		gicv2m_init(NULL, gic_data[0].domain);
> >  
> > -	gic_acpi_setup_kvm_info();
> > +	if (static_key_true(&supports_deactivate))
> > +		gic_acpi_setup_kvm_info();
> >  
> >  	return 0;
> >  }
> 
> Yup, looks good to me!
> 

Thanks,
-Christoffer

* Re: [PATCH v3 09/20] KVM: arm/arm64: Use separate timer for phys timer emulation
  2017-10-09 17:23     ` Marc Zyngier
@ 2017-10-19  7:38       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-19  7:38 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Mon, Oct 09, 2017 at 06:23:45PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > We were using the same hrtimer for emulating the physical timer and for
> > making sure a blocking VCPU thread would be eventually woken up.  That
> > worked fine in the previous arch timer design, but as we are about to
> > actually use the soft timer expire function for the physical timer
> > emulation, change the logic to use a dedicated hrtimer.
> > 
> > This has the added benefit of not having to cancel any work in the sync
> > path, which in turn allows us to run the flush and sync with IRQs
> > disabled.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  include/kvm/arm_arch_timer.h |  3 +++
> >  virt/kvm/arm/arch_timer.c    | 18 ++++++++++++++----
> >  2 files changed, 17 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index dcbb2e1..16887c0 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -47,6 +47,9 @@ struct arch_timer_cpu {
> >  	/* Work queued with the above timer expires */
> >  	struct work_struct		expired;
> >  
> > +	/* Physical timer emulation */
> > +	struct hrtimer			phys_timer;
> > +
> >  	/* Background timer active */
> >  	bool				armed;
> >  
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index c2e8326..7f87099 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -178,6 +178,12 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
> > +{
> > +	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> > +	return HRTIMER_NORESTART;
> > +}
> > +
> 
> So what prevents this handler from actually firing? Is it that we cancel
> the hrtimer while interrupts are still disabled, hence the timer never
> fires? If that's the intention, then this patch is slightly out of
> place, as we haven't moved the timer sync within the irq_disable() section.
> 
> Or am I missing something obvious?
> 

No, you're not missing anything; indeed, that is broken.  I think I had
in the back of my mind that we disable stuff in the world-switch still,
but that obviously doesn't apply to the soft timers.

I'll just move this patch to after the next one, where interrupts are
disabled.

Nice catch!

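To make the intended ordering explicit, here is a condensed sketch (the
function below is hypothetical, not the actual run loop) of why the WARN
in kvm_phys_timer_expire() becomes unreachable once flush/sync run inside
the irq-disabled window:

/* Hypothetical condensation of the vcpu run-loop ordering; not real code. */
static void example_run_loop_timer_ordering(struct kvm_vcpu *vcpu)
{
	local_irq_disable();

	kvm_timer_flush_hwstate(vcpu);	/* may arm the phys_timer hrtimer */

	/* ... world switch into the guest and back ... */

	/*
	 * The hrtimer was armed on this CPU and local interrupts are still
	 * masked, so its expiry handler cannot have run here yet; cancelling
	 * it in the sync path before interrupts are re-enabled means
	 * kvm_phys_timer_expire() never fires unexpectedly.
	 */
	kvm_timer_sync_hwstate(vcpu);

	local_irq_enable();
}
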
> >  bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
> >  {
> >  	u64 cval, now;
> > @@ -255,7 +261,7 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >  }
> >  
> >  /* Schedule the background timer for the emulated timer. */
> > -static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> > +static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> >  			      struct arch_timer_context *timer_ctx)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > @@ -267,7 +273,7 @@ static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> >  		return;
> >  
> >  	/*  The timer has not yet expired, schedule a background timer */
> > -	soft_timer_start(&timer->bg_timer, kvm_timer_compute_delta(timer_ctx));
> > +	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> >  }
> >  
> >  /*
> > @@ -424,7 +430,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	kvm_timer_update_state(vcpu);
> >  
> >  	/* Set the background timer for the physical timer emulation. */
> > -	kvm_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> > +	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> >  
> >  	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> >  		kvm_timer_flush_hwstate_user(vcpu);
> > @@ -447,7 +453,7 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  	 * This is to cancel the background timer for the physical timer
> >  	 * emulation if it is set.
> >  	 */
> > -	soft_timer_cancel(&timer->bg_timer, &timer->expired);
> > +	soft_timer_cancel(&timer->phys_timer, NULL);
> 
> Right, that now explains the "work" test in one of the previous patches.
> 

Yes, I've moved the addition of the test to this patch which actually
uses it.

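(For reference, the shape of soft_timer_cancel() with the "work" test is
roughly the following, simplified from the series:)

static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
{
	hrtimer_cancel(hrt);
	if (work)	/* the phys_timer passes NULL: no expiry work to flush */
		cancel_work_sync(work);
}
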
> >  
> >  	/*
> >  	 * The guest could have modified the timer registers or the timer
> > @@ -507,6 +513,9 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> >  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> >  	timer->bg_timer.function = kvm_bg_timer_expire;
> >  
> > +	hrtimer_init(&timer->phys_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > +	timer->phys_timer.function = kvm_phys_timer_expire;
> > +
> >  	vtimer->irq.irq = default_vtimer_irq.irq;
> >  	ptimer->irq.irq = default_ptimer_irq.irq;
> >  }
> > @@ -615,6 +624,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	soft_timer_cancel(&timer->bg_timer, &timer->expired);
> > +	soft_timer_cancel(&timer->phys_timer, NULL);
> >  	kvm_vgic_unmap_phys_irq(vcpu, vtimer->irq.irq);
> >  }
> >  
> > 
> 

Thanks,
-Christoffer

* Re: [PATCH v3 11/20] KVM: arm/arm64: Move timer save/restore out of the hyp code
  2017-10-09 17:47     ` Marc Zyngier
@ 2017-10-19  7:46       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-19  7:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Mon, Oct 09, 2017 at 06:47:42PM +0100, Marc Zyngier wrote:
> On 23/09/17 01:41, Christoffer Dall wrote:
> > As we are about to be lazy with saving and restoring the timer
> > registers, we prepare by moving all possible timer configuration logic
> > out of the hyp code.  All virtual timer registers can be programmed from
> > EL1 and since the arch timer is always a level triggered interrupt we
> > can safely do this with interrupts disabled in the host kernel on the
> > way to the guest without taking vtimer interrupts in the host kernel
> > (yet).
> > 
> > The downside is that the cntvoff register can only be programmed from
> > hyp mode, so we jump into hyp mode and back to program it.  This is also
> > safe, because the host kernel doesn't use the virtual timer in the KVM
> > code.  It may add a little performance penalty, but only
> > until following commits where we move this operation to vcpu load/put.
> > 
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  arch/arm/include/asm/kvm_asm.h   |  2 ++
> >  arch/arm/include/asm/kvm_hyp.h   |  4 +--
> >  arch/arm/kvm/hyp/switch.c        |  7 ++--
> >  arch/arm64/include/asm/kvm_asm.h |  2 ++
> >  arch/arm64/include/asm/kvm_hyp.h |  4 +--
> >  arch/arm64/kvm/hyp/switch.c      |  6 ++--
> >  virt/kvm/arm/arch_timer.c        | 40 ++++++++++++++++++++++
> >  virt/kvm/arm/hyp/timer-sr.c      | 74 +++++++++++++++++-----------------------
> >  8 files changed, 87 insertions(+), 52 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> > index 14d68a4..36dd296 100644
> > --- a/arch/arm/include/asm/kvm_asm.h
> > +++ b/arch/arm/include/asm/kvm_asm.h
> > @@ -68,6 +68,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> > +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> > +
> >  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >  
> >  extern void __init_stage2_translation(void);
> > diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
> > index 14b5903..ab20ffa 100644
> > --- a/arch/arm/include/asm/kvm_hyp.h
> > +++ b/arch/arm/include/asm/kvm_hyp.h
> > @@ -98,8 +98,8 @@
> >  #define cntvoff_el2			CNTVOFF
> >  #define cnthctl_el2			CNTHCTL
> >  
> > -void __timer_save_state(struct kvm_vcpu *vcpu);
> > -void __timer_restore_state(struct kvm_vcpu *vcpu);
> > +void __timer_enable_traps(struct kvm_vcpu *vcpu);
> > +void __timer_disable_traps(struct kvm_vcpu *vcpu);
> >  
> >  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
> > index ebd2dd4..330c9ce 100644
> > --- a/arch/arm/kvm/hyp/switch.c
> > +++ b/arch/arm/kvm/hyp/switch.c
> > @@ -174,7 +174,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  	__activate_vm(vcpu);
> >  
> >  	__vgic_restore_state(vcpu);
> > -	__timer_restore_state(vcpu);
> > +	__timer_enable_traps(vcpu);
> >  
> >  	__sysreg_restore_state(guest_ctxt);
> >  	__banked_restore_state(guest_ctxt);
> > @@ -191,7 +191,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  
> >  	__banked_save_state(guest_ctxt);
> >  	__sysreg_save_state(guest_ctxt);
> > -	__timer_save_state(vcpu);
> > +	__timer_disable_traps(vcpu);
> > +
> >  	__vgic_save_state(vcpu);
> >  
> >  	__deactivate_traps(vcpu);
> > @@ -237,7 +238,7 @@ void __hyp_text __noreturn __hyp_panic(int cause)
> >  
> >  		vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
> >  		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -		__timer_save_state(vcpu);
> > +		__timer_disable_traps(vcpu);
> >  		__deactivate_traps(vcpu);
> >  		__deactivate_vm(vcpu);
> >  		__banked_restore_state(host_ctxt);
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index 26a64d0..ab4d0a9 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -55,6 +55,8 @@ extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> >  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> >  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> >  
> > +extern void __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high);
> > +
> >  extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
> >  
> >  extern u64 __vgic_v3_get_ich_vtr_el2(void);
> > diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> > index 4572a9b..08d3bb6 100644
> > --- a/arch/arm64/include/asm/kvm_hyp.h
> > +++ b/arch/arm64/include/asm/kvm_hyp.h
> > @@ -129,8 +129,8 @@ void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> >  int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
> >  
> > -void __timer_save_state(struct kvm_vcpu *vcpu);
> > -void __timer_restore_state(struct kvm_vcpu *vcpu);
> > +void __timer_enable_traps(struct kvm_vcpu *vcpu);
> > +void __timer_disable_traps(struct kvm_vcpu *vcpu);
> >  
> >  void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
> >  void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 945e79c..4994f4b 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c
> > @@ -298,7 +298,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  	__activate_vm(vcpu);
> >  
> >  	__vgic_restore_state(vcpu);
> > -	__timer_restore_state(vcpu);
> > +	__timer_enable_traps(vcpu);
> >  
> >  	/*
> >  	 * We must restore the 32-bit state before the sysregs, thanks
> > @@ -368,7 +368,7 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
> >  
> >  	__sysreg_save_guest_state(guest_ctxt);
> >  	__sysreg32_save_state(vcpu);
> > -	__timer_save_state(vcpu);
> > +	__timer_disable_traps(vcpu);
> >  	__vgic_save_state(vcpu);
> >  
> >  	__deactivate_traps(vcpu);
> > @@ -436,7 +436,7 @@ void __hyp_text __noreturn __hyp_panic(void)
> >  
> >  		vcpu = (struct kvm_vcpu *)read_sysreg(tpidr_el2);
> >  		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> > -		__timer_save_state(vcpu);
> > +		__timer_disable_traps(vcpu);
> >  		__deactivate_traps(vcpu);
> >  		__deactivate_vm(vcpu);
> >  		__sysreg_restore_host_state(host_ctxt);
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 7f87099..4254f88 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -276,6 +276,20 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> >  	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> >  }
> >  
> > +static void timer_save_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > +	if (timer->enabled) {
> > +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > +		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> > +	}
> > +
> > +	/* Disable the virtual timer */
> > +	write_sysreg_el0(0, cntv_ctl);
> > +}
> > +
> >  /*
> >   * Schedule the background timer before calling kvm_vcpu_block, so that this
> >   * thread is removed from its waitqueue and made runnable when there's a timer
> > @@ -312,6 +326,18 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
> >  }
> >  
> > +static void timer_restore_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > +	if (timer->enabled) {
> > +		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> > +		isb();
> > +		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> > +	}
> > +}
> > +
> >  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > @@ -320,6 +346,13 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> >  	timer->armed = false;
> >  }
> >  
> > +static void set_cntvoff(u64 cntvoff)
> > +{
> > +	u32 low = cntvoff & GENMASK(31, 0);
> > +	u32 high = (cntvoff >> 32) & GENMASK(31, 0);
> 
> upper_32_bits/lower_32_bits?
> 
> > +	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
> 
> Maybe a comment as to why we need to split the 64bit value in two 32bit
> words (32bit ARM PCS is getting in the way).
> 
> > +}
> > +
> >  static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > @@ -423,6 +456,7 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> >  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	if (unlikely(!timer->enabled))
> >  		return;
> > @@ -436,6 +470,9 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  		kvm_timer_flush_hwstate_user(vcpu);
> >  	else
> >  		kvm_timer_flush_hwstate_vgic(vcpu);
> > +
> > +	set_cntvoff(vtimer->cntvoff);
> > +	timer_restore_state(vcpu);
> >  }
> >  
> >  /**
> > @@ -455,6 +492,9 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  	 */
> >  	soft_timer_cancel(&timer->phys_timer, NULL);
> >  
> > +	timer_save_state(vcpu);
> > +	set_cntvoff(0);
> > +
> >  	/*
> >  	 * The guest could have modified the timer registers or the timer
> >  	 * could have expired, update the timer state.
> > diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> > index 4734915..a6c3b10 100644
> > --- a/virt/kvm/arm/hyp/timer-sr.c
> > +++ b/virt/kvm/arm/hyp/timer-sr.c
> > @@ -21,58 +21,48 @@
> >  
> >  #include <asm/kvm_hyp.h>
> >  
> > -/* vcpu is already in the HYP VA space */
> > -void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
> > +void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> > +{
> > +	u64 cntvoff = (u64)cntvoff_high << 32 | cntvoff_low;
> > +	write_sysreg(cntvoff, cntvoff_el2);
> > +}
> > +
> > +void __hyp_text enable_phys_timer(void)
> >  {
> > -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  	u64 val;
> >  
> > -	if (timer->enabled) {
> > -		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > -		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> > -	}
> > +	/* Allow physical timer/counter access for the host */
> > +	val = read_sysreg(cnthctl_el2);
> > +	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> > +	write_sysreg(val, cnthctl_el2);
> > +}
> >  
> > -	/* Disable the virtual timer */
> > -	write_sysreg_el0(0, cntv_ctl);
> > +void __hyp_text disable_phys_timer(void)
> > +{
> > +	u64 val;
> >  
> >  	/*
> > +	 * Disallow physical timer access for the guest
> > +	 * Physical counter access is allowed
> > +	 */
> > +	val = read_sysreg(cnthctl_el2);
> > +	val &= ~CNTHCTL_EL1PCEN;
> > +	val |= CNTHCTL_EL1PCTEN;
> > +	write_sysreg(val, cnthctl_el2);
> > +}
> > +
> > +void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
> > +{
> > +	/*
> >  	 * We don't need to do this for VHE since the host kernel runs in EL2
> >  	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
> >  	 */
> > -	if (!has_vhe()) {
> > -		/* Allow physical timer/counter access for the host */
> > -		val = read_sysreg(cnthctl_el2);
> > -		val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
> > -		write_sysreg(val, cnthctl_el2);
> > -	}
> > -
> > -	/* Clear cntvoff for the host */
> > -	write_sysreg(0, cntvoff_el2);
> > +	if (!has_vhe())
> > +		enable_phys_timer();
> >  }
> >  
> > -void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
> > +void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
> >  {
> > -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > -	u64 val;
> > -
> > -	/* Those bits are already configured at boot on VHE-system */
> > -	if (!has_vhe()) {
> > -		/*
> > -		 * Disallow physical timer access for the guest
> > -		 * Physical counter access is allowed
> > -		 */
> > -		val = read_sysreg(cnthctl_el2);
> > -		val &= ~CNTHCTL_EL1PCEN;
> > -		val |= CNTHCTL_EL1PCTEN;
> > -		write_sysreg(val, cnthctl_el2);
> > -	}
> > -
> > -	if (timer->enabled) {
> > -		write_sysreg(vtimer->cntvoff, cntvoff_el2);
> > -		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> > -		isb();
> > -		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> > -	}
> > +	if (!has_vhe())
> > +		disable_phys_timer();
> >  }
> > 
> 
> Otherwise:
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> 

Thanks, I changed the patch in the following way:


diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 6c8baf84b4f0..93c8973a71f4 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -339,8 +339,16 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
 
 static void set_cntvoff(u64 cntvoff)
 {
-	u32 low = cntvoff & GENMASK(31, 0);
-	u32 high = (cntvoff >> 32) & GENMASK(31, 0);
+	u32 low = lower_32_bits(cntvoff);
+	u32 high = upper_32_bits(cntvoff);
+
+	/*
+	 * Since kvm_call_hyp doesn't fully support the ARM PCS especially on
+	 * 32-bit systems, but rather passes register by register shifted one
+	 * place (we put the function address in r0/x0), we cannot simply pass
+	 * a 64-bit value as an argument, but have to split the value in two
+	 * 32-bit halves.
+	 */
 	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
 }
 

 -Christoffer

* Re: [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit
  2017-10-10  8:47     ` Marc Zyngier
@ 2017-10-19  8:15       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-19  8:15 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Tue, Oct 10, 2017 at 09:47:33AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:01 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> > We don't need to save and restore the hardware timer state and examine
> > if it generates interrupts on on every entry/exit to the guest.  The
> > timer hardware is perfectly capable of telling us when it has expired
> > by signaling interrupts.
> >
> > When taking a vtimer interrupt in the host, we don't want to mess with
> > the timer configuration, we just want to forward the physical interrupt
> > to the guest as a virtual interrupt.  We can use the split priority drop
> > and deactivate feature of the GIC to do this, which leaves an EOI'ed
> > interrupt active on the physical distributor, making sure we don't keep
> > taking timer interrupts which would prevent the guest from running.  We
> > can then forward the physical interrupt to the VM using the HW bit in
> > the LR of the GIC VE, like we do already, which lets the guest directly
> 
> VE?
> 

Virtualization Extensions.  I can use GIC hardware virtualization
support or VGIC instead.

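As an illustration of the mechanism (not code from this series, and the
helper name is made up), a hardware-mapped GICv2 list register can be
built from the existing GICH_LR_* macros so that the guest's deactivation
also reaches the physical interrupt:

#include <linux/irqchip/arm-gic.h>
#include <linux/types.h>

/* Hypothetical helper: encode a HW-mapped interrupt into a GICv2 LR value. */
static u32 example_hw_mapped_lr(u32 virt_intid, u32 phys_intid)
{
	return (virt_intid & GICH_LR_VIRTUALID) |
	       (phys_intid << GICH_LR_PHYSID_CPUID_SHIFT) |
	       GICH_LR_PENDING_BIT |
	       GICH_LR_HW;	/* guest deactivation propagates to the distributor */
}
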
> > deactivate both the physical and virtual timer simultaneously, allowing
> > the timer hardware to exit the VM and generate a new physical interrupt
> > when the timer output is again asserted later on.
> >
> > We do need to capture this state when migrating VCPUs between physical
> > CPUs, however, which we use the vcpu put/load functions for, which are
> > called through preempt notifiers whenever the thread is scheduled away
> > from the CPU or called directly if we return from the ioctl to
> > userspace.
> >
> > One caveat is that we cannot restore the timer state during
> > kvm_timer_vcpu_load, because the flow of sleeping a VCPU is:
> >
> >   1. kvm_vcpu_block
> >   2. kvm_timer_schedule
> >   3. schedule
> >   4. kvm_timer_vcpu_put (preempt notifier)
> >   5. schedule (vcpu thread gets scheduled back)
> >   6. kvm_timer_vcpu_load
> >         <---- We restore the hardware state here, but the bg_timer
> > 	      hrtimer may have scheduled a work function that also
> > 	      changes the timer state here.
> >   7. kvm_timer_unschedule
> >         <---- We can restore the state here instead
> >
> > So, while we do need to restore the timer state in step (6) in all other
> > cases than when we called kvm_vcpu_block(), we have to defer the restore
> > to step (7) when coming back after kvm_vcpu_block().  Note that we
> > cannot simply call cancel_work_sync() in step (6), because vcpu_load can
> > be called from a preempt notifier.
> >
> > An added benefit beyond not having to read and write the timer sysregs
> > on every entry and exit is that we no longer have to actively write the
> > active state to the physical distributor, because we set the affinity of
> 
> I don't understand this thing about the affinity of the timer. It is a
> PPI, so it cannot go anywhere else.
> 

Ah, silly wording perhaps.  I mean that we call irq_set_vcpu_affinity()
so that the interrupt doesn't get deactivated by the GIC driver.  I can
try to reword.

How about:

  An added benefit beyond not having to read and write the timer sysregs
  on every entry and exit is that we no longer have to actively write the
  active state to the physical distributor, because we configured the
  irq for the vtimer to only get a priority drop when handling the
  interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and
  the interrupt stays active after firing on the host.

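(A rough sketch of what that wiring looks like; the function below is
hypothetical, but irq_set_vcpu_affinity() is the existing API declared in
<linux/interrupt.h>:)

#include <linux/interrupt.h>

/* Hypothetical helper: mark the host vtimer PPI as forwarded to a vcpu. */
static int example_forward_vtimer_to_vcpu(unsigned int host_vtimer_irq,
					  struct kvm_vcpu *vcpu)
{
	/*
	 * With a vcpu affinity set, the GIC driver only performs the
	 * priority drop in the host handler and skips the deactivation,
	 * so the interrupt stays active and does not re-fire until the
	 * guest deactivates it through the HW bit in a list register.
	 */
	return irq_set_vcpu_affinity(host_vtimer_irq, vcpu);
}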

> > the vtimer interrupt when loading the timer state, so that the interrupt
> > automatically stays active after firing.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  include/kvm/arm_arch_timer.h |   9 +-
> >  virt/kvm/arm/arch_timer.c    | 238 +++++++++++++++++++++++++++----------------
> >  virt/kvm/arm/arm.c           |  19 +++-
> >  virt/kvm/arm/hyp/timer-sr.c  |   8 +-
> >  4 files changed, 174 insertions(+), 100 deletions(-)
> >
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 16887c0..8e5ed54 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -31,8 +31,8 @@ struct arch_timer_context {
> >  	/* Timer IRQ */
> >  	struct kvm_irq_level		irq;
> >  
> > -	/* Active IRQ state caching */
> > -	bool				active_cleared_last;
> > +	/* Is the timer state loaded on the hardware timer */
> > +	bool			loaded;
> 
> I think this little guy is pretty crucial to understand the flow, as
> there is now two points where we save/restore the timer:
> vcpu_load/vcpu_put and timer_schedule/timer_unschedule. Both can be
> executed on the blocking path, and this is the predicate to find out if
> there is actually something to do.
> 
> Would you mind adding a small comment to that effect?
> 

I don't mind at all, will add a comment.  How about:

	/*
	 * We have multiple paths which can save/restore the timer state
	 * onto the hardware, so we need some way of keeping track of
	 * where the latest state is.
	 *
	 * loaded == true:  State is loaded on the hardware registers.
	 * loaded == false: State is stored in memory.
	 */

> >  
> >  	/* Virtual offset */
> >  	u64			cntvoff;
> > @@ -80,10 +80,15 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >  
> >  u64 kvm_phys_timer_read(void);
> >  
> > +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu);
> >  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
> >  
> >  void kvm_timer_init_vhe(void);
> >  
> >  #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.vtimer)
> >  #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.ptimer)
> > +
> > +void enable_el1_phys_timer_access(void);
> > +void disable_el1_phys_timer_access(void);
> > +
> >  #endif
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 4275f8f..70110ea 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -46,10 +46,9 @@ static const struct kvm_irq_level default_vtimer_irq = {
> >  	.level	= 1,
> >  };
> >  
> > -void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> > -{
> > -	vcpu_vtimer(vcpu)->active_cleared_last = false;
> > -}
> > +static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
> > +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> > +				 struct arch_timer_context *timer_ctx);
> >  
> >  u64 kvm_phys_timer_read(void)
> >  {
> > @@ -74,17 +73,37 @@ static void soft_timer_cancel(struct hrtimer *hrt, struct work_struct *work)
> >  		cancel_work_sync(work);
> >  }
> >  
> > -static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > +static void kvm_vtimer_update_mask_user(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	/*
> > -	 * We disable the timer in the world switch and let it be
> > -	 * handled by kvm_timer_sync_hwstate(). Getting a timer
> > -	 * interrupt at this point is a sure sign of some major
> > -	 * breakage.
> > +	 * To prevent continuously exiting from the guest, we mask the
> > +	 * physical interrupt when the virtual level is high, such that the
> > +	 * guest can make forward progress.  Once we detect the output level
> > +	 * being deasserted, we unmask the interrupt again so that we exit
> > +	 * from the guest when the timer fires.
> 
> Maybe an additional comment indicating that this only makes sense when
> we don't have an in-kernel GIC? I know this wasn't in the original code,
> but I started asking myself all kind of questions until I realised what
> this was for...
> 

Yes, I'll clarify.  How about:

	/*
	 * When using a userspace irqchip with the architected timers,
	 * we disable...
	 [...]
	 * ...we mask the physical interrupt by disabling it on the host
	 * interrupt controller when the...

> >  	 */
> > -	pr_warn("Unexpected interrupt %d on vcpu %p\n", irq, vcpu);
> > +	if (vtimer->irq.level)
> > +		disable_percpu_irq(host_vtimer_irq);
> > +	else
> > +		enable_percpu_irq(host_vtimer_irq, 0);
> > +}
> > +
> > +static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> > +{
> > +	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > +	if (!vtimer->irq.level) {
> > +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > +		if (kvm_timer_irq_can_fire(vtimer))
> > +			kvm_timer_update_irq(vcpu, true, vtimer);
> > +	}
> > +
> > +	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > +		kvm_vtimer_update_mask_user(vcpu);
> > +
> >  	return IRQ_HANDLED;
> >  }
> >  
> > @@ -220,7 +239,6 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> >  {
> >  	int ret;
> >  
> > -	timer_ctx->active_cleared_last = false;
> >  	timer_ctx->irq.level = new_level;
> >  	trace_kvm_timer_update_irq(vcpu->vcpu_id, timer_ctx->irq.irq,
> >  				   timer_ctx->irq.level);
> > @@ -276,10 +294,16 @@ static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> >  	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> >  }
> >  
> > -static void timer_save_state(struct kvm_vcpu *vcpu)
> > +static void vtimer_save_state(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> 
> Is that to avoid racing against the timer when doing a
> vcpu_put/timer/schedule?
> 

Depends on where it's called from.  When called from kvm_timer_schedule,
this is because we need to know the state of the timer, so we know when
to schedule the timer in the future, which is only done in
kvm_timer_schedule (not kvm_timer_vcpu_put).  When called from
kvm_timer_vcpu_put, it is to save the state so that we can preserve it.

> > +
> > +	if (!vtimer->loaded)
> > +		goto out;
> >  
> >  	if (timer->enabled) {
> >  		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > @@ -288,6 +312,10 @@ static void timer_save_state(struct kvm_vcpu *vcpu)
> >  
> >  	/* Disable the virtual timer */
> >  	write_sysreg_el0(0, cntv_ctl);
> > +
> > +	vtimer->loaded = false;
> > +out:
> > +	local_irq_restore(flags);
> >  }
> >  
> >  /*
> > @@ -303,6 +331,8 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  
> >  	BUG_ON(bg_timer_is_armed(timer));
> >  
> > +	vtimer_save_state(vcpu);
> > +
> >  	/*
> >  	 * No need to schedule a background timer if any guest timer has
> >  	 * already expired, because kvm_vcpu_block will return before putting
> > @@ -326,16 +356,26 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  	soft_timer_start(&timer->bg_timer, kvm_timer_earliest_exp(vcpu));
> >  }
> >  
> > -static void timer_restore_state(struct kvm_vcpu *vcpu)
> > +static void vtimer_restore_state(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +
> > +	if (vtimer->loaded)
> > +		goto out;
> >  
> >  	if (timer->enabled) {
> >  		write_sysreg_el0(vtimer->cnt_cval, cntv_cval);
> >  		isb();
> >  		write_sysreg_el0(vtimer->cnt_ctl, cntv_ctl);
> >  	}
> > +
> > +	vtimer->loaded = true;
> > +out:
> > +	local_irq_restore(flags);
> >  }
> >  
> >  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> > @@ -344,6 +384,8 @@ void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> >  
> >  	soft_timer_cancel(&timer->bg_timer, &timer->expired);
> >  	timer->armed = false;
> > +
> > +	vtimer_restore_state(vcpu);
> >  }
> >  
> >  static void set_cntvoff(u64 cntvoff)
> > @@ -353,61 +395,56 @@ static void set_cntvoff(u64 cntvoff)
> >  	kvm_call_hyp(__kvm_timer_set_cntvoff, low, high);
> >  }
> >  
> > -static void kvm_timer_flush_hwstate_vgic(struct kvm_vcpu *vcpu)
> > +static void kvm_timer_vcpu_load_vgic(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  	bool phys_active;
> >  	int ret;
> >  
> > -	/*
> > -	* If we enter the guest with the virtual input level to the VGIC
> > -	* asserted, then we have already told the VGIC what we need to, and
> > -	* we don't need to exit from the guest until the guest deactivates
> > -	* the already injected interrupt, so therefore we should set the
> > -	* hardware active state to prevent unnecessary exits from the guest.
> > -	*
> > -	* Also, if we enter the guest with the virtual timer interrupt active,
> > -	* then it must be active on the physical distributor, because we set
> > -	* the HW bit and the guest must be able to deactivate the virtual and
> > -	* physical interrupt at the same time.
> > -	*
> > -	* Conversely, if the virtual input level is deasserted and the virtual
> > -	* interrupt is not active, then always clear the hardware active state
> > -	* to ensure that hardware interrupts from the timer triggers a guest
> > -	* exit.
> > -	*/
> > -	phys_active = vtimer->irq.level ||
> > -			kvm_vgic_map_is_active(vcpu, vtimer->irq.irq);
> > -
> > -	/*
> > -	 * We want to avoid hitting the (re)distributor as much as
> > -	 * possible, as this is a potentially expensive MMIO access
> > -	 * (not to mention locks in the irq layer), and a solution for
> > -	 * this is to cache the "active" state in memory.
> > -	 *
> > -	 * Things to consider: we cannot cache an "active set" state,
> > -	 * because the HW can change this behind our back (it becomes
> > -	 * "clear" in the HW). We must then restrict the caching to
> > -	 * the "clear" state.
> > -	 *
> > -	 * The cache is invalidated on:
> > -	 * - vcpu put, indicating that the HW cannot be trusted to be
> > -	 *   in a sane state on the next vcpu load,
> > -	 * - any change in the interrupt state
> > -	 *
> > -	 * Usage conditions:
> > -	 * - cached value is "active clear"
> > -	 * - value to be programmed is "active clear"
> > -	 */
> > -	if (vtimer->active_cleared_last && !phys_active)
> > -		return;
> > -
> > +	if (vtimer->irq.level || kvm_vgic_map_is_active(vcpu, vtimer->irq.irq))
> > +		phys_active = true;
> > +	else
> > +		phys_active = false;
> 
> nit: this can be written as:
> 
>      phys_active = (vtimer->irq.level ||
>      		    kvm_vgic_map_is_active(vcpu, vtimer->irq.irq));
> 
> Not that it matters in the slightest...
> 

I don't mind changing it.

> >  	ret = irq_set_irqchip_state(host_vtimer_irq,
> >  				    IRQCHIP_STATE_ACTIVE,
> >  				    phys_active);
> >  	WARN_ON(ret);
> > +}
> > +
> > +static void kvm_timer_vcpu_load_user(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_vtimer_update_mask_user(vcpu);
> > +}
> > +
> > +void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > +	if (unlikely(!timer->enabled))
> > +		return;
> > +
> > +	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > +		kvm_timer_vcpu_load_user(vcpu);
> > +	else
> > +		kvm_timer_vcpu_load_vgic(vcpu);
> >  
> > -	vtimer->active_cleared_last = !phys_active;
> > +	set_cntvoff(vtimer->cntvoff);
> > +
> > +	/*
> > +	 * If we armed a soft timer and potentially queued work, we have to
> > +	 * cancel this, but cannot do it here, because canceling work can
> > +	 * sleep and we can be in the middle of a preempt notifier call.
> > +	 * Instead, when the timer has been armed, we know the return path
> > +	 * from kvm_vcpu_block will call kvm_timer_unschedule, so we can defer
> > +	 * restoring the state and canceling any soft timers and work items
> > +	 * until then.
> > +	 */
> > +	if (!bg_timer_is_armed(timer))
> > +		vtimer_restore_state(vcpu);
> > +
> > +	if (has_vhe())
> > +		disable_el1_phys_timer_access();
> >  }
> >  
> >  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> > @@ -427,23 +464,6 @@ bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> >  	       ptimer->irq.level != plevel;
> >  }
> >  
> > -static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> > -{
> > -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > -
> > -	/*
> > -	 * To prevent continuously exiting from the guest, we mask the
> > -	 * physical interrupt such that the guest can make forward progress.
> > -	 * Once we detect the output level being deasserted, we unmask the
> > -	 * interrupt again so that we exit from the guest when the timer
> > -	 * fires.
> > -	*/
> > -	if (vtimer->irq.level)
> > -		disable_percpu_irq(host_vtimer_irq);
> > -	else
> > -		enable_percpu_irq(host_vtimer_irq, 0);
> > -}
> > -
> >  /**
> >   * kvm_timer_flush_hwstate - prepare timers before running the vcpu
> >   * @vcpu: The vcpu pointer
> > @@ -456,23 +476,55 @@ static void kvm_timer_flush_hwstate_user(struct kvm_vcpu *vcpu)
> >  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> >  	if (unlikely(!timer->enabled))
> >  		return;
> >  
> > -	kvm_timer_update_state(vcpu);
> > +	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> > +		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> >  
> >  	/* Set the background timer for the physical timer emulation. */
> >  	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> > +}
> >  
> > -	if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
> > -		kvm_timer_flush_hwstate_user(vcpu);
> > -	else
> > -		kvm_timer_flush_hwstate_vgic(vcpu);
> > +void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  
> > -	set_cntvoff(vtimer->cntvoff);
> > -	timer_restore_state(vcpu);
> > +	if (unlikely(!timer->enabled))
> > +		return;
> > +
> > +	if (has_vhe())
> > +		enable_el1_phys_timer_access();
> > +
> > +	vtimer_save_state(vcpu);
> > +
> > +	set_cntvoff(0);
> 
> Can this be moved into vtimer_save_state()?

It can, I just kept it out of there, because it's technically not saving
any state, but managing some other piece of host hardware, which only
needs to get reset when doing kvm_timer_vcpu_put, ...

> And thinking of it, why
> don't we reset cntvoff in kvm_timer_schedule() as well? 
> 

... because kvm_timer_vcpu_put will get called any time we're going to
run userspace or some other kernel thread, even after we've gone
through kvm_timer_schedule, and that's what made the most semantic
sense to me;
here we need to make sure that userspace which accesses the virtual
counter sees a zero offset to the physical counter.

I can put a comment in kvm_timer_vcpu_put, or move it into
vtimer_save_state, with a slight preference to the first option.  What
do you think?
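
For illustration, the comment in kvm_timer_vcpu_put could read something
like this (just a sketch, not final wording):

	vtimer_save_state(vcpu);

	/*
	 * After this point the kernel may run userspace or another
	 * thread, and userspace can read the virtual counter directly,
	 * so make sure it sees a zero offset to the physical counter
	 * until the next vcpu load.
	 */
	set_cntvoff(0);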

> > +}
> > +
> > +static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +
> > +	if (unlikely(!irqchip_in_kernel(vcpu->kvm))) {
> > +		kvm_vtimer_update_mask_user(vcpu);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * If the guest disabled the timer without acking the interrupt, then
> > +	 * we must make sure the physical and virtual active states are in
> > +	 * sync by deactivating the physical interrupt, because otherwise we
> > +	 * wouldn't see the next timer interrupt in the host.
> > +	 */
> > +	if (!kvm_vgic_map_is_active(vcpu, vtimer->irq.irq)) {
> > +		int ret;
> > +		ret = irq_set_irqchip_state(host_vtimer_irq,
> > +					    IRQCHIP_STATE_ACTIVE,
> > +					    false);
> > +		WARN_ON(ret);
> > +	}
> >  }
> >  
> >  /**
> > @@ -485,6 +537,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	/*
> >  	 * This is to cancel the background timer for the physical timer
> > @@ -492,14 +545,19 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  	 */
> >  	soft_timer_cancel(&timer->phys_timer, NULL);
> >  
> > -	timer_save_state(vcpu);
> > -	set_cntvoff(0);
> > -
> >  	/*
> > -	 * The guest could have modified the timer registers or the timer
> > -	 * could have expired, update the timer state.
> > +	 * If we entered the guest with the vtimer output asserted we have to
> > +	 * check if the guest has modified the timer so that we should lower
> > +	 * the line at this point.
> >  	 */
> > -	kvm_timer_update_state(vcpu);
> > +	if (vtimer->irq.level) {
> > +		vtimer->cnt_ctl = read_sysreg_el0(cntv_ctl);
> > +		vtimer->cnt_cval = read_sysreg_el0(cntv_cval);
> > +		if (!kvm_timer_should_fire(vtimer)) {
> > +			kvm_timer_update_irq(vcpu, false, vtimer);
> > +			unmask_vtimer_irq(vcpu);
> > +		}
> > +	}
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 27db222..132d39a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -354,18 +354,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >  	vcpu->arch.host_cpu_context = this_cpu_ptr(kvm_host_cpu_state);
> >  
> >  	kvm_arm_set_running_vcpu(vcpu);
> > -
> >  	kvm_vgic_load(vcpu);
> > +	kvm_timer_vcpu_load(vcpu);
> >  }
> >  
> >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_timer_vcpu_put(vcpu);
> >  	kvm_vgic_put(vcpu);
> >  
> >  	vcpu->cpu = -1;
> >  
> >  	kvm_arm_set_running_vcpu(NULL);
> > -	kvm_timer_vcpu_put(vcpu);
> >  }
> >  
> >  static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > @@ -710,16 +710,27 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_arm_clear_debug(vcpu);
> >  
> >  		/*
> > -		 * We must sync the PMU and timer state before the vgic state so
> > +		 * We must sync the PMU state before the vgic state so
> >  		 * that the vgic can properly sample the updated state of the
> >  		 * interrupt line.
> >  		 */
> >  		kvm_pmu_sync_hwstate(vcpu);
> > -		kvm_timer_sync_hwstate(vcpu);
> >  
> > +		/*
> > +		 * Sync the vgic state before syncing the timer state because
> > +		 * the timer code needs to know if the virtual timer
> > +		 * interrupts are active.
> > +		 */
> >  		kvm_vgic_sync_hwstate(vcpu);
> >  
> >  		/*
> > +		 * Sync the timer hardware state before enabling interrupts as
> > +		 * we don't want vtimer interrupts to race with syncing the
> > +		 * timer virtual interrupt state.
> > +		 */
> > +		kvm_timer_sync_hwstate(vcpu);
> > +
> > +		/*
> >  		 * We may have taken a host interrupt in HYP mode (ie
> >  		 * while executing the guest). This interrupt is still
> >  		 * pending, as we haven't serviced it yet!
> > diff --git a/virt/kvm/arm/hyp/timer-sr.c b/virt/kvm/arm/hyp/timer-sr.c
> > index a6c3b10..f398616 100644
> > --- a/virt/kvm/arm/hyp/timer-sr.c
> > +++ b/virt/kvm/arm/hyp/timer-sr.c
> > @@ -27,7 +27,7 @@ void __hyp_text __kvm_timer_set_cntvoff(u32 cntvoff_low, u32 cntvoff_high)
> >  	write_sysreg(cntvoff, cntvoff_el2);
> >  }
> >  
> > -void __hyp_text enable_phys_timer(void)
> > +void __hyp_text enable_el1_phys_timer_access(void)
> >  {
> >  	u64 val;
> >  
> > @@ -37,7 +37,7 @@ void __hyp_text enable_phys_timer(void)
> >  	write_sysreg(val, cnthctl_el2);
> >  }
> >  
> > -void __hyp_text disable_phys_timer(void)
> > +void __hyp_text disable_el1_phys_timer_access(void)
> >  {
> >  	u64 val;
> >  
> > @@ -58,11 +58,11 @@ void __hyp_text __timer_disable_traps(struct kvm_vcpu *vcpu)
> >  	 * with HCR_EL2.TGE ==1, which makes those bits have no impact.
> >  	 */
> >  	if (!has_vhe())
> > -		enable_phys_timer();
> > +		enable_el1_phys_timer_access();
> >  }
> >  
> >  void __hyp_text __timer_enable_traps(struct kvm_vcpu *vcpu)
> >  {
> >  	if (!has_vhe())
> > -		disable_phys_timer();
> > +		disable_el1_phys_timer_access();
> >  }
> 
> It'd be nice to move this renaming to the patch that introduce these two
> functions.

Ah, yes, absolutely.  Patch splitting madness.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 15/20] KVM: arm/arm64: Support EL1 phys timer register access in set/get reg
  2017-10-10  9:10     ` Marc Zyngier
@ 2017-10-19  8:32       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-19  8:32 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Catalin Marinas, Will Deacon, kvmarm, linux-arm-kernel, kvm

On Tue, Oct 10, 2017 at 10:10:27AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:02 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> > Add support for the physical timer registers in kvm_arm_timer_set_reg and
> > kvm_arm_timer_get_reg so that these functions can be reused to interact
> > with the rest of the system.
> >
> > Note that this paves part of the way for the physical timer state
> > save/restore, but we still need to add those registers to
> > KVM_GET_REG_LIST before we support migrating the physical timer state.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  arch/arm/include/uapi/asm/kvm.h   |  6 ++++++
> >  arch/arm64/include/uapi/asm/kvm.h |  6 ++++++
> >  virt/kvm/arm/arch_timer.c         | 33 +++++++++++++++++++++++++++++++--
> >  3 files changed, 43 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
> > index 5db2d4c..665c454 100644
> > --- a/arch/arm/include/uapi/asm/kvm.h
> > +++ b/arch/arm/include/uapi/asm/kvm.h
> > @@ -151,6 +151,12 @@ struct kvm_arch_memory_slot {
> >  	(__ARM_CP15_REG(op1, 0, crm, 0) | KVM_REG_SIZE_U64)
> >  #define ARM_CP15_REG64(...) __ARM_CP15_REG64(__VA_ARGS__)
> >  
> > +/* PL1 Physical Timer Registers */
> > +#define KVM_REG_ARM_PTIMER_CTL		ARM_CP15_REG32(0, 14, 2, 1)
> > +#define KVM_REG_ARM_PTIMER_CNT		ARM_CP15_REG64(0, 14)
> > +#define KVM_REG_ARM_PTIMER_CVAL		ARM_CP15_REG64(2, 14)
> > +
> > +/* Virtual Timer Registers */
> >  #define KVM_REG_ARM_TIMER_CTL		ARM_CP15_REG32(0, 14, 3, 1)
> >  #define KVM_REG_ARM_TIMER_CNT		ARM_CP15_REG64(1, 14)
> >  #define KVM_REG_ARM_TIMER_CVAL		ARM_CP15_REG64(3, 14)
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 9f3ca24..07be6e2 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -195,6 +195,12 @@ struct kvm_arch_memory_slot {
> >  
> >  #define ARM64_SYS_REG(...) (__ARM64_SYS_REG(__VA_ARGS__) | KVM_REG_SIZE_U64)
> >  
> > +/* EL1 Physical Timer Registers */
> 
> These are EL0 registers, even if we tend to restrict them to EL1. Even
> the 32bit version is not strictly a PL1 register, since PL1 can delegate
> it to userspace (but the ARMv7 ARM still carries this PL1 thing...).
> 

The latest publicly available ARM ARM also refers to the timer as the
"EL1 Physical Timer", for example, the EL0 register CNTP_CTL_EL0 is
described as "Control register for the EL1 physical timer", so the
associativity in my comment was "(EL1 Physical Timer) Registers", and
not "Physical Timer (EL1 Registers)" :)

How about "EL0 Registers for the EL1 Physical Timer" or "Physical Timer
EL0 Registers" or "EL1 Physical Timer EL0 Registers".  Take your pick...

> > +#define KVM_REG_ARM_PTIMER_CTL		ARM64_SYS_REG(3, 3, 14, 2, 1)
> > +#define KVM_REG_ARM_PTIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 2, 2)
> > +#define KVM_REG_ARM_PTIMER_CNT		ARM64_SYS_REG(3, 3, 14, 0, 1)
> > +
> > +/* EL0 Virtual Timer Registers */
> >  #define KVM_REG_ARM_TIMER_CTL		ARM64_SYS_REG(3, 3, 14, 3, 1)
> >  #define KVM_REG_ARM_TIMER_CNT		ARM64_SYS_REG(3, 3, 14, 3, 2)
> >  #define KVM_REG_ARM_TIMER_CVAL		ARM64_SYS_REG(3, 3, 14, 0, 2)
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 70110ea..d5b632d 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -626,10 +626,11 @@ static void kvm_timer_init_interrupt(void *info)
> >  int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  {
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> > +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> >  	switch (regid) {
> >  	case KVM_REG_ARM_TIMER_CTL:
> > -		vtimer->cnt_ctl = value;
> > +		vtimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
> 
> Ah, interesting. Does this change anything to userspace behaviour?
> 

The only effect is that you don't get read-as-written behavior from
userspace, but we don't guarantee that anywhere in the API and the
current QEMU code doesn't rely on it.

It can't have any meaningful effect, because ISTATUS is purely a
function of the remaining state of the timer.

> >  		break;
> >  	case KVM_REG_ARM_TIMER_CNT:
> >  		update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
> > @@ -637,6 +638,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  	case KVM_REG_ARM_TIMER_CVAL:
> >  		vtimer->cnt_cval = value;
> >  		break;
> > +	case KVM_REG_ARM_PTIMER_CTL:
> > +		ptimer->cnt_ctl = value & ~ARCH_TIMER_CTRL_IT_STAT;
> > +		break;
> > +	case KVM_REG_ARM_PTIMER_CVAL:
> > +		ptimer->cnt_cval = value;
> > +		break;
> > +
> >  	default:
> >  		return -1;
> >  	}
> > @@ -645,17 +653,38 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  	return 0;
> >  }
> >  
> > +static u64 read_timer_ctl(struct arch_timer_context *timer)
> > +{
> > +	/*
> > +	 * Set ISTATUS bit if it's expired.
> > +	 * Note that according to ARMv8 ARM Issue A.k, ISTATUS bit is
> > +	 * UNKNOWN when ENABLE bit is 0, so we chose to set ISTATUS bit
> > +	 * regardless of ENABLE bit for our implementation convenience.
> > +	 */
> > +	if (!kvm_timer_compute_delta(timer))
> > +		return timer->cnt_ctl | ARCH_TIMER_CTRL_IT_STAT;
> > +	else
> > +		return timer->cnt_ctl;
> 
> Can't we end up with a stale IT_STAT bit here if the timer has been
> snapshotted with an interrupt pending, and then CVAL updated to expire
> later?
> 

Yes, but that's just the nature of doing business with the timer, and no
different from the behavior we had before, where you could have run the
guest, read the cnt_ctl as it was saved in the world-switch, run the
VCPU again which changes cval, and then the bit would be stale.

Do you see us changing that behavior in some worse way here?

> > +}
> > +
> >  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *vcpu, u64 regid)
> >  {
> > +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	switch (regid) {
> >  	case KVM_REG_ARM_TIMER_CTL:
> > -		return vtimer->cnt_ctl;
> > +		return read_timer_ctl(vtimer);
> >  	case KVM_REG_ARM_TIMER_CNT:
> >  		return kvm_phys_timer_read() - vtimer->cntvoff;
> >  	case KVM_REG_ARM_TIMER_CVAL:
> >  		return vtimer->cnt_cval;
> > +	case KVM_REG_ARM_PTIMER_CTL:
> > +		return read_timer_ctl(ptimer);
> > +	case KVM_REG_ARM_PTIMER_CVAL:
> > +		return ptimer->cnt_cval;
> > +	case KVM_REG_ARM_PTIMER_CNT:
> > +		return kvm_phys_timer_read();
> >  	}
> >  	return (u64)-1;
> >  }
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v3 18/20] KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
  2017-10-10  9:45     ` Marc Zyngier
@ 2017-10-19  8:44       ` Christoffer Dall
  -1 siblings, 0 replies; 110+ messages in thread
From: Christoffer Dall @ 2017-10-19  8:44 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm, Will Deacon, Catalin Marinas

On Tue, Oct 10, 2017 at 10:45:15AM +0100, Marc Zyngier wrote:
> On Sat, Sep 23 2017 at  2:42:05 am BST, Christoffer Dall <cdall@linaro.org> wrote:
> > There is no need to schedule and cancel a hrtimer when entering and
> > exiting the guest, because we know when the physical timer is going to
> > fire when the guest programs it, and we can simply program the hrtimer
> > at that point.
> >
> > Now when the register modifications from the guest go through the
> > kvm_arm_timer_set/get_reg functions, which always call
> > kvm_timer_update_state(), we can simply consider the timer state in this
> > function and schedule and cancel the timers as needed.
> >
> > This avoids looking at the physical timer emulation state when entering
> > and exiting the VCPU, allowing for faster servicing of the VM when
> > needed.
> >
> > Signed-off-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  virt/kvm/arm/arch_timer.c | 75 ++++++++++++++++++++++++++++++++---------------
> >  1 file changed, 51 insertions(+), 24 deletions(-)
> >
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 1f82c21..aa18a5d 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -199,7 +199,27 @@ static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
> >  
> >  static enum hrtimer_restart kvm_phys_timer_expire(struct hrtimer *hrt)
> >  {
> > -	WARN(1, "Timer only used to ensure guest exit - unexpected event.");
> > +	struct arch_timer_context *ptimer;
> > +	struct arch_timer_cpu *timer;
> > +	struct kvm_vcpu *vcpu;
> > +	u64 ns;
> > +
> > +	timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
> > +	vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
> > +	ptimer = vcpu_ptimer(vcpu);
> > +
> > +	/*
> > +	 * Check that the timer has really expired from the guest's
> > +	 * PoV (NTP on the host may have forced it to expire
> > +	 * early). If not ready, schedule for a later time.
> > +	 */
> > +	ns = kvm_timer_compute_delta(ptimer);
> > +	if (unlikely(ns)) {
> > +		hrtimer_forward_now(hrt, ns_to_ktime(ns));
> > +		return HRTIMER_RESTART;
> > +	}
> 
> Don't we already have a similar logic for the background timer (I must
> admit I've lost track of how we changed things in this series)? If so,
> can we make this common code?
> 

I looked at it, but the functions are slightly different.  In the
phys_timer, we just have to calculate the delta and figure out if we
need to restart it.  In the bg_timer, we have to compute the delta of
both the vtimer and the ptimer, figure out the earliest one, and figure
out if we have to restart it.  Therefore, kvm_timer_earliest_exp()
already calls kvm_timer_compute_delta() to share code, and the only code
we can really share is:

	if (unlikely(ns)) {
		hrtimer_forward_now(hrt, ns_to_ktime(ns));
		return HRTIMER_RESTART;
	}

The following code also cannot really be shared because in one case we
have to schedule work (the bg timer) and in the other case we have to
inject an irq (the phys timer).  The alternative would be:

static enum hrtimer_restart kvm_soft_timer_expire(struct hrtimer *hrt, bool is_bg_timer)
{
	struct arch_timer_context *ptimer;
	struct arch_timer_cpu *timer;
	struct kvm_vcpu *vcpu;
	u64 ns;

	if (is_bg_timer)
		timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
	else
		timer = container_of(hrt, struct arch_timer_cpu, phys_timer);
	vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);
	ptimer = vcpu_ptimer(vcpu);

	/*
	 * Check that the timer has really expired from the guest's
	 * PoV (NTP on the host may have forced it to expire
	 * early). If we should have slept longer, restart it.
	 */
	if (is_bg_timer)
		ns = kvm_timer_earliest_exp(vcpu);
	else
		ns = kvm_timer_compute_delta(ptimer);
	if (unlikely(ns)) {
		hrtimer_forward_now(hrt, ns_to_ktime(ns));
		return HRTIMER_RESTART;
	}

	if (is_bg_timer)
		schedule_work(&timer->expired);
	else
		kvm_timer_update_irq(vcpu, true, ptimer);
	return HRTIMER_NORESTART;
}

But I prefer just having them separate.
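
To make the comparison concrete, keeping them separate means the bg_timer
side keeps its own handler along these lines (a rough sketch following the
structure described above, using kvm_timer_earliest_exp() and the expired
work item; the exact code in the tree may differ slightly):

	static enum hrtimer_restart kvm_bg_timer_expire(struct hrtimer *hrt)
	{
		struct arch_timer_cpu *timer;
		struct kvm_vcpu *vcpu;
		u64 ns;

		timer = container_of(hrt, struct arch_timer_cpu, bg_timer);
		vcpu = container_of(timer, struct kvm_vcpu, arch.timer_cpu);

		/*
		 * The bg_timer tracks the earliest deadline across both
		 * timer contexts, so check the earliest expiration rather
		 * than a single context's delta, and restart the hrtimer
		 * if we woke up too early.
		 */
		ns = kvm_timer_earliest_exp(vcpu);
		if (unlikely(ns)) {
			hrtimer_forward_now(hrt, ns_to_ktime(ns));
			return HRTIMER_RESTART;
		}

		/* Defer the actual wake-up of the VCPU to process context. */
		schedule_work(&timer->expired);
		return HRTIMER_NORESTART;
	}

Side by side with kvm_phys_timer_expire() above, the only part the two
handlers really have in common is the restart check.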


> > +
> > +	kvm_timer_update_irq(vcpu, true, ptimer);
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > @@ -253,24 +273,28 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> >  }
> >  
> >  /* Schedule the background timer for the emulated timer. */
> > -static void phys_timer_emulate(struct kvm_vcpu *vcpu,
> > -			      struct arch_timer_context *timer_ctx)
> > +static void phys_timer_emulate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >  
> > -	if (kvm_timer_should_fire(timer_ctx))
> > -		return;
> > -
> > -	if (!kvm_timer_irq_can_fire(timer_ctx))
> > +	/*
> > +	 * If the timer can fire now we have just raised the IRQ line and we
> > +	 * don't need to have a soft timer scheduled for the future.  If the
> > +	 * timer cannot fire at all, then we also don't need a soft timer.
> > +	 */
> > +	if (kvm_timer_should_fire(ptimer) || !kvm_timer_irq_can_fire(ptimer)) {
> > +		soft_timer_cancel(&timer->phys_timer, NULL);
> >  		return;
> > +	}
> >  
> > -	/*  The timer has not yet expired, schedule a background timer */
> > -	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(timer_ctx));
> > +	soft_timer_start(&timer->phys_timer, kvm_timer_compute_delta(ptimer));
> >  }
> >  
> >  /*
> > - * Check if there was a change in the timer state (should we raise or lower
> > - * the line level to the GIC).
> > + * Check if there was a change in the timer state, so that we should either
> > + * raise or lower the line level to the GIC or schedule a background timer to
> > + * emulate the physical timer.
> >   */
> >  static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >  {
> > @@ -292,6 +316,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >  
> >  	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> >  		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> > +
> > +	phys_timer_emulate(vcpu);
> >  }
> >  
> >  static void vtimer_save_state(struct kvm_vcpu *vcpu)
> > @@ -445,6 +471,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
> >  
> >  	if (has_vhe())
> >  		disable_el1_phys_timer_access();
> > +
> > +	/* Set the background timer for the physical timer emulation. */
> > +	phys_timer_emulate(vcpu);
> >  }
> >  
> >  bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> > @@ -480,12 +509,6 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  
> >  	if (unlikely(!timer->enabled))
> >  		return;
> > -
> > -	if (kvm_timer_should_fire(ptimer) != ptimer->irq.level)
> > -		kvm_timer_update_irq(vcpu, !ptimer->irq.level, ptimer);
> > -
> > -	/* Set the background timer for the physical timer emulation. */
> > -	phys_timer_emulate(vcpu, vcpu_ptimer(vcpu));
> >  }
> >  
> >  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> > @@ -500,6 +523,17 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
> >  
> >  	vtimer_save_state(vcpu);
> >  
> > +	/*
> > +	 * Cancel the physical timer emulation, because the only case where we
> > +	 * need it after a vcpu_put is in the context of a sleeping VCPU, and
> > +	 * in that case we already factor in the deadline for the physical
> > +	 * timer when scheduling the bg_timer.
> > +	 *
> > +	 * In any case, we re-schedule the hrtimer for the physical timer when
> > +	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
> > +	 */
> > +	soft_timer_cancel(&timer->phys_timer, NULL);
> > +
> >  	set_cntvoff(0);
> >  }
> >  
> > @@ -536,16 +570,9 @@ static void unmask_vtimer_irq(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  {
> > -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> >  
> >  	/*
> > -	 * This is to cancel the background timer for the physical timer
> > -	 * emulation if it is set.
> > -	 */
> > -	soft_timer_cancel(&timer->phys_timer, NULL);
> > -
> > -	/*
> >  	 * If we entered the guest with the vtimer output asserted we have to
> >  	 * check if the guest has modified the timer so that we should lower
> >  	 * the line at this point.
> 
> Otherwise:
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> 
Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 110+ messages in thread

end of thread, other threads:[~2017-10-19  8:44 UTC | newest]

Thread overview: 110+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-23  0:41 [PATCH v3 00/20] KVM: arm/arm64: Optimize arch timer register handling Christoffer Dall
2017-09-23  0:41 ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 01/20] irqchip/gic: Deal with broken firmware exposing only 4kB of GICv2 CPU interface Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 02/20] arm64: Use physical counter for in-kernel reads Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 16:10   ` Marc Zyngier
2017-10-09 16:10     ` Marc Zyngier
2017-10-17 15:33   ` Will Deacon
2017-10-17 15:33     ` Will Deacon
2017-10-18 10:00     ` Christoffer Dall
2017-10-18 10:00       ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 03/20] arm64: Use the physical counter when available for read_cycles Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 16:21   ` Marc Zyngier
2017-10-09 16:21     ` Marc Zyngier
2017-10-18 11:34     ` Christoffer Dall
2017-10-18 11:34       ` Christoffer Dall
2017-10-18 15:52       ` Marc Zyngier
2017-10-18 15:52         ` Marc Zyngier
2017-09-23  0:41 ` [PATCH v3 04/20] KVM: arm/arm64: Guard kvm_vgic_map_is_active against !vgic_initialized Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 16:22   ` Marc Zyngier
2017-10-09 16:22     ` Marc Zyngier
2017-09-23  0:41 ` [PATCH v3 05/20] KVM: arm/arm64: Support calling vgic_update_irq_pending from irq context Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 16:37   ` Marc Zyngier
2017-10-09 16:37     ` Marc Zyngier
2017-10-18 11:54     ` Christoffer Dall
2017-10-18 11:54       ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 06/20] KVM: arm/arm64: Check that system supports split eoi/deactivate Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 16:47   ` Marc Zyngier
2017-10-09 16:47     ` Marc Zyngier
2017-10-18 13:41     ` Christoffer Dall
2017-10-18 13:41       ` Christoffer Dall
2017-10-18 16:03       ` Marc Zyngier
2017-10-18 16:03         ` Marc Zyngier
2017-10-18 19:16         ` Christoffer Dall
2017-10-18 19:16           ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 07/20] KVM: arm/arm64: Make timer_arm and timer_disarm helpers more generic Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:05   ` Marc Zyngier
2017-10-09 17:05     ` Marc Zyngier
2017-10-18 16:47     ` Christoffer Dall
2017-10-18 16:47       ` Christoffer Dall
2017-10-18 16:53       ` Marc Zyngier
2017-10-18 16:53         ` Marc Zyngier
2017-09-23  0:41 ` [PATCH v3 08/20] KVM: arm/arm64: Rename soft timer to bg_timer Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:06   ` Marc Zyngier
2017-10-09 17:06     ` Marc Zyngier
2017-09-23  0:41 ` [PATCH v3 09/20] KVM: arm/arm64: Use separate timer for phys timer emulation Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:23   ` Marc Zyngier
2017-10-09 17:23     ` Marc Zyngier
2017-10-19  7:38     ` Christoffer Dall
2017-10-19  7:38       ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 10/20] KVM: arm/arm64: Move timer/vgic flush/sync under disabled irq Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:34   ` Marc Zyngier
2017-10-09 17:34     ` Marc Zyngier
2017-09-23  0:41 ` [PATCH v3 11/20] KVM: arm/arm64: Move timer save/restore out of the hyp code Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:47   ` Marc Zyngier
2017-10-09 17:47     ` Marc Zyngier
2017-10-19  7:46     ` Christoffer Dall
2017-10-19  7:46       ` Christoffer Dall
2017-09-23  0:41 ` [PATCH v3 12/20] genirq: Document vcpu_info usage for percpu_devid interrupts Christoffer Dall
2017-09-23  0:41   ` Christoffer Dall
2017-10-09 17:48   ` Marc Zyngier
2017-10-09 17:48     ` Marc Zyngier
2017-09-23  0:42 ` [PATCH v3 13/20] KVM: arm/arm64: Set VCPU affinity for virt timer irq Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-09 17:52   ` Marc Zyngier
2017-10-09 17:52     ` Marc Zyngier
2017-09-23  0:42 ` [PATCH v3 14/20] KVM: arm/arm64: Avoid timer save/restore in vcpu entry/exit Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  8:47   ` Marc Zyngier
2017-10-10  8:47     ` Marc Zyngier
2017-10-19  8:15     ` Christoffer Dall
2017-10-19  8:15       ` Christoffer Dall
2017-09-23  0:42 ` [PATCH v3 15/20] KVM: arm/arm64: Support EL1 phys timer register access in set/get reg Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:10   ` Marc Zyngier
2017-10-10  9:10     ` Marc Zyngier
2017-10-19  8:32     ` Christoffer Dall
2017-10-19  8:32       ` Christoffer Dall
2017-09-23  0:42 ` [PATCH v3 16/20] KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:12   ` Marc Zyngier
2017-10-10  9:12     ` Marc Zyngier
2017-09-23  0:42 ` [PATCH v3 17/20] KVM: arm/arm64: Move phys_timer_emulate function Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:21   ` Marc Zyngier
2017-10-10  9:21     ` Marc Zyngier
2017-09-23  0:42 ` [PATCH v3 18/20] KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:45   ` Marc Zyngier
2017-10-10  9:45     ` Marc Zyngier
2017-10-19  8:44     ` Christoffer Dall
2017-10-19  8:44       ` Christoffer Dall
2017-09-23  0:42 ` [PATCH v3 19/20] KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:46   ` Marc Zyngier
2017-10-10  9:46     ` Marc Zyngier
2017-09-23  0:42 ` [PATCH v3 20/20] KVM: arm/arm64: Rework kvm_timer_should_fire Christoffer Dall
2017-09-23  0:42   ` Christoffer Dall
2017-10-10  9:59   ` Marc Zyngier
2017-10-10  9:59     ` Marc Zyngier
