[PATCH v2 0/8] Rework architected timer and forwarded IRQs handling

* [PATCH v2 0/8] Rework architected timer and forwarded IRQs handling
@ 2015-09-04 19:40 ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: kvm, Marc Zyngier, Christoffer Dall

The architected timer integration with the VGIC had some shortcomings in
that certain guest operations weren't fully supported.

This series tries to address these problems by providing level-triggered
semantics for the arch timer and VGIC integration, and seeks to clarify
the behavior when setting/clearing the active state on the physical
distributor.
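
For readers unfamiliar with the term, here is a minimal user-space sketch
of what "level-triggered" means for a timer line.  It is not code from
this series: struct lvl_timer, lvl_timer_level() and the LVL_CTRL_* bit
values are hypothetical names made up for the illustration.  The point is
only that the line is a pure function of the current timer state, so it
has to be re-evaluated whenever that state changes rather than latched
once on expiry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control bits, valid for this model only. */
#define LVL_CTRL_ENABLE  (1u << 0)
#define LVL_CTRL_IT_MASK (1u << 1)

struct lvl_timer {
	uint32_t ctl;   /* control register: enable and mask bits */
	uint64_t cval;  /* compare value, in counter ticks */
};

/*
 * Level-triggered: the line is high for as long as the condition holds
 * and drops as soon as the timer is masked, disabled, or cval is moved
 * into the future.
 */
static bool lvl_timer_level(const struct lvl_timer *t, uint64_t now)
{
	return (t->ctl & LVL_CTRL_ENABLE) &&
	       !(t->ctl & LVL_CTRL_IT_MASK) &&
	       t->cval <= now;
}
```

With such a model, masking the timer or reprogramming the compare value
after expiry lowers the line again, which an edge-style "inject once"
scheme cannot express.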

It also fixes a few other bugs in the VGIC code and finally adds support
for edge-triggered forwarded interrupts.

The edge-triggered forwarded interrupts code is untested, but having it
is probably better than clearly doing something wrong and raising a
warning.

Changes since v1:
 - Sent out bug fixes for active state and UEFI reset as separate
   patches.
 - Fixed various spelling nits
 - Rewrote proposed documentation file trying to address Eric's and
   Marc's comments
 - Rewrote kvm_timer_update_irq and kvm_timer_update_state according to
   Marc's suggestion (thanks!)
 - Added additional patch to support edge-triggered forwarded
   interrupts.

Christoffer Dall (8):
  KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
  arm/arm64: KVM: Use appropriate define in VGIC reset code
  arm/arm64: KVM: Add forwarded physical interrupts documentation
  arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  arm/arm64: KVM: Support edge-triggered forwarded interrupts

 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
 arch/arm/kvm/arm.c                                 |  21 ++-
 arch/mips/include/asm/kvm_host.h                   |   2 +
 arch/powerpc/include/asm/kvm_host.h                |   2 +
 arch/s390/include/asm/kvm_host.h                   |   2 +
 arch/x86/include/asm/kvm_host.h                    |   3 +
 include/kvm/arm_arch_timer.h                       |   4 +-
 include/kvm/arm_vgic.h                             |   3 -
 include/linux/kvm_host.h                           |   2 +
 virt/kvm/arm/arch_timer.c                          | 150 +++++++++++------
 virt/kvm/arm/vgic.c                                | 163 +++++++++----------
 virt/kvm/kvm_main.c                                |   3 +
 12 files changed, 398 insertions(+), 138 deletions(-)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

-- 
2.1.2.330.g565301e.dirty


* [PATCH v2 1/8] KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: kvm, Marc Zyngier, Christoffer Dall

Sometimes it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
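
Not part of the commit: a user-space sketch of the hook pattern, for
readers who want to see the call ordering in isolation.  All model_*
names are hypothetical and exist only for this illustration; in the
patch itself the hooks are kvm_arch_vcpu_blocking() and
kvm_arch_vcpu_unblocking(), called from kvm_vcpu_block().

```c
#include <stdbool.h>

struct model_vcpu {
	bool soft_timer_armed;  /* stands in for arch-private state */
	int  wakeups_left;      /* loop iterations before a wakeup arrives */
};

/* Arch hooks: an architecture with nothing to do leaves these empty. */
static void model_arch_vcpu_blocking(struct model_vcpu *v)
{
	v->soft_timer_armed = true;
}

static void model_arch_vcpu_unblocking(struct model_vcpu *v)
{
	v->soft_timer_armed = false;
}

/* Generic code: the hooks bracket only the part that may sleep. */
static int model_vcpu_block(struct model_vcpu *v)
{
	int slept = 0;

	model_arch_vcpu_blocking(v);
	while (v->wakeups_left > 0) {  /* stand-in for the waitqueue loop */
		v->wakeups_left--;
		slept++;
	}
	model_arch_vcpu_unblocking(v);

	return slept;
}
```

Keeping the hooks as empty static inlines on architectures that do not
need them means the generic caller compiles to the same code there as
before.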
 arch/arm/include/asm/kvm_host.h     | 3 +++
 arch/arm64/include/asm/kvm_host.h   | 3 +++
 arch/mips/include/asm/kvm_host.h    | 2 ++
 arch/powerpc/include/asm/kvm_host.h | 2 ++
 arch/s390/include/asm/kvm_host.h    | 2 ++
 arch/x86/include/asm/kvm_host.h     | 3 +++
 include/linux/kvm_host.h            | 2 ++
 virt/kvm/kvm_main.c                 | 3 +++
 8 files changed, 20 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..86fcf6e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,4 +236,7 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 415938d..dd143f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,4 +257,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index e8c8d9d..58f0f4d 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -845,5 +845,7 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..179f9a7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -702,5 +702,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_exit(void) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3024acb..04a97df 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -640,5 +640,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2a7f5d7..26c4086 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1202,4 +1202,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..87d7be6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -619,6 +619,8 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
 void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..04b59dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1946,6 +1946,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
+	kvm_arch_vcpu_blocking(vcpu);
+
 	for (;;) {
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
@@ -1959,6 +1961,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	finish_wait(&vcpu->wq, &wait);
 	cur = ktime_get();
 
+	kvm_arch_vcpu_unblocking(vcpu);
 out:
 	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
 }
-- 
2.1.2.330.g565301e.dirty


* [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vcpu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
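
Not part of the commit: a user-space sketch of the scheduling decision
described above, under simplifying assumptions.  The sched_* names and
SCHED_CTRL_* bit values are hypothetical, and the model arms the soft
timer in raw counter ticks, whereas the patch converts the delta to
nanoseconds with cyclecounter_cyc2ns() before calling timer_arm().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control bits, valid for this model only. */
#define SCHED_CTRL_ENABLE  (1u << 0)
#define SCHED_CTRL_IT_MASK (1u << 1)

struct sched_timer {
	uint32_t ctl;          /* guest-visible control register */
	uint64_t cval;         /* guest-visible compare value */
	bool     irq_active;   /* models the forwarded IRQ's active state */
	bool     armed;        /* is the background soft timer armed? */
	uint64_t armed_ticks;  /* ticks until the soft timer fires */
};

/* Can the timer raise an interrupt at all in its current state? */
static bool sched_irq_can_fire(const struct sched_timer *t)
{
	return !(t->ctl & SCHED_CTRL_IT_MASK) &&
	       (t->ctl & SCHED_CTRL_ENABLE) &&
	       !t->irq_active;
}

/* Has the timer already expired with its interrupt deliverable? */
static bool sched_should_fire(const struct sched_timer *t, uint64_t now)
{
	return sched_irq_can_fire(t) && t->cval <= now;
}

/* Called just before the vcpu thread may go to sleep. */
static void sched_timer_schedule(struct sched_timer *t, uint64_t now)
{
	if (sched_should_fire(t, now))
		return;  /* already expired: blocking returns immediately */
	if (!sched_irq_can_fire(t))
		return;  /* disabled, masked or active: nothing to wait for */

	t->armed = true;
	t->armed_ticks = t->cval - now;
}

/* Called when the vcpu thread comes back from blocking. */
static void sched_timer_unschedule(struct sched_timer *t)
{
	t->armed = false;
}
```

In this model the soft timer is armed only for a timer that is enabled,
unmasked and still in the future, which mirrors the two early returns in
kvm_timer_schedule() below.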
 arch/arm/include/asm/kvm_host.h   |  3 --
 arch/arm/kvm/arm.c                | 10 +++++
 arch/arm64/include/asm/kvm_host.h |  3 --
 include/kvm/arm_arch_timer.h      |  2 +
 virt/kvm/arm/arch_timer.c         | 91 ++++++++++++++++++++++++++-------------
 5 files changed, 72 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 86fcf6e..dcba0fa 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..bdf8871 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 	return kvm_timer_should_fire(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_schedule(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_unschedule(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	/* Force users to call KVM_ARM_VCPU_INIT */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dd143f5..415938d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e1e4d7c..ef14cc1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
+void kvm_timer_schedule(struct kvm_vcpu *vcpu);
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 48c6e1a..7991537 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
+static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
+		!kvm_vgic_get_phys_irq_active(timer->map);
+}
+
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	cycle_t cval, now;
 
-	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
-	    kvm_vgic_get_phys_irq_active(timer->map))
+	if (!kvm_timer_irq_can_fire(vcpu))
 		return false;
 
 	cval = timer->cntv_cval;
@@ -127,24 +134,61 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-/**
- * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
- * @vcpu: The vcpu pointer
- *
- * Disarm any pending soft timers, since the world-switch code will write the
- * virtual timer state back to the physical CPU.
+/*
+ * Schedule the background timer before calling kvm_vcpu_block, so that this
+ * thread is removed from its waitqueue and made runnable when there's a timer
+ * interrupt to handle.
  */
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	u64 ns;
+	cycle_t cval, now;
+
+	BUG_ON(timer_is_armed(timer));
+
+	/*
+	 * No need to schedule a background timer if the guest timer has
+	 * already expired, because kvm_vcpu_block will return before putting
+	 * the thread to sleep.
+	 */
+	if (kvm_timer_should_fire(vcpu))
+		return;
 
 	/*
-	 * We're about to run this vcpu again, so there is no need to
-	 * keep the background timer running, as we're about to
-	 * populate the CPU timer again.
+	 * If the timer is not capable of raising interrupts (disabled or
+	 * masked, or with its interrupt already active), then there's no
+	 * more work for us to do.
 	 */
+	if (!kvm_timer_irq_can_fire(vcpu))
+		return;
+
+	/*  The timer has not yet expired, schedule a background timer */
+	cval = timer->cntv_cval;
+	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
+
+	ns = cyclecounter_cyc2ns(timecounter->cc,
+				 cval - now,
+				 timecounter->mask,
+				 &timecounter->frac);
+	timer_arm(timer, ns);
+}
+
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	timer_disarm(timer);
+}
 
+/**
+ * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the virtual timer has expired while we were running in the host,
+ * and inject an interrupt if that was the case.
+ */
+void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+{
 	/*
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
@@ -157,32 +201,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
  * kvm_timer_sync_hwstate - sync timer state from cpu
  * @vcpu: The vcpu pointer
  *
- * Check if the virtual timer was armed and either schedule a corresponding
- * soft timer or inject directly if already expired.
+ * Check if the virtual timer has expired while we were running in the guest,
+ * and inject an interrupt if that was the case.
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	cycle_t cval, now;
-	u64 ns;
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu)) {
-		/*
-		 * Timer has already expired while we were not
-		 * looking. Inject the interrupt and carry on.
-		 */
+	if (kvm_timer_should_fire(vcpu))
 		kvm_timer_inject_irq(vcpu);
-		return;
-	}
-
-	cval = timer->cntv_cval;
-	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
-
-	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
-				 &timecounter->frac);
-	timer_arm(timer, ns);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-- 
2.1.2.330.g565301e.dirty


* [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
@ 2015-09-04 19:40   ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: linux-arm-kernel

We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vcpu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   |  3 --
 arch/arm/kvm/arm.c                | 10 +++++
 arch/arm64/include/asm/kvm_host.h |  3 --
 include/kvm/arm_arch_timer.h      |  2 +
 virt/kvm/arm/arch_timer.c         | 91 ++++++++++++++++++++++++++-------------
 5 files changed, 72 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 86fcf6e..dcba0fa 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..bdf8871 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 	return kvm_timer_should_fire(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_schedule(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_unschedule(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	/* Force users to call KVM_ARM_VCPU_INIT */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dd143f5..415938d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e1e4d7c..ef14cc1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
+void kvm_timer_schedule(struct kvm_vcpu *vcpu);
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 48c6e1a..7991537 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
+static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
+		!kvm_vgic_get_phys_irq_active(timer->map);
+}
+
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	cycle_t cval, now;
 
-	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
-	    kvm_vgic_get_phys_irq_active(timer->map))
+	if (!kvm_timer_irq_can_fire(vcpu))
 		return false;
 
 	cval = timer->cntv_cval;
@@ -127,24 +134,61 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-/**
- * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
- * @vcpu: The vcpu pointer
- *
- * Disarm any pending soft timers, since the world-switch code will write the
- * virtual timer state back to the physical CPU.
+/*
+ * Schedule the background timer before calling kvm_vcpu_block, so that this
+ * thread is removed from its waitqueue and made runnable when there's a timer
+ * interrupt to handle.
  */
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	u64 ns;
+	cycle_t cval, now;
+
+	BUG_ON(timer_is_armed(timer));
+
+	/*
+	 * No need to schedule a background timer if the guest timer has
+	 * already expired, because kvm_vcpu_block will return before putting
+	 * the thread to sleep.
+	 */
+	if (kvm_timer_should_fire(vcpu))
+		return;
 
 	/*
-	 * We're about to run this vcpu again, so there is no need to
-	 * keep the background timer running, as we're about to
-	 * populate the CPU timer again.
+	 * If the timer is not capable of raising interrupts (disabled or
+	 * masked), or its interrupt is already active on the physical
+	 * distributor, then there's no more work for us to do.
 	 */
+	if (!kvm_timer_irq_can_fire(vcpu))
+		return;
+
+	/* The timer has not yet expired, schedule a background timer */
+	cval = timer->cntv_cval;
+	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
+
+	ns = cyclecounter_cyc2ns(timecounter->cc,
+				 cval - now,
+				 timecounter->mask,
+				 &timecounter->frac);
+	timer_arm(timer, ns);
+}
+
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	timer_disarm(timer);
+}
 
+/**
+ * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the virtual timer has expired while we were running in the host,
+ * and inject an interrupt if that was the case.
+ */
+void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+{
 	/*
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
@@ -157,32 +201,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
  * kvm_timer_sync_hwstate - sync timer state from cpu
  * @vcpu: The vcpu pointer
  *
- * Check if the virtual timer was armed and either schedule a corresponding
- * soft timer or inject directly if already expired.
+ * Check if the virtual timer has expired while we were running in the guest,
+ * and inject an interrupt if that was the case.
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	cycle_t cval, now;
-	u64 ns;
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu)) {
-		/*
-		 * Timer has already expired while we were not
-		 * looking. Inject the interrupt and carry on.
-		 */
+	if (kvm_timer_should_fire(vcpu))
 		kvm_timer_inject_irq(vcpu);
-		return;
-	}
-
-	cval = timer->cntv_cval;
-	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
-
-	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
-				 &timecounter->frac);
-	timer_arm(timer, ns);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-- 
2.1.2.330.g565301e.dirty
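
As a reading aid for the kvm_timer_schedule() hunk above, here is a
minimal user-space sketch of the decision it makes before the vcpu
blocks.  All sketch_* names and the bit values are made up for
illustration and are not kernel APIs.  The sketch also assumes "now" has
already been adjusted by cntvoff, and it returns a cycle count instead of
converting to nanoseconds with cyclecounter_cyc2ns() as the patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CNTV_CTL bits; values are illustrative. */
#define SKETCH_CTRL_ENABLE	(1u << 0)
#define SKETCH_CTRL_IT_MASK	(1u << 1)

/* Simplified, hypothetical model of the per-vcpu timer state. */
struct sketch_timer {
	uint32_t cntv_ctl;	/* guest view of the control register */
	uint64_t cntv_cval;	/* guest compare value, in cycles */
	bool phys_irq_active;	/* timer IRQ already active on the distributor */
};

/* Mirrors kvm_timer_irq_can_fire(): enabled, unmasked, not already active. */
static bool sketch_can_fire(const struct sketch_timer *t)
{
	return !(t->cntv_ctl & SKETCH_CTRL_IT_MASK) &&
	       (t->cntv_ctl & SKETCH_CTRL_ENABLE) &&
	       !t->phys_irq_active;
}

/* Mirrors kvm_timer_should_fire(): able to fire and the deadline has passed. */
static bool sketch_should_fire(const struct sketch_timer *t, uint64_t now)
{
	return sketch_can_fire(t) && t->cntv_cval <= now;
}

/*
 * Models the decision in kvm_timer_schedule().  Returns the number of
 * cycles a background timer would be armed for, or 0 when no background
 * timer is needed.
 */
static uint64_t sketch_cycles_to_arm(const struct sketch_timer *t, uint64_t now)
{
	if (sketch_should_fire(t, now))
		return 0;	/* already expired: blocking returns at once */
	if (!sketch_can_fire(t))
		return 0;	/* cannot raise an interrupt: nothing to wait for */
	return t->cntv_cval - now;
}
```

With the timer enabled and unmasked, a compare value of 1000 and a
current count of 900 yields 100 cycles to arm; once the count reaches
1000, or the timer is masked, the sketch returns 0.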

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

Currently vgic_process_maintenance() handles a completed
level-triggered interrupt directly, but we are soon going to reuse this
logic for level-triggered mapped interrupts with the HW bit set, so
move this logic into a separate static function.

Probably the scariest part of this commit is convincing yourself that
the new flow is as safe as the old one.  In the following I
try to list the changes and why they are harmless:

  Move vgic_irq_clear_queued after kvm_notify_acked_irq:
    Harmless because the only effect of the queued flag still being set
    during kvm_set_irq is that vgic_update_irq_pending does not set the
    pending bit on the emulated CPU interface or in the pending_on_cpu
    bitmask, but we set this in __kvm_vgic_sync_hwstate later on if the
    level is still high.

  Move vgic_set_lr before kvm_notify_acked_irq:
    Also harmless because LR updates are cpu-local operations and
    kvm_notify_acked_irq only affects the distributor.

  Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
    Also harmless because it's just a bit which is cleared and altering
    the line state does not affect this bit.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 88 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 50 insertions(+), 38 deletions(-)
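
To make the ordering argument in the commit message easier to check,
here is a small single-threaded user-space model of the new flow: notify
the ack handlers first (which takes the distributor lock internally), then
do all remaining bookkeeping in one locked section.  The sketch_* names
and the struct are hypothetical, the lock is a plain flag rather than a
spinlock, and the model tracks a single interrupt only.

```c
#include <stdbool.h>

/* Hypothetical model of the distributor state for one level-triggered IRQ. */
struct sketch_dist {
	bool locked;		/* stands in for dist->lock */
	bool line_level;	/* current level of the input line */
	bool queued;		/* IRQ is queued in a list register */
	bool soft_pend;		/* soft-pending latch */
	bool pending;		/* pending on the emulated CPU interface */
};

/* Non-recursive lock: returns false where a real caller would deadlock. */
static bool sketch_lock(struct sketch_dist *d)
{
	if (d->locked)
		return false;
	d->locked = true;
	return true;
}

static void sketch_unlock(struct sketch_dist *d)
{
	d->locked = false;
}

/*
 * Stands in for kvm_notify_acked_irq() -> kvm_set_irq(): it takes the
 * lock itself, so the caller must not hold it.  The notifier decides
 * the new line level.
 */
static bool sketch_notify_acked(struct sketch_dist *d, bool new_level)
{
	if (!sketch_lock(d))
		return false;
	d->line_level = new_level;
	sketch_unlock(d);
	return true;
}

/* Stands in for process_level_irq(); the caller holds the lock. */
static int sketch_process_level_irq(struct sketch_dist *d)
{
	d->soft_pend = false;		/* an EOI implies the IRQ was ACKed */
	d->queued = false;		/* start sampling the line again */
	d->pending = d->line_level;	/* re-pend only if still asserted */
	return d->line_level ? 1 : 0;
}

/* The new flow: notify first, then a single locked section. */
static int sketch_handle_eoi(struct sketch_dist *d, bool new_level)
{
	int level_pending;

	if (!sketch_notify_acked(d, new_level))
		return -1;		/* would have deadlocked */
	if (!sketch_lock(d))
		return -1;
	level_pending = sketch_process_level_irq(d);
	sketch_unlock(d);
	return level_pending;
}
```

In this model the pending state after an EOI depends only on the line
level left behind by the notifier, which is the property the three
"harmless" arguments above rely on.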

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 6bd1c9b..fe0e5db 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1322,12 +1322,56 @@ epilog:
 	}
 }
 
+static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
+{
+	int level_pending = 0;
+
+	vlr.state = 0;
+	vlr.hwirq = 0;
+	vgic_set_lr(vcpu, lr, vlr);
+
+	/*
+	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
+	 * went from active to non-active (called from vgic_sync_hwirq) it was
+	 * also ACKed and we therefore assume we can clear the soft pending
+	 * state (should it have been set) for this interrupt.
+	 *
+	 * Note: if the IRQ soft pending state was set after the IRQ was
+	 * acked, it actually shouldn't be cleared, but we have no way of
+	 * knowing that unless we start trapping ACKs when the soft-pending
+	 * state is set.
+	 */
+	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
+
+	/*
+	 * Tell the gic to start sampling the line of this interrupt again.
+	 */
+	vgic_irq_clear_queued(vcpu, vlr.irq);
+
+	/* Any additional pending interrupt? */
+	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
+		vgic_cpu_irq_set(vcpu, vlr.irq);
+		level_pending = 1;
+	} else {
+		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
+		vgic_cpu_irq_clear(vcpu, vlr.irq);
+	}
+
+	/*
+	 * Despite being EOIed, the LR may not have
+	 * been marked as empty.
+	 */
+	vgic_sync_lr_elrsr(vcpu, lr, vlr);
+
+	return level_pending;
+}
+
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 {
 	u32 status = vgic_get_interrupt_status(vcpu);
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	bool level_pending = false;
 	struct kvm *kvm = vcpu->kvm;
+	int level_pending = 0;
 
 	kvm_debug("STATUS = %08x\n", status);
 
@@ -1342,54 +1386,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 
 		for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
 			struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
-			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 
-			spin_lock(&dist->lock);
-			vgic_irq_clear_queued(vcpu, vlr.irq);
+			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 			WARN_ON(vlr.state & LR_STATE_MASK);
-			vlr.state = 0;
-			vgic_set_lr(vcpu, lr, vlr);
 
-			/*
-			 * If the IRQ was EOIed it was also ACKed and we we
-			 * therefore assume we can clear the soft pending
-			 * state (should it had been set) for this interrupt.
-			 *
-			 * Note: if the IRQ soft pending state was set after
-			 * the IRQ was acked, it actually shouldn't be
-			 * cleared, but we have no way of knowing that unless
-			 * we start trapping ACKs when the soft-pending state
-			 * is set.
-			 */
-			vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
 
 			/*
 			 * kvm_notify_acked_irq calls kvm_set_irq()
-			 * to reset the IRQ level. Need to release the
-			 * lock for kvm_set_irq to grab it.
+			 * to reset the IRQ level, which grabs the dist->lock
+			 * so we call this before taking the dist->lock.
 			 */
-			spin_unlock(&dist->lock);
-
 			kvm_notify_acked_irq(kvm, 0,
 					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
-			spin_lock(&dist->lock);
-
-			/* Any additional pending interrupt? */
-			if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
-				vgic_cpu_irq_set(vcpu, vlr.irq);
-				level_pending = true;
-			} else {
-				vgic_dist_irq_clear_pending(vcpu, vlr.irq);
-				vgic_cpu_irq_clear(vcpu, vlr.irq);
-			}
 
+			spin_lock(&dist->lock);
+			level_pending |= process_level_irq(vcpu, lr, vlr);
 			spin_unlock(&dist->lock);
-
-			/*
-			 * Despite being EOIed, the LR may not have
-			 * been marked as empty.
-			 */
-			vgic_sync_lr_elrsr(vcpu, lr, vlr);
 		}
 	}
 
-- 
2.1.2.330.g565301e.dirty

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 4/8] arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: kvm, Marc Zyngier, Christoffer Dall

GICD_ICFGR allows the bits for the SGIs and PPIs to be read-only.
We currently emulate this by writing a hardcoded value to the
register for the SGIs and PPIs on every guest write of these bits
(ignoring what the guest actually wrote), and by using the same
hardcoded value as the reset value of the register.

This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
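
A minimal sketch of the read-only behaviour described above, assuming
the architectural GICD_ICFGR layout of 2 bits per interrupt (so byte
offsets 0-3 cover the SGIs and 4-7 the PPIs).  sketch_cfg_write() and
its return convention are hypothetical and do not reproduce
vgic_handle_cfg_reg(), which works on an internal representation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a guest write to one GICD_ICFGR word.
 * `offset` is the byte offset of the word within the register bank.
 * Returns true if the stored configuration changed.
 */
static bool sketch_cfg_write(uint32_t *reg, uint32_t offset, uint32_t val)
{
	if (offset < 8)
		return false;	/* SGI/PPI bits are read-only: ignore write */

	if (*reg == val)
		return false;	/* nothing changed */

	*reg = val;
	return true;
}
```

Because writes below offset 8 never touch the stored word, whatever
value the reset code placed there is what the guest keeps reading back.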

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index fe0e5db..e606f78 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -655,7 +655,7 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
 	if (mmio->is_write) {
 		if (offset < 8) {
-			*reg = ~0U; /* Force PPIs/SGIs to 1 */
+			/* Ignore writes to read-only SGI and PPI bits */
 			return false;
 		}

-- 
2.1.2.330.g565301e.dirty

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 5/8] arm/arm64: KVM: Use appropriate define in VGIC reset code
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the more
natural VGIC_NR_SGIS.  Change this slightly confusing use of the
defines.

Note: This should have no functional change, as both names are defined
to the number 16.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index e606f78..9ed8d53 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2109,7 +2109,7 @@ int vgic_init(struct kvm *kvm)
 		}
 
 		for (i = 0; i < dist->nr_irqs; i++) {
-			if (i < VGIC_NR_PPIS)
+			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_enabled,
 							vcpu->vcpu_id, i, 1);
 			if (i < VGIC_NR_PRIVATE_IRQS)
-- 
2.1.2.330.g565301e.dirty

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: kvm, Marc Zyngier, Christoffer Dall

Forwarded physical interrupts on arm/arm64 are a tricky concept, and the
way we deal with them is not easy to understand from reading the
various specs.

Therefore, add a proper documentation file explaining the flow and
rationale of the behavior of the vgic.

Some of this text was contributed by Marc Zyngier and edited by me.
Omissions and errors are all mine.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 0000000..24b6f28
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,181 @@
+KVM/ARM VGIC Forwarded Physical Interrupts
+==========================================
+
+The KVM/ARM code implements software support for the ARM Generic
+Interrupt Controller's (GIC's) hardware support for virtualization by
+allowing software to inject virtual interrupts to a VM, which the guest
+OS sees as regular interrupts.  The code is famously known as the VGIC.
+
+Some of these virtual interrupts, however, correspond to physical
+interrupts from real physical devices.  One example could be the
+architected timer, which itself supports virtualization, and therefore
+lets a guest OS program the hardware device directly to raise an
+interrupt at some point in time.  When such an interrupt is raised, the
+host OS initially handles the interrupt and must somehow signal this
+event as a virtual interrupt to the guest.  Another example could be a
+passthrough device, where the physical interrupts are initially handled
+by the host, but the device driver for the device lives in the guest OS
+and KVM must therefore somehow inject a virtual interrupt on behalf of
+the physical one to the guest OS.
+
+These virtual interrupts corresponding to a physical interrupt on the
+host are called forwarded physical interrupts, but are also sometimes
+referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
+
+Forwarded physical interrupts are handled slightly differently compared
+to virtual interrupts generated purely by a software emulated device.
+
+
+The HW bit
+----------
+Virtual interrupts are signalled to the guest by programming the List
+Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
+with the virtual IRQ number and the state of the interrupt (Pending,
+Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
+interrupt, the LR state moves from Pending to Active, and finally to
+inactive.
+
+The LRs include an extra bit, called the HW bit.  When this bit is set,
+KVM must also program an additional field in the LR, the physical IRQ
+number, to link the virtual with the physical IRQ.
+
+When the HW bit is set, KVM must EITHER set the Pending OR the Active
+bit, never both at the same time.
+
+Setting the HW bit causes the hardware to deactivate the physical
+interrupt on the physical distributor when the guest deactivates the
+corresponding virtual interrupt.
+
+
+Forwarded Physical Interrupts Life Cycle
+----------------------------------------
+
+The state of forwarded physical interrupts is managed in the following way:
+
+  - The physical interrupt is acked by the host, and becomes active on
+    the physical distributor (*).
+  - KVM sets the LR.Pending bit, because this is the only way the GICV
+    interface is going to present it to the guest.
+  - LR.Pending will stay set as long as the guest has not acked the interrupt.
+  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
+    expected.
+  - On guest EOI, the *physical distributor* active bit gets cleared,
+    but the LR.Active is left untouched (set).
+  - KVM clears the LR on VM exits when the physical distributor
+    active state has been cleared.
+
+(*): The host handling is slightly more complicated.  For some devices
+(shared), KVM directly sets the active state on the physical distributor
+before entering the guest, and for some devices (non-shared) the host
+configures the GIC such that it does not deactivate the interrupt on
+host EOIs, but only performs a priority drop allowing the GIC to receive
+other interrupts and leaves the interrupt in the active state on the
+physical distributor.
+
+
+Forwarded Edge and Level Triggered PPIs and SPIs
+------------------------------------------------
+Forwarded physical interrupts should always be active on the physical
+distributor when injected to a guest.
+
+Level-triggered interrupts will keep the interrupt line to the GIC
+asserted, typically until the guest programs the device to deassert the
+line.  This means that the interrupt will remain pending on the physical
+distributor until the guest has reprogrammed the device.  Since we
+always run the VM with interrupts enabled on the CPU, a pending
+interrupt will exit the guest as soon as we switch into the guest,
+preventing the guest from ever making progress as the process repeats
+over and over.  Therefore, the active state on the physical distributor
+must be set when entering the guest, preventing the GIC from forwarding
+the pending interrupt to the CPU.  As soon as the guest deactivates
+(EOIs) the interrupt, the physical line is sampled by the hardware again
+and the host takes a new interrupt if and only if the physical line is
+still asserted.
+
+Edge-triggered interrupts do not exhibit the same problem with
+preventing guest execution that level-triggered interrupts do.  One
+option is to not use the HW bit at all, and inject edge-triggered interrupts
+from a physical device as pure virtual interrupts.  But that would
+potentially slow down handling of the interrupt in the guest, because a
+physical interrupt occurring in the middle of the guest ISR would
+preempt the guest for the host to handle the interrupt.  Additionally,
+if you configure the system to handle interrupts on a separate physical
+core from that running your VCPU, you still have to interrupt the VCPU
+to queue the pending state onto the LR, even though the guest won't use
+this information until the guest ISR completes.  Therefore, the HW
+bit should always be set for forwarded edge-triggered interrupts.  With
+the HW bit set, the virtual interrupt is injected and additional
+physical interrupts occurring before the guest deactivates the interrupt
+simply mark the state on the physical distributor as Pending+Active.  As
+soon as the guest deactivates the interrupt, the host takes another
+interrupt if and only if there was a physical interrupt between
+injecting the forwarded interrupt to the guest and the guest
+deactivating the interrupt.
+
+Consequently, whenever we schedule a VCPU with one or more LRs with the
+HW bit set, the interrupt must also be active on the physical
+distributor.
+
+
+Forwarded LPIs
+--------------
+LPIs, introduced in GICv3, are always edge-triggered and do not have an
+active state.  They become pending when a device signals them, and as
+soon as they are acked by the CPU, they are inactive again.
+
+It therefore doesn't make sense, and is not supported, to set the HW bit
+for physical LPIs that are forwarded to a VM as virtual interrupts,
+typically virtual SPIs.
+
+For LPIs, there is no other choice than to preempt the VCPU thread if
+necessary, and queue the pending state onto the LR.
+
+
+Putting It Together: The Architected Timer
+------------------------------------------
+The architected timer is a device that signals interrupts with level
+triggered semantics.  The timer hardware is directly accessed by VCPUs
+which program the timer to fire at some point in time.  Each VCPU on a
+system programs the timer to fire at different times, and therefore the
+hardware is multiplexed between multiple VCPUs.  This is implemented by
+context-switching the timer state along with each VCPU thread.
+
+However, this means that a scenario like the following is entirely
+possible, and in fact, typical:
+
+1.  KVM runs the VCPU
+2.  The guest programs the timer to fire in T+100
+3.  The guest is idle and calls WFI (wait-for-interrupts)
+4.  The hardware traps to the host
+5.  KVM stores the timer state to memory and disables the hardware timer
+6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
+7.  KVM puts the VCPU thread to sleep (on a waitqueue)
+8.  The soft timer fires, waking up the VCPU thread
+9.  KVM reprograms the timer hardware with the VCPU's values
+10. KVM marks the timer interrupt as active on the physical distributor
+11. KVM injects a forwarded physical interrupt to the guest
+12. KVM runs the VCPU
+
+Notice that KVM injects a forwarded physical interrupt in step 11 without
+the corresponding interrupt having actually fired on the host.  That is
+exactly why we mark the timer interrupt as active in step 10, because
+the active state on the physical distributor is part of the state
+belonging to the timer hardware, which is context-switched along with
+the VCPU thread.
+
+If the guest does not idle because it is busy, the flow looks like this
+instead:
+
+1.  KVM runs the VCPU
+2.  The guest programs the timer to fire in T+100
+3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
+4.  With interrupts disabled on the CPU, KVM looks at the timer state
+    and injects a forwarded physical interrupt because it concludes the
+    timer has expired.
+5.  KVM marks the timer interrupt as active on the physical distributor
+6.  KVM runs the VCPU
+
+Notice that again the forwarded physical interrupt is injected to the
+guest without having actually been handled on the host.  In this case it
+is because the physical interrupt is forwarded to the guest before KVM
+enables physical interrupts on the CPU after exiting the guest.
-- 
2.1.2.330.g565301e.dirty
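
The "Forwarded Physical Interrupts Life Cycle" bullets in the document
above can be read as a small state machine.  The sketch below models
them in user-space C purely as a reading aid: the sketch_* names are
hypothetical, it tracks one interrupt only, and it follows the bullets
as written rather than any particular GIC implementation.

```c
#include <stdbool.h>

enum sketch_lr_state {
	SKETCH_LR_EMPTY,
	SKETCH_LR_PENDING,
	SKETCH_LR_ACTIVE,
};

/* Hypothetical model of one forwarded physical interrupt. */
struct sketch_fwd_irq {
	bool phys_active;		/* active on the physical distributor */
	enum sketch_lr_state lr;	/* state held in the list register */
};

/* The IRQ becomes active on the physical side and KVM sets LR.Pending. */
static void sketch_inject(struct sketch_fwd_irq *irq)
{
	irq->phys_active = true;
	irq->lr = SKETCH_LR_PENDING;
}

/* Guest reads the IAR: LR.Pending transitions to LR.Active. */
static void sketch_guest_ack(struct sketch_fwd_irq *irq)
{
	if (irq->lr == SKETCH_LR_PENDING)
		irq->lr = SKETCH_LR_ACTIVE;
}

/*
 * Guest EOI with the HW bit set: the physical active bit is cleared,
 * while LR.Active is left set, as the document describes.
 */
static void sketch_guest_eoi(struct sketch_fwd_irq *irq)
{
	if (irq->lr == SKETCH_LR_ACTIVE)
		irq->phys_active = false;
}

/*
 * On VM exit, KVM clears the LR once the physical active state has
 * been cleared.  Returns true if the LR was cleared.
 */
static bool sketch_vm_exit(struct sketch_fwd_irq *irq)
{
	if (irq->lr != SKETCH_LR_EMPTY && !irq->phys_active) {
		irq->lr = SKETCH_LR_EMPTY;
		return true;
	}
	return false;
}
```

Walking one interrupt through inject, ack, EOI and VM exit ends with an
empty LR and an inactive physical interrupt; a VM exit before the guest
EOI leaves the LR untouched because the physical side is still active.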


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
@ 2015-09-04 19:40   ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: linux-arm-kernel

Forwarded physical interrupts on arm/arm64 are a tricky concept, and the
way we deal with them is not easy to understand from reading the
various specs.

Therefore, add a proper documentation file explaining the flow and
rationale of the behavior of the vgic.

Some of this text was contributed by Marc Zyngier and edited by me.
Omissions and errors are all mine.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 0000000..24b6f28
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,181 @@
+KVM/ARM VGIC Forwarded Physical Interrupts
+==========================================
+
+The KVM/ARM code implements software support for the ARM Generic
+Interrupt Controller's (GIC's) hardware support for virtualization by
+allowing software to inject virtual interrupts to a VM, which the guest
+OS sees as regular interrupts.  The code is famously known as the VGIC.
+
+Some of these virtual interrupts, however, correspond to physical
+interrupts from real physical devices.  One example could be the
+architected timer, which itself supports virtualization, and therefore
+lets a guest OS program the hardware device directly to raise an
+interrupt at some point in time.  When such an interrupt is raised, the
+host OS initially handles the interrupt and must somehow signal this
+event as a virtual interrupt to the guest.  Another example could be a
+passthrough device, where the physical interrupts are initially handled
+by the host, but the device driver for the device lives in the guest OS
+and KVM must therefore somehow inject a virtual interrupt on behalf of
+the physical one to the guest OS.
+
+These virtual interrupts corresponding to a physical interrupt on the
+host are called forwarded physical interrupts, but are also sometimes
+referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
+
+Forwarded physical interrupts are handled slightly differently compared
+to virtual interrupts generated purely by a software emulated device.
+
+
+The HW bit
+----------
+Virtual interrupts are signalled to the guest by programming the List
+Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
+with the virtual IRQ number and the state of the interrupt (Pending,
+Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
+interrupt, the LR state moves from Pending to Active, and finally to
+inactive.
+
+The LRs include an extra bit, called the HW bit.  When this bit is set,
+KVM must also program an additional field in the LR, the physical IRQ
+number, to link the virtual with the physical IRQ.
+
+When the HW bit is set, KVM must EITHER set the Pending OR the Active
+bit, never both at the same time.
+
+Setting the HW bit causes the hardware to deactivate the physical
+interrupt on the physical distributor when the guest deactivates the
+corresponding virtual interrupt.
+
+
+Forwarded Physical Interrupts Life Cycle
+----------------------------------------
+
+The state of forwarded physical interrupts is managed in the following way:
+
+  - The physical interrupt is acked by the host, and becomes active on
+    the physical distributor (*).
+  - KVM sets the LR.Pending bit, because this is the only way the GICV
+    interface is going to present it to the guest.
+  - LR.Pending will stay set as long as the guest has not acked the interrupt.
+  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
+    expected.
+  - On guest EOI, the *physical distributor* active bit gets cleared,
+    but the LR.Active is left untouched (set).
+  - KVM clears the LR on VM exits when the physical distributor
+    active state has been cleared.
+
+(*): The host handling is slightly more complicated.  For some devices
+(shared), KVM directly sets the active state on the physical distributor
+before entering the guest, and for some devices (non-shared) the host
+configures the GIC such that it does not deactivate the interrupt on
+host EOIs, but only performs a priority drop allowing the GIC to receive
+other interrupts and leaves the interrupt in the active state on the
+physical distributor.
+
+
+Forwarded Edge and Level Triggered PPIs and SPIs
+------------------------------------------------
+Forwarded physical interrupts should always be active on the
+physical distributor when injected to a guest.
+
+Level-triggered interrupts will keep the interrupt line to the GIC
+asserted, typically until the guest programs the device to deassert the
+line.  This means that the interrupt will remain pending on the physical
+distributor until the guest has reprogrammed the device.  Since we
+always run the VM with interrupts enabled on the CPU, a pending
+interrupt will exit the guest as soon as we switch into the guest,
+preventing the guest from ever making progress as the process repeats
+over and over.  Therefore, the active state on the physical distributor
+must be set when entering the guest, preventing the GIC from forwarding
+the pending interrupt to the CPU.  As soon as the guest deactivates
+(EOIs) the interrupt, the physical line is sampled by the hardware again
+and the host takes a new interrupt if and only if the physical line is
+still asserted.
+
+Edge-triggered interrupts do not exhibit the same problem with
+preventing guest execution that level-triggered interrupts do.  One
+option is to not use the HW bit at all, and inject edge-triggered interrupts
+from a physical device as pure virtual interrupts.  But that would
+potentially slow down handling of the interrupt in the guest, because a
+physical interrupt occurring in the middle of the guest ISR would
+preempt the guest for the host to handle the interrupt.  Additionally,
+if you configure the system to handle interrupts on a separate physical
+core from that running your VCPU, you still have to interrupt the VCPU
+to queue the pending state onto the LR, even though the guest won't use
+this information until the guest ISR completes.  Therefore, the HW
+bit should always be set for forwarded edge-triggered interrupts.  With
+the HW bit set, the virtual interrupt is injected and additional
+physical interrupts occurring before the guest deactivates the interrupt
+simply mark the state on the physical distributor as Pending+Active.  As
+soon as the guest deactivates the interrupt, the host takes another
+interrupt if and only if there was a physical interrupt between
+injecting the forwarded interrupt to the guest and the guest deactivating
+the interrupt.
+
+Consequently, whenever we schedule a VCPU with one or more LRs with the
+HW bit set, the interrupt must also be active on the physical
+distributor.
+
+
+Forwarded LPIs
+--------------
+LPIs, introduced in GICv3, are always edge-triggered and do not have an
+active state.  They become pending when a device signals them, and as
+soon as they are acked by the CPU, they are inactive again.
+
+It therefore doesn't make sense, and is not supported, to set the HW bit
+for physical LPIs that are forwarded to a VM as virtual interrupts,
+typically virtual SPIs.
+
+For LPIs, there is no other choice than to preempt the VCPU thread if
+necessary, and queue the pending state onto the LR.
+
+
+Putting It Together: The Architected Timer
+------------------------------------------
+The architected timer is a device that signals interrupts with level
+triggered semantics.  The timer hardware is directly accessed by VCPUs
+which program the timer to fire at some point in time.  Each VCPU on a
+system programs the timer to fire at different times, and therefore the
+hardware is multiplexed between multiple VCPUs.  This is implemented by
+context-switching the timer state along with each VCPU thread.
+
+However, this means that a scenario like the following is entirely
+possible, and in fact, typical:
+
+1.  KVM runs the VCPU
+2.  The guest programs the timer to fire in T+100
+3.  The guest is idle and calls WFI (wait-for-interrupt)
+4.  The hardware traps to the host
+5.  KVM stores the timer state to memory and disables the hardware timer
+6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
+7.  KVM puts the VCPU thread to sleep (on a waitqueue)
+8.  The soft timer fires, waking up the VCPU thread
+9.  KVM reprograms the timer hardware with the VCPU's values
+10. KVM marks the timer interrupt as active on the physical distributor
+11. KVM injects a forwarded physical interrupt to the guest
+12. KVM runs the VCPU
+
+Notice that KVM injects a forwarded physical interrupt in step 11 without
+the corresponding interrupt having actually fired on the host.  That is
+exactly why we mark the timer interrupt as active in step 10, because
+the active state on the physical distributor is part of the state
+belonging to the timer hardware, which is context-switched along with
+the VCPU thread.
+
+If the guest does not idle because it is busy, the flow looks like this
+instead:
+
+1.  KVM runs the VCPU
+2.  The guest programs the timer to fire in T+100
+3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
+4.  With interrupts disabled on the CPU, KVM looks at the timer state
+    and injects a forwarded physical interrupt because it concludes the
+    timer has expired.
+5.  KVM marks the timer interrupt as active on the physical distributor
+6.  KVM runs the VCPU
+
+Notice that again the forwarded physical interrupt is injected to the
+guest without having actually been handled on the host.  In this case it
+is because the physical interrupt is forwarded to the guest before KVM
+enables physical interrupts on the CPU after exiting the guest.
-- 
2.1.2.330.g565301e.dirty

* [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic.  This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts.  Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.
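
The difference between the two semantics for that guest sequence can be
illustrated with a small stand-alone model.  This is only a sketch and
not code from this series: struct toy_irq and its helpers are
hypothetical names, and the model assumes the simplest possible
controller, where an edge-triggered input latches a pending bit on a
rising edge while a level-triggered input is pending exactly as long as
the line is high.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt input; not a kernel structure. */
struct toy_irq {
	bool level_triggered;	/* configuration of the input */
	bool line;		/* current level of the input line */
	bool latched;		/* pending latch, set by a rising edge */
};

/* The device raises or lowers its output line. */
static void toy_irq_set_line(struct toy_irq *irq, bool level)
{
	if (level && !irq->line)
		irq->latched = true;	/* rising edge is remembered */
	irq->line = level;
}

/* What the CPU would see once it enables interrupts again. */
static bool toy_irq_is_pending(const struct toy_irq *irq)
{
	return irq->level_triggered ? irq->line : irq->latched;
}
```

Raising and then lowering the line while interrupts are disabled leaves
the edge-triggered model pending but the level-triggered model not
pending, which is the behavior the guest above expects from real
hardware.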

This patch fixes this shortcoming through the following series of
changes.

First, we change the flow of the timer/vgic sync/flush operations.  Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output.  This has the implication that we
move the timer operations into non-preemptible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.

Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes.  Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.
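
As a rough, stand-alone illustration of that idea (not the kernel code
itself), the sketch below keeps the last signalled level next to the
timer state and only notifies the interrupt controller when the
computed output differs from it.  struct toy_timer, its fields and the
signals counter are hypothetical stand-ins for the real per-VCPU timer
state and for the call into the vgic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU timer state. */
struct toy_timer {
	bool enabled;		/* models the timer enable bit */
	bool masked;		/* models the interrupt mask bit */
	uint64_t cval;		/* compare value */
	bool line_level;	/* last level signalled to the controller */
	unsigned int signals;	/* number of times the controller was told */
};

/* The output is high when the timer is enabled, unmasked and expired. */
static bool toy_timer_should_fire(const struct toy_timer *t, uint64_t now)
{
	return t->enabled && !t->masked && t->cval <= now;
}

/* Signal the controller only when the output level actually changes. */
static void toy_timer_update_state(struct toy_timer *t, uint64_t now)
{
	bool level = toy_timer_should_fire(t, now);

	if (level == t->line_level)
		return;

	t->line_level = level;
	t->signals++;	/* stands in for raising or lowering the line */
}
```

Calling toy_timer_update_state() repeatedly while the timer stays
expired bumps the counter only once, and disabling the timer afterwards
lowers the line with exactly one more notification.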

Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever we signal a mapped interrupt to the guest, and we reset the
active state when we sync back the HW state from the vgic.

Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/kvm/arm.c           | 11 ++++++--
 include/kvm/arm_arch_timer.h |  2 +-
 include/kvm/arm_vgic.h       |  3 --
 virt/kvm/arm/arch_timer.c    | 65 +++++++++++++++++++++++++++++-------------
 virt/kvm/arm/vgic.c          | 67 +++++++++++++++-----------------------------
 5 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bdf8871..102a4aa 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
+			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			preempt_enable();
-			kvm_timer_sync_hwstate(vcpu);
 			continue;
 		}
 
@@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_exit();
 		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
+		/*
+		 * We must sync the timer state before the vgic state so that
+		 * the vgic can properly sample the updated state of the
+		 * interrupt line.
+		 */
+		kvm_timer_sync_hwstate(vcpu);
+
 		kvm_vgic_sync_hwstate(vcpu);
 
 		preempt_enable();
 
-		kvm_timer_sync_hwstate(vcpu);
-
 		ret = handle_exit(vcpu, run, ret);
 	}
 
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ef14cc1..1800227 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -51,7 +51,7 @@ struct arch_timer_cpu {
 	bool				armed;
 
 	/* Timer IRQ */
-	const struct kvm_irq_level	*irq;
+	struct kvm_irq_level		irq;
 
 	/* VGIC mapping */
 	struct irq_phys_map		*map;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d901f1a..99011a0 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -163,7 +163,6 @@ struct irq_phys_map {
 	u32			virt_irq;
 	u32			phys_irq;
 	u32			irq;
-	bool			active;
 };
 
 struct irq_phys_map_entry {
@@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 					   int virt_irq, int irq);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
-bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
-void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 7991537..0cdd092 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
 	}
 }
 
-static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
-{
-	int ret;
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-
-	kvm_vgic_set_phys_irq_active(timer->map, true);
-	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
-					 timer->map,
-					 timer->irq->level);
-	WARN_ON(ret);
-}
-
 static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
 {
 	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
@@ -116,8 +104,7 @@ static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
-		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
-		!kvm_vgic_get_phys_irq_active(timer->map);
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
 }
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
@@ -134,6 +121,41 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
+{
+	int ret;
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	BUG_ON(!vgic_initialized(vcpu->kvm));
+
+	timer->irq.level = new_level;
+	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
+					 timer->map,
+					 timer->irq.level);
+	WARN_ON(ret);
+}
+
+/*
+ * Check if there was a change in the timer state (should we raise or lower
+ * the line level to the GIC).
+ */
+static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	/*
+	 * If userspace modified the timer registers via SET_ONE_REG before
+	 * the vgic was initialized, we mustn't set the timer->irq.level value
+	 * because the guest would never see the interrupt.  Instead wait
+	 * until we call this function from kvm_timer_flush_hwstate.
+	 */
+	if (!vgic_initialized(vcpu->kvm))
+		return;
+
+	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
+		kvm_timer_update_irq(vcpu, !timer->irq.level);
+}
+
 /*
  * Schedule the background timer before calling kvm_vcpu_block, so that this
  * thread is removed from its waitqueue and made runnable when there's a timer
@@ -193,8 +215,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
 	 */
-	if (kvm_timer_should_fire(vcpu))
-		kvm_timer_inject_irq(vcpu);
+	kvm_timer_update_state(vcpu);
 }
 
 /**
@@ -210,8 +231,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu))
-		kvm_timer_inject_irq(vcpu);
+	/*
+	 * The guest could have modified the timer registers or the timer
+	 * could have expired, update the timer state.
+	 */
+	kvm_timer_update_state(vcpu);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
@@ -226,7 +250,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * kvm_vcpu_set_target(). To handle this, we determine
 	 * vcpu timer irq number when the vcpu is reset.
 	 */
-	timer->irq = irq;
+	timer->irq.irq = irq->irq;
 
 	/*
 	 * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
@@ -235,6 +259,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * the ARMv7 architecture.
 	 */
 	timer->cntv_ctl = 0;
+	kvm_timer_update_state(vcpu);
 
 	/*
 	 * Tell the VGIC that the virtual interrupt is tied to a
@@ -279,6 +304,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 	default:
 		return -1;
 	}
+
+	kvm_timer_update_state(vcpu);
 	return 0;
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 9ed8d53..f4ea950 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 /*
  * Save the physical active state, and reset it to inactive.
  *
- * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
+ * Return true if there's a pending level triggered interrupt line to queue.
  */
-static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
+static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 {
 	struct irq_phys_map *map;
+	bool phys_active;
 	int ret;
 
 	if (!(vlr.state & LR_HW))
 		return 0;
 
 	map = vgic_irq_map_search(vcpu, vlr.irq);
-	BUG_ON(!map || !map->active);
+	BUG_ON(!map);
 
 	ret = irq_get_irqchip_state(map->irq,
 				    IRQCHIP_STATE_ACTIVE,
-				    &map->active);
+				    &phys_active);
 
 	WARN_ON(ret);
 
-	if (map->active) {
+	if (phys_active) {
+		/*
+		 * Interrupt still marked as active on the physical
+		 * distributor, so guest did not EOI it yet.  Reset to
+		 * non-active so that other VMs can see interrupts from this
+		 * device.
+		 */
 		ret = irq_set_irqchip_state(map->irq,
 					    IRQCHIP_STATE_ACTIVE,
 					    false);
 		WARN_ON(ret);
-		return 0;
+		return false;
 	}
 
-	return 1;
+	/* Mapped edge-triggered interrupts not yet supported. */
+	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
+	return process_level_irq(vcpu, lr, vlr);
 }
 
 /* Sync back the VGIC state after a guest run */
@@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 			continue;
 
 		vlr = vgic_get_lr(vcpu, lr);
-		if (vgic_sync_hwirq(vcpu, vlr)) {
-			/*
-			 * So this is a HW interrupt that the guest
-			 * EOI-ed. Clean the LR state and allow the
-			 * interrupt to be sampled again.
-			 */
-			vlr.state = 0;
-			vlr.hwirq = 0;
-			vgic_set_lr(vcpu, lr, vlr);
-			vgic_irq_clear_queued(vcpu, vlr.irq);
-			set_bit(lr, elrsr_ptr);
-		}
+		if (vgic_sync_hwirq(vcpu, lr, vlr))
+			level_pending = true;
 
 		if (!test_bit(lr, elrsr_ptr))
 			continue;
@@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
 }
 
 /**
- * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
- *
- * Return the logical active state of a mapped interrupt. This doesn't
- * necessarily reflects the current HW state.
- */
-bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
-{
-	BUG_ON(!map);
-	return map->active;
-}
-
-/**
- * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
- *
- * Set the logical active state of a mapped interrupt. This doesn't
- * immediately affects the HW state.
- */
-void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
-{
-	BUG_ON(!map);
-	map->active = active;
-}
-
-/**
  * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
  * @vcpu: The VCPU pointer
  * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
@@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
 			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_enabled,
 							vcpu->vcpu_id, i, 1);
-			if (i < VGIC_NR_PRIVATE_IRQS)
+			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_cfg,
 							vcpu->vcpu_id, i,
 							VGIC_CFG_EDGE);
+			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
+				vgic_bitmap_set_irq_val(&dist->irq_cfg,
+							vcpu->vcpu_id, i,
+							VGIC_CFG_LEVEL);
 		}
 
 		vgic_enable(vcpu);
-- 
2.1.2.330.g565301e.dirty

* [PATCH v2 8/8] arm/arm64: KVM: Support edge-triggered forwarded interrupts
  2015-09-04 19:40 ` Christoffer Dall
@ 2015-09-04 19:40   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-04 19:40 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

We mark edge-triggered interrupts with the HW bit set as queued to
prevent the VGIC code from injecting LRs with both the Active and
Pending bits set at the same time while also setting the HW bit,
because the hardware does not support this.

However, this means that we must also clear the queued flag when we sync
back an LR where the state on the physical distributor went from active
to inactive because the guest deactivated the interrupt.  At this point
we must also check if the interrupt is pending on the distributor, and
tell the VGIC to queue it again if it is.

Since these actions on the sync path are extremely close to those for
level-triggered interrupts, rename process_level_irq to
process_queued_irq, allowing it to cater for both cases.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
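Not for the commit log: a minimal userspace sketch of the requeue
decision described above. It is an illustration only; struct irq_model
and requeue_after_sync() are hypothetical names and do not exist in
virt/kvm/arm/vgic.c, and the model ignores locking and the LR itself.

```c
#include <stdbool.h>

/* Toy stand-in for the per-IRQ state the VGIC tracks; all names are
 * hypothetical and exist only for this illustration. */
struct irq_model {
	bool edge;		/* configured as edge-triggered */
	bool dist_pending;	/* pending in the VGIC's distributor state */
	bool line_level;	/* current level of a level-triggered line */
	bool queued;		/* "queued" flag: already sitting in an LR */
};

/*
 * Called when an LR is synced back after the guest deactivated the
 * interrupt. Returns true if the interrupt must be queued again.
 */
static inline bool requeue_after_sync(struct irq_model *irq)
{
	/* Let the VGIC consider this interrupt for injection again. */
	irq->queued = false;

	if (irq->edge)
		return irq->dist_pending;	/* another edge arrived meanwhile */

	if (irq->line_level)
		return true;			/* line is still asserted */

	irq->dist_pending = false;		/* line dropped: nothing left */
	return false;
}
```

The edge and level branches differ only in where the "is there more to
deliver" answer comes from, which is why a single function can serve
both cases.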
 virt/kvm/arm/vgic.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f4ea950..5942ce9 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1322,13 +1322,10 @@ epilog:
 	}
 }
 
-static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
+static int process_queued_irq(struct kvm_vcpu *vcpu,
+				   int lr, struct vgic_lr vlr)
 {
-	int level_pending = 0;
-
-	vlr.state = 0;
-	vlr.hwirq = 0;
-	vgic_set_lr(vcpu, lr, vlr);
+	int pending = 0;
 
 	/*
 	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
@@ -1344,26 +1341,35 @@ static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
 
 	/*
-	 * Tell the gic to start sampling the line of this interrupt again.
+	 * Tell the gic to start sampling this interrupt again.
 	 */
 	vgic_irq_clear_queued(vcpu, vlr.irq);
 
 	/* Any additional pending interrupt? */
-	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
-		vgic_cpu_irq_set(vcpu, vlr.irq);
-		level_pending = 1;
+	if (vgic_irq_is_edge(vcpu, vlr.irq)) {
+		BUG_ON(!(vlr.state & LR_HW));
+		pending = vgic_dist_irq_is_pending(vcpu, vlr.irq);
 	} else {
-		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
-		vgic_cpu_irq_clear(vcpu, vlr.irq);
+		if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
+			vgic_cpu_irq_set(vcpu, vlr.irq);
+			pending = 1;
+		} else {
+			vgic_dist_irq_clear_pending(vcpu, vlr.irq);
+			vgic_cpu_irq_clear(vcpu, vlr.irq);
+		}
 	}
 
 	/*
 	 * Despite being EOIed, the LR may not have
 	 * been marked as empty.
 	 */
+	vlr.state = 0;
+	vlr.hwirq = 0;
+	vgic_set_lr(vcpu, lr, vlr);
+
 	vgic_sync_lr_elrsr(vcpu, lr, vlr);
 
-	return level_pending;
+	return pending;
 }
 
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
@@ -1400,7 +1406,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
 
 			spin_lock(&dist->lock);
-			level_pending |= process_level_irq(vcpu, lr, vlr);
+			level_pending |= process_queued_irq(vcpu, lr, vlr);
 			spin_unlock(&dist->lock);
 		}
 	}
@@ -1422,7 +1428,7 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 /*
  * Save the physical active state, and reset it to inactive.
  *
- * Return true if there's a pending level triggered interrupt line to queue.
+ * Return true if there's a pending forwarded interrupt to queue.
  */
 static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 {
@@ -1456,9 +1462,7 @@ static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 		return false;
 	}
 
-	/* Mapped edge-triggered interrupts not yet supported. */
-	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
-	return process_level_irq(vcpu, lr, vlr);
+	return process_queued_irq(vcpu, lr, vlr);
 }
 
 /* Sync back the VGIC state after a guest run */
-- 
2.1.2.330.g565301e.dirty


* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-07 11:25     ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-07 11:25 UTC (permalink / raw)
  To: Christoffer Dall, Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

Hi,

Firstly: this text is really great, thanks for coming up with it.
See below for some information I got from tracing the host, which I
cannot make sense of.


On 04/09/15 20:40, Christoffer Dall wrote:
> Forwarded physical interrupts on arm/arm64 are a tricky concept, and the
> way we deal with them is not easy to understand from reading the
> various specs.
> 
> Therefore, add a proper documentation file explaining the flow and
> rationale of the behavior of the vgic.
> 
> Some of this text was contributed by Marc Zyngier and edited by me.
> Omissions and errors are all mine.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>  1 file changed, 181 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> 
> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> new file mode 100644
> index 0000000..24b6f28
> --- /dev/null
> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> @@ -0,0 +1,181 @@
> +KVM/ARM VGIC Forwarded Physical Interrupts
> +==========================================
> +
> +The KVM/ARM code implements software support for the ARM Generic
> +Interrupt Controller's (GIC's) hardware support for virtualization by
> +allowing software to inject virtual interrupts to a VM, which the guest
> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> +
> +Some of these virtual interrupts, however, correspond to physical
> +interrupts from real physical devices.  One example could be the
> +architected timer, which itself supports virtualization, and therefore
> +lets a guest OS program the hardware device directly to raise an
> +interrupt at some point in time.  When such an interrupt is raised, the
> +host OS initially handles the interrupt and must somehow signal this
> +event as a virtual interrupt to the guest.  Another example could be a
> +passthrough device, where the physical interrupts are initially handled
> +by the host, but the device driver for the device lives in the guest OS
> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> +the physical one to the guest OS.
> +
> +These virtual interrupts corresponding to a physical interrupt on the
> +host are called forwarded physical interrupts, but are also sometimes
> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> +
> +Forwarded physical interrupts are handled slightly differently compared
> +to virtual interrupts generated purely by a software emulated device.
> +
> +
> +The HW bit
> +----------
> +Virtual interrupts are signalled to the guest by programming the List
> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> +with the virtual IRQ number and the state of the interrupt (Pending,
> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> +interrupt, the LR state moves from Pending to Active, and finally to
> +inactive.
> +
> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> +KVM must also program an additional field in the LR, the physical IRQ
> +number, to link the virtual with the physical IRQ.
> +
> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> +bit, never both at the same time.
> +
> +Setting the HW bit causes the hardware to deactivate the physical
> +interrupt on the physical distributor when the guest deactivates the
> +corresponding virtual interrupt.
> +
> +
> +Forwarded Physical Interrupts Life Cycle
> +----------------------------------------
> +
> +The state of forwarded physical interrupts is managed in the following way:
> +
> +  - The physical interrupt is acked by the host, and becomes active on
> +    the physical distributor (*).
> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> +    interface is going to present it to the guest.
> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> +    expected.
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +    but the LR.Active is left untouched (set).

I tried hard in the last week, but couldn't confirm this. Tracing shows
the following pattern over and over (case 1):
(This is the kvm/kvm.git:queue branch from last week, so including the
mapped timer IRQ code. Tests were done on Juno and Midway)

...
229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
...

My hunch is that the following happens (please correct me if needed!):
First there is an unrelated trap (line 1), then later the guest exits
due to an IRQ (line 2, presumably the timer; the WFx is a red herring
here since ESR_EL2.EC is not valid on IRQ-triggered exceptions).
The host injects the timer IRQ (not shown here) and returns to the
guest. On the next trap (line 3, due to a stage 2 page fault),
vgic_sync_hwirq() is called on the LR (line 4) and shows that the
GIC actually did deactivate both the LR (state=8, which is inactive,
just the HW bit is still set) _and_ the state on the physical
distributor (dist active=0). This trace_printk is just after entering
the function, so before the code there performs these steps redundantly.
It also shows that the ELRSR bit is set to 1 (empty), so from the GIC
point of view this virtual IRQ cycle is finished.

The other sequence I see is this one (case 2):

....
231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
...

In line 1 the timer fires, the host injects the timer IRQ into the
guest, which exits again in line 2 due to a page fault (it may have IRQs
disabled?). The LR dump in line 3 shows that the timer IRQ is still
pending in the LR (state=9) and active on the physical distributor. Now
the code in vgic_sync_hwirq() clears the active state in the physical
distributor (by calling irq_set_irqchip_state()), but leaves the LR
alone (by returning 0 to the caller).
On the next exit (line 4, due to some HW IRQ?) the LR is still the same
(line 5), only that the physical dist state is now inactive (due to us
clearing that explicitly during the last exit). Now vgic_sync_hwirq()
returns 1, leading to the LR being cleaned up in the caller.
So to me it looks like we kill that IRQ before the guest had the chance
to handle it (presumably because it has IRQs off).
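
To make the case 2 reading concrete, here is a minimal userspace model
of the sequence as I understand it from the traces. It is only a sketch
of my interpretation, not the kernel code: struct hwirq_model,
sync_hwirq_model() and case2_drops_pending_irq() are made-up names, and
the guest is assumed never to ack the interrupt between the two exits.

```c
#include <stdbool.h>

#define LR_STATE_PENDING	(1 << 0)
#define LR_HW			(1 << 3)

/* Hypothetical model of the state visible in the trace lines. */
struct hwirq_model {
	unsigned int lr_state;	/* LR.state as printed in the trace */
	bool dist_active;	/* active bit on the physical distributor */
};

/*
 * Models the sync step as read from the traces: if the interrupt is
 * still active on the physical distributor, clear that state and leave
 * the LR alone (return false); otherwise ask the caller to clean the LR.
 */
static inline bool sync_hwirq_model(struct hwirq_model *m)
{
	if (m->dist_active) {
		m->dist_active = false;
		return false;
	}
	return true;
}

/* Returns true if a still-pending interrupt is dropped after two exits. */
static inline bool case2_drops_pending_irq(void)
{
	struct hwirq_model m = {
		.lr_state = LR_HW | LR_STATE_PENDING,	/* state=9 */
		.dist_active = true,
	};
	bool first, second, was_pending;

	first = sync_hwirq_model(&m);		/* first exit: dist active 1 -> 0 */
	was_pending = m.lr_state & LR_STATE_PENDING;

	second = sync_hwirq_model(&m);		/* second exit: dist active is 0 */
	if (second)
		m.lr_state = 0;			/* caller cleans up the LR */

	return !first && was_pending && second && m.lr_state == 0;
}
```

In this model the second sync cannot tell "the guest deactivated the
interrupt" apart from "we cleared the active state ourselves on the
previous exit", which is the ambiguity I am asking about.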

The distribution of those patterns in my particular snapshot is (all
with timer IRQ 27):
 7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
 1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
 1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
  331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
   68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1

So for the majority of exits with the timer having been injected before,
we redundantly clean the LR (case 1 above). There are also quite a number
of cases where we "kill" the IRQ (case 2 above). The active state case
(state: 10 in the last two lines) seems to be a variation of case 2,
just with the guest exiting from within the IRQ handler (after
activation, before EOI).
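
For reference, this is how I decode the LR.state values in the table
above. A small sketch, assuming the bit layout of vgic_lr.state is
pending = bit 0, active = bit 1, EOI interrupt = bit 2 and HW = bit 3
(please check include/kvm/arm_vgic.h in the tree you are tracing);
lr_state_name() and lr_has_hw_bit() are hypothetical helpers that exist
only for this mail.

```c
#include <stdbool.h>

/* Assumed bit layout of vgic_lr.state; verify against the kernel tree. */
#define LR_STATE_PENDING	(1 << 0)
#define LR_STATE_ACTIVE		(1 << 1)
#define LR_EOI_INT		(1 << 2)
#define LR_HW			(1 << 3)

/* Hypothetical helper: name the pending/active part of a traced value. */
static inline const char *lr_state_name(unsigned int state)
{
	switch (state & (LR_STATE_PENDING | LR_STATE_ACTIVE)) {
	case 0:
		return "inactive";
	case LR_STATE_PENDING:
		return "pending";
	case LR_STATE_ACTIVE:
		return "active";
	default:
		return "pending+active";
	}
}

/* Hypothetical helper: true if the traced value has the HW bit set. */
static inline bool lr_has_hw_bit(unsigned int state)
{
	return (state & LR_HW) != 0;
}
```

Under that assumption the three values in the table read as 8 =
inactive, 9 = pending and 10 = active, each with the HW bit set.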

I'd appreciate it if someone could shed some light on this and show me
where I am wrong here or what is going on instead.

Cheers,
Andre.

> +  - KVM clears the LR on VM exits when the physical distributor
> +    active state has been cleared.
> +
> +(*): The host handling is slightly more complicated.  For some devices
> +(shared), KVM directly sets the active state on the physical distributor
> +before entering the guest, and for some devices (non-shared) the host
> +configures the GIC such that it does not deactivate the interrupt on
> +host EOIs, but only performs a priority drop allowing the GIC to receive
> +other interrupts and leaves the interrupt in the active state on the
> +physical distributor.
> +
> +
> +Forwarded Edge and Level Triggered PPIs and SPIs
> +------------------------------------------------
> +Forwarded physical interrupts should always be active on the
> +physical distributor when injected to a guest.
> +
> +Level-triggered interrupts will keep the interrupt line to the GIC
> +asserted, typically until the guest programs the device to deassert the
> +line.  This means that the interrupt will remain pending on the physical
> +distributor until the guest has reprogrammed the device.  Since we
> +always run the VM with interrupts enabled on the CPU, a pending
> +interrupt will exit the guest as soon as we switch into the guest,
> +preventing the guest from ever making progress as the process repeats
> +over and over.  Therefore, the active state on the physical distributor
> +must be set when entering the guest, preventing the GIC from forwarding
> +the pending interrupt to the CPU.  As soon as the guest deactivates
> +(EOIs) the interrupt, the physical line is sampled by the hardware again
> +and the host takes a new interrupt if and only if the physical line is
> +still asserted.
> +
> +Edge-triggered interrupts do not exhibit the same problem with
> +preventing guest execution that level-triggered interrupts do.  One
> +option is to not use the HW bit at all, and inject edge-triggered interrupts
> +from a physical device as pure virtual interrupts.  But that would
> +potentially slow down handling of the interrupt in the guest, because a
> +physical interrupt occurring in the middle of the guest ISR would
> +preempt the guest for the host to handle the interrupt.  Additionally,
> +if you configure the system to handle interrupts on a separate physical
> +core from that running your VCPU, you still have to interrupt the VCPU
> +to queue the pending state onto the LR, even though the guest won't use
> +this information until the guest ISR completes.  Therefore, the HW
> +bit should always be set for forwarded edge-triggered interrupts.  With
> +the HW bit set, the virtual interrupt is injected and additional
> +physical interrupts occurring before the guest deactivates the interrupt
> +simply mark the state on the physical distributor as Pending+Active.  As
> +soon as the guest deactivates the interrupt, the host takes another
> +interrupt if and only if there was a physical interrupt between
> +injecting the forwarded interrupt to the guest and the guest
> +deactivating the interrupt.
> +
> +Consequently, whenever we schedule a VCPU with one or more LRs with the
> +HW bit set, the interrupt must also be active on the physical
> +distributor.
> +
> +
> +Forwarded LPIs
> +--------------
> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
> +active state.  They become pending when a device signals them, and as
> +soon as they are acked by the CPU, they are inactive again.
> +
> +It therefore doesn't make sense, and is not supported, to set the HW bit
> +for physical LPIs that are forwarded to a VM as virtual interrupts,
> +typically virtual SPIs.
> +
> +For LPIs, there is no other choice than to preempt the VCPU thread if
> +necessary, and queue the pending state onto the LR.
> +
> +
> +Putting It Together: The Architected Timer
> +------------------------------------------
> +The architected timer is a device that signals interrupts with level
> +triggered semantics.  The timer hardware is directly accessed by VCPUs
> +which program the timer to fire at some point in time.  Each VCPU on a
> +system programs the timer to fire at different times, and therefore the
> +hardware is multiplexed between multiple VCPUs.  This is implemented by
> +context-switching the timer state along with each VCPU thread.
> +
> +However, this means that a scenario like the following is entirely
> +possible, and in fact, typical:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +3.  The guest is idle and calls WFI (wait-for-interrupts)
> +4.  The hardware traps to the host
> +5.  KVM stores the timer state to memory and disables the hardware timer
> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
> +8.  The soft timer fires, waking up the VCPU thread
> +9.  KVM reprograms the timer hardware with the VCPU's values
> +10. KVM marks the timer interrupt as active on the physical distributor
> +11. KVM injects a forwarded physical interrupt to the guest
> +12. KVM runs the VCPU
> +
> +Notice that KVM injects a forwarded physical interrupt in step 11 without
> +the corresponding interrupt having actually fired on the host.  That is
> +exactly why we mark the timer interrupt as active in step 10, because
> +the active state on the physical distributor is part of the state
> +belonging to the timer hardware, which is context-switched along with
> +the VCPU thread.
> +
> +If the guest does not idle because it is busy, the flow looks like this
> +instead:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
> +4.  With interrupts disabled on the CPU, KVM looks at the timer state
> +    and injects a forwarded physical interrupt because it concludes the
> +    timer has expired.
> +5.  KVM marks the timer interrupt as active on the physical distributor
> +6.  KVM runs the VCPU
> +
> +Notice that again the forwarded physical interrupt is injected to the
> +guest without having actually been handled on the host.  In this case it
> +is because the physical interrupt is forwarded to the guest before KVM
> +enables physical interrupts on the CPU after exiting the guest.
> 


* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
@ 2015-09-07 11:25     ` Andre Przywara
  0 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-07 11:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Firstly: this text is really great, thanks for coming up with it.
See below for some information I got from tracing the host, which I
cannot make sense of.


On 04/09/15 20:40, Christoffer Dall wrote:
> Forwarded physical interrupts on arm/arm64 are a tricky concept, and the
> way we deal with them is not easy to understand from reading the
> various specs.
> 
> Therefore, add a proper documentation file explaining the flow and
> rationale of the behavior of the vgic.
> 
> Some of this text was contributed by Marc Zyngier and edited by me.
> Omissions and errors are all mine.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>  1 file changed, 181 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> 
> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> new file mode 100644
> index 0000000..24b6f28
> --- /dev/null
> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> @@ -0,0 +1,181 @@
> +KVM/ARM VGIC Forwarded Physical Interrupts
> +==========================================
> +
> +The KVM/ARM code implements software support for the ARM Generic
> +Interrupt Controller's (GIC's) hardware support for virtualization by
> +allowing software to inject virtual interrupts to a VM, which the guest
> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> +
> +Some of these virtual interrupts, however, correspond to physical
> +interrupts from real physical devices.  One example could be the
> +architected timer, which itself supports virtualization, and therefore
> +lets a guest OS program the hardware device directly to raise an
> +interrupt at some point in time.  When such an interrupt is raised, the
> +host OS initially handles the interrupt and must somehow signal this
> +event as a virtual interrupt to the guest.  Another example could be a
> +passthrough device, where the physical interrupts are initially handled
> +by the host, but the device driver for the device lives in the guest OS
> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> +the physical one to the guest OS.
> +
> +These virtual interrupts corresponding to a physical interrupt on the
> +host are called forwarded physical interrupts, but are also sometimes
> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> +
> +Forwarded physical interrupts are handled slightly differently compared
> +to virtual interrupts generated purely by a software emulated device.
> +
> +
> +The HW bit
> +----------
> +Virtual interrupts are signalled to the guest by programming the List
> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> +with the virtual IRQ number and the state of the interrupt (Pending,
> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> +interrupt, the LR state moves from Pending to Active, and finally to
> +inactive.
> +
> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> +KVM must also program an additional field in the LR, the physical IRQ
> +number, to link the virtual with the physical IRQ.
> +
> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> +bit, never both at the same time.
> +
> +Setting the HW bit causes the hardware to deactivate the physical
> +interrupt on the physical distributor when the guest deactivates the
> +corresponding virtual interrupt.
> +
> +
> +Forwarded Physical Interrupts Life Cycle
> +----------------------------------------
> +
> +The state of forwarded physical interrupts is managed in the following way:
> +
> +  - The physical interrupt is acked by the host, and becomes active on
> +    the physical distributor (*).
> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> +    interface is going to present it to the guest.
> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> +    expected.
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +    but the LR.Active is left untouched (set).

I tried hard in the last week, but couldn't confirm this. Tracing shows
the following pattern over and over (case 1):
(This is the kvm/kvm.git:queue branch from last week, so including the
mapped timer IRQ code. Tests were done on Juno and Midway)

...
229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
...

My hunch is that the following happens (please correct me if needed!):
First there is an unrelated trap (line 1), then later the guest exits
due to an IRQ (line 2, presumably the timer; the WFx is a red herring
here since ESR_EL2.EC is not valid on IRQ-triggered exceptions).
The host injects the timer IRQ (not shown here) and returns to the
guest. On the next trap (line 3, due to a stage 2 page fault),
vgic_sync_hwirq() is called on the LR (line 4) and shows that the
GIC actually did deactivate both the LR (state=8, which is inactive,
just the HW bit is still set) _and_ the state on the physical
distributor (dist active=0). This trace_printk is just after entering
the function, so before the code there performs these steps redundantly.
It also shows that the ELRSR bit is set to 1 (empty), so from the GIC
point of view this virtual IRQ cycle is finished.

The other sequence I see is this one (case 2):

....
231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
...

In line 1 the timer fires, the host injects the timer IRQ into the
guest, which exits again in line 2 due to a page fault (it may have IRQs
disabled?). The LR dump in line 3 shows that the timer IRQ is still
pending in the LR (state=9) and active on the physical distributor. Now
the code in vgic_sync_hwirq() clears the active state in the physical
distributor (by calling irq_set_irqchip_state()), but leaves the LR
alone (by returning 0 to the caller).
On the next exit (line 4, due to some HW IRQ?) the LR is still the same
(line 5), only that the physical dist state is now inactive (due to us
clearing that explicitly during the last exit). Now vgic_sync_hwirq()
returns 1, leading to the LR being cleaned up in the caller.
So to me it looks like we kill that IRQ before the guest had the chance
to handle it (presumably because it has IRQs off).

The distribution of those patterns in my particular snapshot is (all
with timer IRQ 27):
 7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
 1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
 1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
  331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
   68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1

So for the majority of exits with the timer having been injected before,
we redundantly clean the LR (case 1 above). There are also quite a number
of cases where we "kill" the IRQ (case 2 above). The active state case
(state: 10 in the last two lines) seems to be a variation of case 2,
just with the guest exiting from within the IRQ handler (after
activation, before EOI).

I'd appreciate it if someone could shed some light on this and show me
where I am wrong here or what is going on instead.

Cheers,
Andre.

> +  - KVM clears the LR on VM exits when the physical distributor
> +    active state has been cleared.
> +
> +(*): The host handling is slightly more complicated.  For some devices
> +(shared), KVM directly sets the active state on the physical distributor
> +before entering the guest, and for some devices (non-shared) the host
> +configures the GIC such that it does not deactivate the interrupt on
> +host EOIs, but only performs a priority drop allowing the GIC to receive
> +other interrupts and leaves the interrupt in the active state on the
> +physical distributor.
> +
> +
> +Forwarded Edge and Level Triggered PPIs and SPIs
> +------------------------------------------------
> +Forwarded physical interrupts should always be active on the
> +physical distributor when injected to a guest.
> +
> +Level-triggered interrupts will keep the interrupt line to the GIC
> +asserted, typically until the guest programs the device to deassert the
> +line.  This means that the interrupt will remain pending on the physical
> +distributor until the guest has reprogrammed the device.  Since we
> +always run the VM with interrupts enabled on the CPU, a pending
> +interrupt will exit the guest as soon as we switch into the guest,
> +preventing the guest from ever making progress as the process repeats
> +over and over.  Therefore, the active state on the physical distributor
> +must be set when entering the guest, preventing the GIC from forwarding
> +the pending interrupt to the CPU.  As soon as the guest deactivates
> +(EOIs) the interrupt, the physical line is sampled by the hardware again
> +and the host takes a new interrupt if and only if the physical line is
> +still asserted.
> +
> +Edge-triggered interrupts do not exhibit the same problem with
> +preventing guest execution that level-triggered interrupts do.  One
> +option is to not use HW bit at all, and inject edge-triggered interrupts
> +from a physical device as pure virtual interrupts.  But that would
> +potentially slow down handling of the interrupt in the guest, because a
> +physical interrupt occurring in the middle of the guest ISR would
> +preempt the guest for the host to handle the interrupt.  Additionally,
> +if you configure the system to handle interrupts on a separate physical
> +core from that running your VCPU, you still have to interrupt the VCPU
> +to queue the pending state onto the LR, even though the guest won't use
> +this information until the guest ISR completes.  Therefore, the HW
> +bit should always be set for forwarded edge-triggered interrupts.  With
> +the HW bit set, the virtual interrupt is injected and additional
> +physical interrupts occurring before the guest deactivates the interrupt
> +simply mark the state on the physical distributor as Pending+Active.  As
> +soon as the guest deactivates the interrupt, the host takes another
> +interrupt if and only if there was a physical interrupt between
> +injecting the forwarded interrupt to the guest and the guest deactivating
> +the interrupt.
> +
> +Consequently, whenever we schedule a VCPU with one or more LRs with the
> +HW bit set, the interrupt must also be active on the physical
> +distributor.
> +
> +
> +Forwarded LPIs
> +--------------
> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
> +active state.  They become pending when a device signals them, and as
> +soon as they are acked by the CPU, they are inactive again.
> +
> +It therefore doesn't make sense, and is not supported, to set the HW bit
> +for physical LPIs that are forwarded to a VM as virtual interrupts,
> +typically virtual SPIs.
> +
> +For LPIs, there is no other choice than to preempt the VCPU thread if
> +necessary, and queue the pending state onto the LR.
> +
> +
> +Putting It Together: The Architected Timer
> +------------------------------------------
> +The architected timer is a device that signals interrupts with level
> +triggered semantics.  The timer hardware is directly accessed by VCPUs
> +which program the timer to fire at some point in time.  Each VCPU on a
> +system programs the timer to fire at different times, and therefore the
> +hardware is multiplexed between multiple VCPUs.  This is implemented by
> +context-switching the timer state along with each VCPU thread.
> +
> +However, this means that a scenario like the following is entirely
> +possible, and in fact, typical:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +3.  The guest is idle and calls WFI (wait-for-interrupts)
> +4.  The hardware traps to the host
> +5.  KVM stores the timer state to memory and disables the hardware timer
> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
> +8.  The soft timer fires, waking up the VCPU thread
> +9.  KVM reprograms the timer hardware with the VCPU's values
> +10. KVM marks the timer interrupt as active on the physical distributor
> +11. KVM injects a forwarded physical interrupt to the guest
> +12. KVM runs the VCPU
> +
> +Notice that KVM injects a forwarded physical interrupt in step 11 without
> +the corresponding interrupt having actually fired on the host.  That is
> +exactly why we mark the timer interrupt as active in step 10, because
> +the active state on the physical distributor is part of the state
> +belonging to the timer hardware, which is context-switched along with
> +the VCPU thread.
> +
> +If the guest does not idle because it is busy, the flow looks like this
> +instead:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
> +4.  With interrupts disabled on the CPU, KVM looks at the timer state
> +    and injects a forwarded physical interrupt because it concludes the
> +    timer has expired.
> +5.  KVM marks the timer interrupt as active on the physical distributor
> +6.  KVM runs the VCPU
> +
> +Notice that again the forwarded physical interrupt is injected to the
> +guest without having actually been handled on the host.  In this case it
> +is because the physical interrupt is forwarded to the guest before KVM
> +enables physical interrupts on the CPU after exiting the guest.
> 
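
For readers following the level-triggered paragraph quoted above, here is
a toy model of a single interrupt on the physical distributor. It is only
a sketch under simplifying assumptions: the struct and function names
(toy_phys_irq and friends) are made up for illustration and are not kernel
or GIC APIs, and the pending state is assumed to simply follow the line
level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of one level-triggered IRQ on the physical distributor.
 * All names are hypothetical and exist only for this illustration.
 */
struct toy_phys_irq {
	bool line;   /* the device is asserting the interrupt line */
	bool active; /* active state on the physical distributor */
};

/* Assumption: for a level-triggered IRQ, pending simply follows the line. */
static bool toy_pending(const struct toy_phys_irq *irq)
{
	assert(irq != NULL);
	return irq->line;
}

/* A pending interrupt is only forwarded to the CPU if it is not active. */
static bool toy_signals_cpu(const struct toy_phys_irq *irq)
{
	return toy_pending(irq) && !irq->active;
}

/* Host side: mark the IRQ active before entering the guest. */
static void toy_set_active(struct toy_phys_irq *irq)
{
	assert(irq != NULL);
	irq->active = true;
}

/* Guest side: deactivating the forwarded IRQ clears the physical active state. */
static void toy_guest_deactivate(struct toy_phys_irq *irq)
{
	assert(irq != NULL);
	irq->active = false;
}
```

With the line asserted and the active state clear, toy_signals_cpu() is
true, which corresponds to the guest being exited again straight away.
After toy_set_active() it is false until toy_guest_deactivate() runs, at
which point the result depends only on whether the line is still asserted.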

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-07 15:01     ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-07 15:01 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

Hi Christoffer,
On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> We currently schedule a soft timer every time we exit the guest if the
> timer did not expire while running the guest.  This is really not
> necessary, because the only work we do in the timer work function is to
> kick the vcpu.
> 
> Kicking the vcpu does two things:
> (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
> from the waitqueue.
> (2) If the vcpu is running on a different physical CPU from the one
> doing the kick, it sends a reschedule IPI.
> 
> The second case cannot happen, because the soft timer is only ever
> scheduled when the vcpu is not running.  The first case is only relevant
> when the vcpu thread is on a waitqueue, which is only the case when the
> vcpu thread has called kvm_vcpu_block().
> 
> Therefore, we only need to make sure a timer is scheduled for
> kvm_vcpu_block(), which we do by encapsulating all calls to
> kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> 
> Additionally, we only schedule a soft timer if the timer is enabled and
> unmasked, since it is useless otherwise.
> 
> Note that theoretically userspace can use the SET_ONE_REG interface to
> change registers that should cause the timer to fire, even if the vcpu
> is blocked without a scheduled timer, but this case was not supported
> before this patch and we leave it for future work for now.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_host.h   |  3 --
>  arch/arm/kvm/arm.c                | 10 +++++
>  arch/arm64/include/asm/kvm_host.h |  3 --
>  include/kvm/arm_arch_timer.h      |  2 +
>  virt/kvm/arm/arch_timer.c         | 91 ++++++++++++++++++++++++++-------------
>  5 files changed, 72 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 86fcf6e..dcba0fa 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
>  
> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index ce404a5..bdf8871 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  	return kvm_timer_should_fire(vcpu);
>  }
>  
> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> +{
> +	kvm_timer_schedule(vcpu);
> +}
> +
> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> +{
> +	kvm_timer_unschedule(vcpu);
> +}
> +
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  {
>  	/* Force users to call KVM_ARM_VCPU_INIT */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index dd143f5..415938d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>  
> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index e1e4d7c..ef14cc1 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>  
>  #endif
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 48c6e1a..7991537 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
>  	return HRTIMER_NORESTART;
>  }
>  
> +static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> +		!kvm_vgic_get_phys_irq_active(timer->map);
kvm_vgic_get_phys_irq_active(timer->map) checks a logical state and not
the actual HW state. What is the exact aim of that check? Is it that when
the PPI is already active, i.e. the timer has hit, there is no use in
scheduling anything?
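
As a sanity check on the refactoring itself (not on the question of
logical versus HW state), the new helper is just the De Morgan negation of
the open-coded test it replaces. The standalone sketch below checks that
exhaustively. The two bit values are believed to match the kernel's
arm_arch_timer.h but should be treated as assumptions here; phys_active
stands in for kvm_vgic_get_phys_irq_active(timer->map), and the function
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against the kernel headers before relying on them. */
#define ARCH_TIMER_CTRL_ENABLE  (1u << 0)
#define ARCH_TIMER_CTRL_IT_MASK (1u << 1)

/* Old open-coded test from kvm_timer_should_fire(): true means "bail out". */
static bool old_cannot_fire(uint32_t cntv_ctl, bool phys_active)
{
	return (cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
	       !(cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
	       phys_active;
}

/* Shape of the new helper: true means the timer IRQ is able to fire. */
static bool new_can_fire(uint32_t cntv_ctl, bool phys_active)
{
	return !(cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
	       (cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
	       !phys_active;
}

/* Walk all 8 input combinations and check the two forms are exact opposites. */
static void check_forms_agree(void)
{
	for (uint32_t ctl = 0; ctl < 4; ctl++)
		for (int active = 0; active <= 1; active++)
			assert(old_cannot_fire(ctl, active != 0) !=
			       new_can_fire(ctl, active != 0));
}
```

So the helper only fires for "enabled, unmasked, and not already active";
whether "active" should mean the logical or the HW state is a separate
question that this sketch does not answer.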

> +}
> +
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	cycle_t cval, now;
>  
> -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> -	    kvm_vgic_get_phys_irq_active(timer->map))
> +	if (!kvm_timer_irq_can_fire(vcpu))
>  		return false;
>  
>  	cval = timer->cntv_cval;
> @@ -127,24 +134,61 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> -/**
> - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> - * @vcpu: The vcpu pointer
> - *
> - * Disarm any pending soft timers, since the world-switch code will write the
> - * virtual timer state back to the physical CPU.
> +/*
> + * Schedule the background timer before calling kvm_vcpu_block, so that this
> + * thread is removed from its waitqueue and made runnable when there's a timer
> + * interrupt to handle.
>   */
> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	u64 ns;
> +	cycle_t cval, now;
> +
> +	BUG_ON(timer_is_armed(timer));
> +
> +	/*
> +	 * No need to schedule a background timer if the guest timer has
> +	 * already expired, because kvm_vcpu_block will return before putting
> +	 * the thread to sleep.
> +	 */
> +	if (kvm_timer_should_fire(vcpu))
> +		return;
>  
>  	/*
> -	 * We're about to run this vcpu again, so there is no need to
> -	 * keep the background timer running, as we're about to
> -	 * populate the CPU timer again.
> +	 * If the timer is either not capable of raising interrupts (disabled
> +	 * or masked) or if we already have a background timer, then there's
> +	 * no more work for us to do.
I don't understand the "if we already have a background timer" part of
this comment, or how it relates to the comment above...
>  	 */
> +	if (!kvm_timer_irq_can_fire(vcpu))
> +		return;
> +
> +	/*  The timer has not yet expired, schedule a background timer */
> +	cval = timer->cntv_cval;
> +	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> +
> +	ns = cyclecounter_cyc2ns(timecounter->cc,
> +				 cval - now,
> +				 timecounter->mask,
> +				 &timecounter->frac);
> +	timer_arm(timer, ns);
> +}
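
The decision flow of the quoted kvm_timer_schedule() can be sketched in
standalone form as below. Everything prefixed toy_ is hypothetical, and
toy_cycles_to_ns() is a deliberately simplified stand-in for
cyclecounter_cyc2ns() (a plain multiply and divide at an assumed counter
frequency, which overflows for very large cycle counts). Note that in the
quoted code the "already have a background timer" case looks to be covered
by the BUG_ON(timer_is_armed(timer)) at the top rather than by the second
check, so the sketch leaves it out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_sched_action {
	TOY_NO_TIMER_EXPIRED,   /* guest timer already expired, blocking returns at once */
	TOY_NO_TIMER_CANT_FIRE, /* disabled, masked, or IRQ already active */
	TOY_ARM_TIMER,          /* arm a background timer for the remaining time */
};

struct toy_timer {
	bool enabled;
	bool masked;
	bool irq_active;
	uint64_t cval; /* compare value, in counter cycles */
};

static bool toy_can_fire(const struct toy_timer *t)
{
	return !t->masked && t->enabled && !t->irq_active;
}

static bool toy_should_fire(const struct toy_timer *t, uint64_t now)
{
	return toy_can_fire(t) && t->cval <= now;
}

/* Simplified conversion: cycles * 1e9 / freq. Not the kernel's mult/shift scheme. */
static uint64_t toy_cycles_to_ns(uint64_t cycles, uint64_t freq_hz)
{
	assert(freq_hz != 0);
	return cycles * UINT64_C(1000000000) / freq_hz;
}

/* Mirrors the order of checks in the quoted function; writes *ns_out only when arming. */
static enum toy_sched_action toy_timer_schedule(const struct toy_timer *t,
						uint64_t now, uint64_t freq_hz,
						uint64_t *ns_out)
{
	if (toy_should_fire(t, now))
		return TOY_NO_TIMER_EXPIRED;
	if (!toy_can_fire(t))
		return TOY_NO_TIMER_CANT_FIRE;

	*ns_out = toy_cycles_to_ns(t->cval - now, freq_hz);
	return TOY_ARM_TIMER;
}
```

Because toy_should_fire() already includes the can-fire test, the second
check can only ever catch the disabled, masked, or active cases.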
> +
> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	timer_disarm(timer);
> +}
>  
> +/**
> + * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Check if the virtual timer has expired while we were running in the host,
> + * and inject an interrupt if that was the case.
> + */
> +void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> +{
>  	/*
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
above comment seems duplicated now?
> @@ -157,32 +201,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>   * kvm_timer_sync_hwstate - sync timer state from cpu
>   * @vcpu: The vcpu pointer
>   *
> - * Check if the virtual timer was armed and either schedule a corresponding
> - * soft timer or inject directly if already expired.
> + * Check if the virtual timer has expired while we were running in the guest,
> + * and inject an interrupt if that was the case.
>   */
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	cycle_t cval, now;
> -	u64 ns;
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu)) {
> -		/*
> -		 * Timer has already expired while we were not
> -		 * looking. Inject the interrupt and carry on.
> -		 */
> +	if (kvm_timer_should_fire(vcpu))
>  		kvm_timer_inject_irq(vcpu);
> -		return;
> -	}
> -
> -	cval = timer->cntv_cval;
> -	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> -
> -	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
> -				 &timecounter->frac);
> -	timer_arm(timer, ns);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-07 15:32     ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-07 15:32 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm



On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> Currently vgic_process_maintenance() handles a completed
> level-triggered interrupt directly, but we are soon going to reuse this
> logic for level-triggered mapped interrupts with the HW bit set, so
> move this logic into a separate static function.
> 
> Probably the most scary part of this commit is convincing yourself that
> the current flow is safe compared to the old one.  In the following I
> try to list the changes and why they are harmless:
> 
>   Move vgic_irq_clear_queued after kvm_notify_acked_irq:
>     Harmless because the effect of clearing the queued flag wrt.
>     kvm_set_irq is only that vgic_update_irq_pending does not set the
>     pending bit on the emulated CPU interface or in the pending_on_cpu
>     bitmask, but we set this in __kvm_vgic_sync_hwstate later on if the
>     level is stil high.
Well, actually the notifier calls vgic_update_irq_pending with level == 0,
so it does not reach the can_sample.

s/stil/still/

Reviewed-by: Eric Auger <eric.auger@linaro.org>

Eric
> 
>   Move vgic_set_lr before kvm_notify_acked_irq:
>     Also harmless because LR accesses are cpu-local operations and
>     kvm_notify_acked_irq only affects the dist
> 
>   Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
>     Also harmless because it's just a bit which is cleared and altering
>     the line state does not affect this bit.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 88 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 6bd1c9b..fe0e5db 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1322,12 +1322,56 @@ epilog:
>  	}
>  }
>  
> +static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> +{
> +	int level_pending = 0;
> +
> +	vlr.state = 0;
> +	vlr.hwirq = 0;
> +	vgic_set_lr(vcpu, lr, vlr);
> +
> +	/*
> +	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
> +	 * went from active to non-active (called from vgic_sync_hwirq) it was
> +	 * also ACKed and we therefore assume we can clear the soft pending
> +	 * state (should it have been set) for this interrupt.
> +	 *
> +	 * Note: if the IRQ soft pending state was set after the IRQ was
> +	 * acked, it actually shouldn't be cleared, but we have no way of
> +	 * knowing that unless we start trapping ACKs when the soft-pending
> +	 * state is set.
> +	 */
> +	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
> +
> +	/*
> +	 * Tell the gic to start sampling the line of this interrupt again.
> +	 */
> +	vgic_irq_clear_queued(vcpu, vlr.irq);
> +
> +	/* Any additional pending interrupt? */
> +	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> +		vgic_cpu_irq_set(vcpu, vlr.irq);
> +		level_pending = 1;
> +	} else {
> +		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> +		vgic_cpu_irq_clear(vcpu, vlr.irq);
> +	}
> +
> +	/*
> +	 * Despite being EOIed, the LR may not have
> +	 * been marked as empty.
> +	 */
> +	vgic_sync_lr_elrsr(vcpu, lr, vlr);
> +
> +	return level_pending;
> +}
> +
>  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  {
>  	u32 status = vgic_get_interrupt_status(vcpu);
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> -	bool level_pending = false;
>  	struct kvm *kvm = vcpu->kvm;
> +	int level_pending = 0;
>  
>  	kvm_debug("STATUS = %08x\n", status);
>  
> @@ -1342,54 +1386,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  
>  		for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
>  			struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
> -			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>  
> -			spin_lock(&dist->lock);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> +			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>  			WARN_ON(vlr.state & LR_STATE_MASK);
> -			vlr.state = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
>  
> -			/*
> -			 * If the IRQ was EOIed it was also ACKed and we we
> -			 * therefore assume we can clear the soft pending
> -			 * state (should it had been set) for this interrupt.
> -			 *
> -			 * Note: if the IRQ soft pending state was set after
> -			 * the IRQ was acked, it actually shouldn't be
> -			 * cleared, but we have no way of knowing that unless
> -			 * we start trapping ACKs when the soft-pending state
> -			 * is set.
> -			 */
> -			vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
>  
>  			/*
>  			 * kvm_notify_acked_irq calls kvm_set_irq()
> -			 * to reset the IRQ level. Need to release the
> -			 * lock for kvm_set_irq to grab it.
> +			 * to reset the IRQ level, which grabs the dist->lock
> +			 * so we call this before taking the dist->lock.
>  			 */
> -			spin_unlock(&dist->lock);
> -
>  			kvm_notify_acked_irq(kvm, 0,
>  					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
> -			spin_lock(&dist->lock);
> -
> -			/* Any additional pending interrupt? */
> -			if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> -				vgic_cpu_irq_set(vcpu, vlr.irq);
> -				level_pending = true;
> -			} else {
> -				vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> -				vgic_cpu_irq_clear(vcpu, vlr.irq);
> -			}
>  
> +			spin_lock(&dist->lock);
> +			level_pending |= process_level_irq(vcpu, lr, vlr);
>  			spin_unlock(&dist->lock);
> -
> -			/*
> -			 * Despite being EOIed, the LR may not have
> -			 * been marked as empty.
> -			 */
> -			vgic_sync_lr_elrsr(vcpu, lr, vlr);
>  		}
>  	}
>  
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
@ 2015-09-07 15:32     ` Eric Auger
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-07 15:32 UTC (permalink / raw)
  To: linux-arm-kernel



On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> Currently vgic_process_maintenance() handles a completed
> level-triggered interrupt directly, but we are soon going to reuse this
> logic for level-triggered mapped interrupts with the HW bit set, so
> move this logic into a separate static function.
> 
> Probably the most scary part of this commit is convincing yourself that
> the current flow is safe compared to the old one.  In the following I
> try to list the changes and why they are harmless:
> 
>   Move vgic_irq_clear_queued after kvm_notify_acked_irq:
>     Harmless because the effect of clearing the queued flag wrt.
>     kvm_set_irq is only that vgic_update_irq_pending does not set the
>     pending bit on the emulated CPU interface or in the pending_on_cpu
>     bitmask, but we set this in __kvm_vgic_sync_hwstate later on if the
>     level is stil high.
Well, actually the notifier calls vgic_update_irq_pending with level == 0,
so it does not reach the can_sample.

s/stil/still/

Reviewed-by: Eric Auger <eric.auger@linaro.org>

Eric
> 
>   Move vgic_set_lr before kvm_notify_acked_irq:
>     Also harmless because LR accesses are cpu-local operations and
>     kvm_notify_acked_irq only affects the dist
> 
>   Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
>     Also harmless because it's just a bit which is cleared and altering
>     the line state does not affect this bit.
> 
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 88 ++++++++++++++++++++++++++++++-----------------------
>  1 file changed, 50 insertions(+), 38 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 6bd1c9b..fe0e5db 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1322,12 +1322,56 @@ epilog:
>  	}
>  }
>  
> +static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> +{
> +	int level_pending = 0;
> +
> +	vlr.state = 0;
> +	vlr.hwirq = 0;
> +	vgic_set_lr(vcpu, lr, vlr);
> +
> +	/*
> +	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
> +	 * went from active to non-active (called from vgic_sync_hwirq) it was
> +	 * also ACKed and we therefore assume we can clear the soft pending
> +	 * state (should it have been set) for this interrupt.
> +	 *
> +	 * Note: if the IRQ soft pending state was set after the IRQ was
> +	 * acked, it actually shouldn't be cleared, but we have no way of
> +	 * knowing that unless we start trapping ACKs when the soft-pending
> +	 * state is set.
> +	 */
> +	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
> +
> +	/*
> +	 * Tell the gic to start sampling the line of this interrupt again.
> +	 */
> +	vgic_irq_clear_queued(vcpu, vlr.irq);
> +
> +	/* Any additional pending interrupt? */
> +	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> +		vgic_cpu_irq_set(vcpu, vlr.irq);
> +		level_pending = 1;
> +	} else {
> +		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> +		vgic_cpu_irq_clear(vcpu, vlr.irq);
> +	}
> +
> +	/*
> +	 * Despite being EOIed, the LR may not have
> +	 * been marked as empty.
> +	 */
> +	vgic_sync_lr_elrsr(vcpu, lr, vlr);
> +
> +	return level_pending;
> +}
> +
>  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  {
>  	u32 status = vgic_get_interrupt_status(vcpu);
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> -	bool level_pending = false;
>  	struct kvm *kvm = vcpu->kvm;
> +	int level_pending = 0;
>  
>  	kvm_debug("STATUS = %08x\n", status);
>  
> @@ -1342,54 +1386,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  
>  		for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
>  			struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
> -			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>  
> -			spin_lock(&dist->lock);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> +			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>  			WARN_ON(vlr.state & LR_STATE_MASK);
> -			vlr.state = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
>  
> -			/*
> -			 * If the IRQ was EOIed it was also ACKed and we we
> -			 * therefore assume we can clear the soft pending
> -			 * state (should it had been set) for this interrupt.
> -			 *
> -			 * Note: if the IRQ soft pending state was set after
> -			 * the IRQ was acked, it actually shouldn't be
> -			 * cleared, but we have no way of knowing that unless
> -			 * we start trapping ACKs when the soft-pending state
> -			 * is set.
> -			 */
> -			vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
>  
>  			/*
>  			 * kvm_notify_acked_irq calls kvm_set_irq()
> -			 * to reset the IRQ level. Need to release the
> -			 * lock for kvm_set_irq to grab it.
> +			 * to reset the IRQ level, which grabs the dist->lock
> +			 * so we call this before taking the dist->lock.
>  			 */
> -			spin_unlock(&dist->lock);
> -
>  			kvm_notify_acked_irq(kvm, 0,
>  					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
> -			spin_lock(&dist->lock);
> -
> -			/* Any additional pending interrupt? */
> -			if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> -				vgic_cpu_irq_set(vcpu, vlr.irq);
> -				level_pending = true;
> -			} else {
> -				vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> -				vgic_cpu_irq_clear(vcpu, vlr.irq);
> -			}
>  
> +			spin_lock(&dist->lock);
> +			level_pending |= process_level_irq(vcpu, lr, vlr);
>  			spin_unlock(&dist->lock);
> -
> -			/*
> -			 * Despite being EOIed, the LR may not have
> -			 * been marked as empty.
> -			 */
> -			vgic_sync_lr_elrsr(vcpu, lr, vlr);
>  		}
>  	}
>  
> 
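For readers following the refactoring, the net effect of the factored-out
function on a single level-triggered interrupt can be sketched as the toy
model below. The struct and function names are hypothetical and only mirror
the order of operations in the patch once the LR has been cleared; this is
an illustration under those assumptions, not the kernel code.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-interrupt state (not the real vgic bitmaps). */
struct toy_level_irq {
	bool line_level;	/* current level of the input line */
	bool soft_pend;		/* soft-pending state written by the guest */
	bool queued;		/* set while the IRQ sits in an LR (line not sampled) */
	bool pending;		/* pending on the distributor / CPU interface */
};

/*
 * Same order as in the patch: drop the soft-pending state, resume
 * sampling of the line, then re-evaluate the line level.
 * Returns true if the interrupt is still pending afterwards.
 */
static bool toy_process_level_irq(struct toy_level_irq *irq)
{
	irq->soft_pend = false;
	irq->queued = false;
	irq->pending = irq->line_level;
	return irq->pending;
}
```

The return value plays the role of level_pending in the patch: the caller
ORs it across all LRs it processes.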

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-07 16:45     ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-07 16:45 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

Hi Christoffer,
On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
> way we deal with them is not apparently easy to understand by reading
> various specs.
> 
> Therefore, add a proper documentation file explaining the flow and
> rationale of the behavior of the vgic.
> 
> Some of this text was contributed by Marc Zyngier and edited by me.
> Omissions and errors are all mine.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>  1 file changed, 181 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> 
> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> new file mode 100644
> index 0000000..24b6f28
> --- /dev/null
> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> @@ -0,0 +1,181 @@
> +KVM/ARM VGIC Forwarded Physical Interrupts
> +==========================================
> +
> +The KVM/ARM code implements software support for the ARM Generic
> +Interrupt Controller's (GIC's) hardware support for virtualization by
> +allowing software to inject virtual interrupts to a VM, which the guest
> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> +
> +Some of these virtual interrupts, however, correspond to physical
> +interrupts from real physical devices.  One example could be the
> +architected timer, which itself supports virtualization, and therefore
> +lets a guest OS program the hardware device directly to raise an
> +interrupt at some point in time.  When such an interrupt is raised, the
> +host OS initially handles the interrupt and must somehow signal this
> +event as a virtual interrupt to the guest.  Another example could be a
> +passthrough device, where the physical interrupts are initially handled
> +by the host, but the device driver for the device lives in the guest OS
> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> +the physical one to the guest OS.
> +
> +These virtual interrupts corresponding to a physical interrupt on the
> +host are called forwarded physical interrupts, but are also sometimes
> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> +
> +Forwarded physical interrupts are handled slightly differently compared
> +to virtual interrupts generated purely by a software emulated device.
> +
> +
> +The HW bit
> +----------
> +Virtual interrupts are signalled to the guest by programming the List
> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> +with the virtual IRQ number and the state of the interrupt (Pending,
> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> +interrupt, the LR state moves from Pending to Active, and finally to
> +inactive.
> +
> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> +KVM must also program an additional field in the LR, the physical IRQ
> +number, to link the virtual with the physical IRQ.
> +
> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> +bit, never both at the same time.
> +
> +Setting the HW bit causes the hardware to deactivate the physical
> +interrupt on the physical distributor when the guest deactivates the
> +corresponding virtual interrupt.
> +
> +
> +Forwarded Physical Interrupts Life Cycle
> +----------------------------------------
> +
> +The state of forwarded physical interrupts is managed in the following way:
> +
> +  - The physical interrupt is acked by the host, and becomes active on
> +    the physical distributor (*).
> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> +    interface is going to present it to the guest.
> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> +    expected.
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +    but the LR.Active is left untouched (set).
> +  - KVM clears the LR when on VM exits when the physical distributor
s/when//?
> +    active state has been cleared.
> +
> +(*): The host handling is slightly more complicated.  For some devices
> +(shared), KVM directly sets the active state on the physical distributor
> +before entering the guest, and for some devices (non-shared) the host
> +configures the GIC such that it does not deactivate the interrupt on
> +host EOIs, but only performs a priority drop allowing the GIC to receive
> +other interrupts and leaves the interrupt in the active state on the
> +physical distributor.
I think EOImode == 1 is set globally and affects all forwarded SPIs/PPIs,
shared or not. Reading the lines above, I get the impression that this is
programmed per device.

My understanding is that for the timer, the physical distributor state
must be set manually because 1) the HW (GIC) does not do it and 2) we need
to context-switch it depending on the vCPU. For a non-shared device, the
GIC sets the physical distributor state, and the state is fully maintained
by the HW until the guest deactivation.
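
The two cases can be sketched as a toy model of one interrupt's state on
the physical distributor. All names below are hypothetical, and the model
ignores the line level and priorities; it only illustrates who sets and
who clears the active state under the understanding described above.

```c
#include <stdbool.h>

/* Toy model of one interrupt on the physical distributor. */
struct toy_phys_irq {
	bool pending;
	bool active;
};

/* Host acknowledges the interrupt: pending becomes active. */
static void toy_host_ack(struct toy_phys_irq *irq)
{
	if (irq->pending) {
		irq->pending = false;
		irq->active = true;
	}
}

/*
 * Host EOI. With eoi_mode == 1 this is only a priority drop and the
 * active state is left untouched; with eoi_mode == 0 it also deactivates.
 */
static void toy_host_eoi(struct toy_phys_irq *irq, int eoi_mode)
{
	if (eoi_mode == 0)
		irq->active = false;
}

/* Guest deactivates a virtual interrupt whose LR has the HW bit set. */
static void toy_guest_deactivate_hw(struct toy_phys_irq *irq)
{
	irq->active = false;
}

/* Shared device (timer): software writes the active state per vCPU. */
static void toy_load_vcpu_timer(struct toy_phys_irq *irq, bool vcpu_irq_injected)
{
	irq->active = vcpu_irq_injected;
}
```

In the non-shared case the active state is set by the ack and survives the
host EOI; in the timer case it is written directly when the vCPU is loaded.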

> +
> +
> +Forwarded Edge and Level Triggered PPIs and SPIs
> +------------------------------------------------
> +Forwarded physical interrupts injected should always be active on the
> +physical distributor when injected to a guest.
> +
> +Level-triggered interrupts will keep the interrupt line to the GIC
> +asserted, typically until the guest programs the device to deassert the
> +line.  This means that the interrupt will remain pending on the physical
> +distributor until the guest has reprogrammed the device.  Since we
> +always run the VM with interrupts enabled on the CPU, a pending
> +interrupt will exit the guest as soon as we switch into the guest,
> +preventing the guest from ever making progress as the process repeats
> +over and over.  Therefore, the active state on the physical distributor
> +must be set when entering the guest, preventing the GIC from forwarding
> +the pending interrupt to the CPU.  As soon as the guest deactivates
> +(EOIs) the interrupt, the physical line is sampled by the hardware again
I think you can remove "(EOIs)". This depends on the EOI mode setting on
the guest side: it can be a 2-in-1 EOI or EOI+DIR.
> +and the host takes a new interrupt if and only if the physical line is
> +still asserted.
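The masking argument in this paragraph can be sketched as follows. This is
a simplified model with hypothetical names: it assumes a level-triggered
interrupt is pending exactly while its line is asserted, and that the GIC
does not signal an interrupt to the CPU while it is active.

```c
#include <stdbool.h>

/* Toy view of a level-triggered interrupt on the physical distributor. */
struct toy_level_state {
	bool line_asserted;	/* device still drives the line */
	bool active;		/* active state on the physical distributor */
};

/* The interrupt is only forwarded to the CPU if it is pending and not active. */
static bool toy_gic_signals_cpu(const struct toy_level_state *s)
{
	return s->line_asserted && !s->active;
}

/*
 * Count how many of max_tries guest entries would exit immediately.
 * Nothing changes the state between tries because the guest never runs
 * long enough to reprogram the device.
 */
static int toy_immediate_exits(struct toy_level_state s, int max_tries)
{
	int exits = 0;

	while (exits < max_tries && toy_gic_signals_cpu(&s))
		exits++;
	return exits;
}
```

With the active state clear every entry bounces straight back to the host;
with it set the guest gets to run and can deassert the line itself.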
> +
> +Edge-triggered interrupts do not exhibit the same problem with
> +preventing guest execution that level-triggered interrupts do.  One
> +option is to not use HW bit at all, and inject edge-triggered interrupts
> +from a physical device as pure virtual interrupts.  But that would
> +potentially slow down handling of the interrupt in the guest, because a
> +physical interrupt occurring in the middle of the guest ISR would
> +preempt the guest for the host to handle the interrupt.  Additionally,
> +if you configure the system to handle interrupts on a separate physical
> +core from that running your VCPU, you still have to interrupt the VCPU
> +to queue the pending state onto the LR, even though the guest won't use
> +this information until the guest ISR completes.  Therefore, the HW
> +bit should always be set for forwarded edge-triggered interrupts.  With
> +the HW bit set, the virtual interrupt is injected and additional
> +physical interrupts occurring before the guest deactivates the interrupt
> +simply mark the state on the physical distributor as Pending+Active.  As
> +soon as the guest deactivates the interrupt, the host takes another
> +interrupt if and only if there was a physical interrupt between
> +injecting the forwarded interrupt to the guest
Missing "and"?
> +the guest deactivating
> +the interrupt.
> +
> +Consequently, whenever we schedule a VCPU with one or more LRs with the
> +HW bit set, the interrupt must also be active on the physical
> +distributor.
> +
> +
> +Forwarded LPIs
> +--------------
> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
> +active state.  They become pending when a device signals them, and as
> +soon as they are acked by the CPU, they are inactive again.
> +
> +It therefore doesn't make sense, and is not supported, to set the HW bit
> +for physical LPIs that are forwarded to a VM as virtual interrupts,
> +typically virtual SPIs.
> +
> +For LPIs, there is no other choice than to preempt the VCPU thread if
> +necessary, and queue the pending state onto the LR.
> +
> +
> +Putting It Together: The Architected Timer
> +------------------------------------------
> +The architected timer is a device that signals interrupts with level
> +triggered semantics.  The timer hardware is directly accessed by VCPUs
> +which program the timer to fire at some point in time.  Each VCPU on a
> +system programs the timer to fire at different times, and therefore the
> +hardware is multiplexed between multiple VCPUs.  This is implemented by
> +context-switching the timer state along with each VCPU thread.
> +
> +However, this means that a scenario like the following is entirely
> +possible, and in fact, typical:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +3.  The guest is idle and calls WFI (wait-for-interrupts)
> +4.  The hardware traps to the host
> +5.  KVM stores the timer state to memory and disables the hardware timer
> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
> +8.  The soft timer fires, waking up the VCPU thread
> +9.  KVM reprograms the timer hardware with the VCPU's values
> +10. KVM marks the timer interrupt as active on the physical distributor
> +11. KVM injects a forwarded physical interrupt to the guest
> +12. KVM runs the VCPU
> +
> +Notice that KVM injects a forwarded physical interrupt in step 11 without
> +the corresponding interrupt having actually fired on the host.  That is
> +exactly why we mark the timer interrupt as active in step 10, because
> +the active state on the physical distributor is part of the state
> +belonging to the timer hardware, which is context-switched along with
> +the VCPU thread.
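The soft timer arithmetic from step 6 can be sketched as below. The
function name and the use of a plain 64-bit tick counter are assumptions
made for illustration; the point is only that the remaining delay is the
programmed deadline minus the current time, clamped at zero.

```c
#include <stdint.h>

/*
 * Remaining delay for the soft timer.
 * programmed_at: time at which the guest programmed the timer (step 2)
 * delta:         programmed interval (100 in the example)
 * now:           time at which the VCPU blocks (step 6)
 */
static uint64_t toy_soft_timer_delay(uint64_t programmed_at, uint64_t delta,
				     uint64_t now)
{
	uint64_t deadline = programmed_at + delta;

	/* Already expired: fire as soon as possible. */
	return now >= deadline ? 0 : deadline - now;
}
```

For example, a timer programmed at tick 1000 with an interval of 100 and a
VCPU that blocks at tick 1030 leaves a soft timer delay of 70 ticks.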
> +
> +If the guest does not idle because it is busy, flow looks like this
> +instead:
> +
> +1.  KVM runs the VCPU
> +2.  The guest programs the timer to fire in T+100
> +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
> +5.  With interrupts disabled on the CPU, KVM looks at the timer state
> +    and injects a forwarded physical interrupt because it concludes the
> +    timer has expired.
I don't get how we can trap without the virtual timer PPI handler being
entered on the host side. Could you please elaborate on this?

Eric
> +6.  KVM marks the timer interrupt as active on the physical distributor
> +7.  KVM runs the VCPU
> +
> +Notice that again the forwarded physical interrupt is injected to the
> +guest without having actually been handled on the host.  In this case it
> +is because the physical interrupt is forwarded to the guest before KVM
> +enables physical interrupts on the CPU after exiting the guest.
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-07 16:45     ` Eric Auger
@ 2015-09-07 17:50       ` Marc Zyngier
  -1 siblings, 0 replies; 64+ messages in thread
From: Marc Zyngier @ 2015-09-07 17:50 UTC (permalink / raw)
  To: Eric Auger, Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: kvm

On 07/09/15 17:45, Eric Auger wrote:
> Hi Christoffer,
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>> way we deal with them is not apparently easy to understand by reading
>> various specs.
>>
>> Therefore, add a proper documentation file explaining the flow and
>> rationale of the behavior of the vgic.
>>
>> Some of this text was contributed by Marc Zyngier and edited by me.
>> Omissions and errors are all mine.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>  1 file changed, 181 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>
>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> new file mode 100644
>> index 0000000..24b6f28
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> @@ -0,0 +1,181 @@
>> +KVM/ARM VGIC Forwarded Physical Interrupts
>> +==========================================

[...]

>> +1.  KVM runs the VCPU
>> +2.  The guest programs the timer to fire in T+100
>> +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
>> +5.  With interrupts disabled on the CPU, KVM looks at the timer state
>> +    and injects a forwarded physical interrupt because it concludes the
>> +    timer has expired.
> I don't get how we can trap without the virtual timer PPI handler being
> entered on host side. Please can you elaborate on this?

On VM exit, we disable the virtual timer (see the code in
hyp.S::save_timer_state where we clear the enable bit). We still perform
the exit, but the cause for exit is now gone, and the handler will never
fire.
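
As a rough sketch of why the handler never runs, consider the toy model
below. It assumes the usual timer control layout (enable in bit 0,
interrupt mask in bit 1) and a level-sensitive output; the struct and
function names are hypothetical and this is not the hyp.S code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_TIMER_CTL_ENABLE	(1u << 0)
#define TOY_TIMER_CTL_IMASK	(1u << 1)

/* Toy copy of the virtual timer registers. */
struct toy_vtimer {
	uint32_t ctl;	/* control bits */
	uint64_t cval;	/* compare value */
};

/* The output line is asserted only while enabled, unmasked and expired. */
static bool toy_vtimer_line(const struct toy_vtimer *t, uint64_t now)
{
	return (t->ctl & TOY_TIMER_CTL_ENABLE) &&
	       !(t->ctl & TOY_TIMER_CTL_IMASK) &&
	       now >= t->cval;
}

/* On VM exit: keep a copy of the guest state, then disable the hardware. */
static struct toy_vtimer toy_save_and_disable(struct toy_vtimer *hw)
{
	struct toy_vtimer saved = *hw;

	hw->ctl &= ~TOY_TIMER_CTL_ENABLE;
	return saved;
}
```

After the save, the hardware line is low, so nothing is left pending for
the host handler, while the saved copy still shows an expired timer that
can be injected as a forwarded interrupt.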

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-07 17:50       ` Marc Zyngier
@ 2015-09-08  7:44         ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-08  7:44 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: kvm

Hi Marc,
On 09/07/2015 07:50 PM, Marc Zyngier wrote:
> On 07/09/15 17:45, Eric Auger wrote:
>> Hi Christoffer,
>> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>> way we deal with them is not apparently easy to understand by reading
>>> various specs.
>>>
>>> Therefore, add a proper documentation file explaining the flow and
>>> rationale of the behavior of the vgic.
>>>
>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>> Omissions and errors are all mine.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>  1 file changed, 181 insertions(+)
>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>
>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> new file mode 100644
>>> index 0000000..24b6f28
>>> --- /dev/null
>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> @@ -0,0 +1,181 @@
>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>> +==========================================
> 
> [...]
> 
>>> +1.  KVM runs the VCPU
>>> +2.  The guest programs the timer to fire at T+100
>>> +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
>>> +5.  With interrupts disabled on the CPU, KVM looks at the timer state
>>> +    and injects a forwarded physical interrupt because it concludes the
>>> +    timer has expired.
>> I don't get how we can trap without the virtual timer PPI handler being
>> entered on host side. Please can you elaborate on this?
> 
> On VM exit, we disable the virtual timer (see the code in
> hyp.S::save_timer_state where we clear the enable bit). We still perform
> the exit, but the cause for exit is now gone, and the handler will never
> fire.
OK thanks for the clarification

Eric
> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-07 11:25     ` Andre Przywara
@ 2015-09-08  8:43       ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-08  8:43 UTC (permalink / raw)
  To: Andre Przywara, Christoffer Dall, Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm

Hi Andre,
On 09/07/2015 01:25 PM, Andre Przywara wrote:
> Hi,
> 
> firstly: this text is really great, thanks for coming up with that.
> See below for some information I got from tracing the host which I
> cannot make sense of....
> 
> 
> On 04/09/15 20:40, Christoffer Dall wrote:
>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>> way we deal with them is not apparently easy to understand by reading
>> various specs.
>>
>> Therefore, add a proper documentation file explaining the flow and
>> rationale of the behavior of the vgic.
>>
>> Some of this text was contributed by Marc Zyngier and edited by me.
>> Omissions and errors are all mine.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>  1 file changed, 181 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>
>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> new file mode 100644
>> index 0000000..24b6f28
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> @@ -0,0 +1,181 @@
>> +KVM/ARM VGIC Forwarded Physical Interrupts
>> +==========================================
>> +
>> +The KVM/ARM code implements software support for the ARM Generic
>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>> +allowing software to inject virtual interrupts to a VM, which the guest
>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>> +
>> +Some of these virtual interrupts, however, correspond to physical
>> +interrupts from real physical devices.  One example could be the
>> +architected timer, which itself supports virtualization, and therefore
>> +lets a guest OS program the hardware device directly to raise an
>> +interrupt at some point in time.  When such an interrupt is raised, the
>> +host OS initially handles the interrupt and must somehow signal this
>> +event as a virtual interrupt to the guest.  Another example could be a
>> +passthrough device, where the physical interrupts are initially handled
>> +by the host, but the device driver for the device lives in the guest OS
>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>> +the physical one to the guest OS.
>> +
>> +These virtual interrupts corresponding to a physical interrupt on the
>> +host are called forwarded physical interrupts, but are also sometimes
>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>> +
>> +Forwarded physical interrupts are handled slightly differently compared
>> +to virtual interrupts generated purely by a software emulated device.
>> +
>> +
>> +The HW bit
>> +----------
>> +Virtual interrupts are signalled to the guest by programming the List
>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>> +with the virtual IRQ number and the state of the interrupt (Pending,
>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>> +interrupt, the LR state moves from Pending to Active, and finally to
>> +inactive.
>> +
>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>> +KVM must also program an additional field in the LR, the physical IRQ
>> +number, to link the virtual with the physical IRQ.
>> +
>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>> +bit, never both at the same time.
>> +
>> +Setting the HW bit causes the hardware to deactivate the physical
>> +interrupt on the physical distributor when the guest deactivates the
>> +corresponding virtual interrupt.
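
As an illustration of the HW bit rules above, here is a small sketch of
how such an LR value could be composed and checked. It assumes the
GICv2 GICH_LR layout (virtual ID in bits 9:0, physical ID in bits 19:10,
Pending in bit 28, Active in bit 29, HW in bit 31); the macro and
function names are hypothetical and not the ones used in the vgic code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv2 GICH_LR field layout. */
#define LR_VIRT_ID_MASK  0x3ffu
#define LR_PHYS_ID_SHIFT 10
#define LR_PHYS_ID_MASK  (0x3ffu << LR_PHYS_ID_SHIFT)
#define LR_PENDING       (1u << 28)
#define LR_ACTIVE        (1u << 29)
#define LR_HW            (1u << 31)

/*
 * Build an LR for a forwarded interrupt: the HW bit links the virtual
 * IRQ to the physical one, and exactly one state bit is set.
 */
static uint32_t make_forwarded_lr(uint32_t virq, uint32_t hwirq, bool active)
{
	uint32_t lr = LR_HW;

	lr |= virq & LR_VIRT_ID_MASK;
	lr |= (hwirq << LR_PHYS_ID_SHIFT) & LR_PHYS_ID_MASK;
	lr |= active ? LR_ACTIVE : LR_PENDING;
	return lr;
}

/* With the HW bit set, Pending and Active must never both be set. */
static bool lr_state_allowed(uint32_t lr)
{
	bool pending = (lr & LR_PENDING) != 0;
	bool active = (lr & LR_ACTIVE) != 0;

	if (!(lr & LR_HW))
		return true;
	return !(pending && active);
}
```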
>> +
>> +
>> +Forwarded Physical Interrupts Life Cycle
>> +----------------------------------------
>> +
>> +The state of forwarded physical interrupts is managed in the following way:
>> +
>> +  - The physical interrupt is acked by the host, and becomes active on
>> +    the physical distributor (*).
>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>> +    interface is going to present it to the guest.
>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>> +    expected.
>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>> +    but the LR.Active is left untouched (set).
> 
> I tried hard in the last week, but couldn't confirm this. Tracing shows
> the following pattern over and over (case 1):
> (This is the kvm/kvm.git:queue branch from last week, so including the
> mapped timer IRQ code. Tests were done on Juno and Midway)
> 
> ...
> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
> ....
> 
> My hunch is that the following happens (please correct me if needed!):
> First there is an unrelated trap (line 1), then later the guest exits
> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
> The host injects the timer IRQ (not shown here) and returns to the
> guest. On the next trap (line 3, due to a stage 2 page fault),
> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
> GIC actually did deactivate both the LR (state=8, which is inactive,
> just the HW bit is still set) _and_ the state on the physical
> distributor (dist active=0). This trace_printk is just after entering
> the function, so before the code there performs these steps redundantly.
> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
> point of view this virtual IRQ cycle is finished.
> 
> The other sequence I see is this one (case 2):
> 
> ....
> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
> ...
> 
> In line 1 the timer fires, the host injects the timer IRQ into the
> guest, which exits again in line 2 due to a page fault (may have IRQs
> disabled?). The LR dump in line 3 shows that the timer IRQ is still
> pending in the LR (state=9) and active on the physical distributor. Now
> the code in vgic_sync_hwirq() clears the active state in the physical
> distributor (by calling irq_set_irqchip_state()), but leaves the LR
> alone (by returning 0 to the caller).
> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
> (line 5), only that the physical dist state is now inactive (due to us
> clearing that explicitly during the last exit).
Normally the physical dist state was set active on the previous flush,
right (done for all mapped IRQs)? So are you sure the IRQ was not
actually completed by the guest? As Christoffer mentions, the LR active
state can remain even if the IRQ was completed.

Did I misunderstand the problem you are trying to shed light on?

Cheers

Eric

> Now vgic_sync_hwirq()
> returns 1, leading to the LR being cleaned up in the caller.
> So to me it looks like we kill that IRQ before the guest had the chance
> to handle it (presumably because it has IRQs off).
> 
> The distribution of those patterns in my particular snapshot is (all
> with timer IRQ 27):
>  7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
>   331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
>    68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1
> 
> So for the majority of exits with the timer having been injected before
> we redundantly clean the LR (case 1 above). Also there is quite a number
> of cases where we "kill" the IRQ (case 2 above). The active state case
> (state: 10 in the last two lines) seems to be a variation of case 2,
> just with the guest exiting from within the IRQ handler (after
> activation, before EOI).
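
For anyone reading the LR.state values in the table above, here is a
small decoding sketch. It assumes the traced value is the top four bits
of a GICv2 list register (bit 0 Pending, bit 1 Active, bit 3 HW), which
matches Andre's reading of 8 as inactive with only the HW bit set; the
names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed meaning of the traced LR.state nibble. */
#define TRACE_LR_PENDING 0x1u
#define TRACE_LR_ACTIVE  0x2u
#define TRACE_LR_HW      0x8u

enum lr_phase {
	LR_PHASE_INACTIVE,
	LR_PHASE_PENDING,
	LR_PHASE_ACTIVE,
	LR_PHASE_PENDING_ACTIVE,
};

/* Map the two state bits of the nibble to an interrupt phase. */
static enum lr_phase trace_lr_phase(uint32_t state)
{
	switch (state & (TRACE_LR_PENDING | TRACE_LR_ACTIVE)) {
	case 0:
		return LR_PHASE_INACTIVE;
	case TRACE_LR_PENDING:
		return LR_PHASE_PENDING;
	case TRACE_LR_ACTIVE:
		return LR_PHASE_ACTIVE;
	default:
		return LR_PHASE_PENDING_ACTIVE;
	}
}

/* True if the nibble carries the HW bit (forwarded interrupt). */
static bool trace_lr_is_hw(uint32_t state)
{
	return (state & TRACE_LR_HW) != 0;
}
```

With that reading, 8 is HW + inactive, 9 is HW + pending and 10 is
HW + active.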
> 
> I'd appreciate if someone could shed some light on this and show me
> where I am wrong here or what is going on instead.
> 
> Cheers,
> Andre.
> 
>> +  - KVM clears the LR on VM exits when the physical distributor
>> +    active state has been cleared.
>> +
>> +(*): The host handling is slightly more complicated.  For some devices
>> +(shared), KVM directly sets the active state on the physical distributor
>> +before entering the guest, and for some devices (non-shared) the host
>> +configures the GIC such that it does not deactivate the interrupt on
>> +host EOIs, but only performs a priority drop allowing the GIC to receive
>> +other interrupts and leaves the interrupt in the active state on the
>> +physical distributor.
>> +
>> +
>> +Forwarded Edge and Level Triggered PPIs and SPIs
>> +------------------------------------------------
>> +Forwarded physical interrupts should always be active on the
>> +physical distributor when injected to a guest.
>> +
>> +Level-triggered interrupts will keep the interrupt line to the GIC
>> +asserted, typically until the guest programs the device to deassert the
>> +line.  This means that the interrupt will remain pending on the physical
>> +distributor until the guest has reprogrammed the device.  Since we
>> +always run the VM with interrupts enabled on the CPU, a pending
>> +interrupt will exit the guest as soon as we switch into the guest,
>> +preventing the guest from ever making progress as the process repeats
>> +over and over.  Therefore, the active state on the physical distributor
>> +must be set when entering the guest, preventing the GIC from forwarding
>> +the pending interrupt to the CPU.  As soon as the guest deactivates
>> +(EOIs) the interrupt, the physical line is sampled by the hardware again
>> +and the host takes a new interrupt if and only if the physical line is
>> +still asserted.
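
The level-triggered argument above can be modelled with a toy example.
This is a deliberately simplified sketch: it assumes the GIC signals an
interrupt to the CPU only when it is pending and not active, treats a
level-triggered interrupt as pending whenever its line is asserted, and
ignores priorities, enables and targeting. All names are hypothetical.

```c
#include <stdbool.h>

/* Toy view of one level-triggered interrupt on the distributor. */
struct phys_irq {
	bool line_asserted;	/* device keeps the line high */
	bool active;		/* active state on the distributor */
};

/* Pending (line asserted) and not active: the CPU gets interrupted. */
static bool gic_signals_cpu(const struct phys_irq *irq)
{
	return irq->line_asserted && !irq->active;
}

/* KVM sets the active state before running the guest. */
static void mark_active_on_guest_entry(struct phys_irq *irq)
{
	irq->active = true;
}

/* Guest deactivation clears the active state via the HW bit. */
static void guest_deactivates(struct phys_irq *irq)
{
	irq->active = false;
}
```

Without the active state set on entry, gic_signals_cpu() stays true for
as long as the line is asserted and the guest would exit immediately;
after the guest deactivates, a new interrupt is taken only if the line
is still asserted.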
>> +
>> +Edge-triggered interrupts do not exhibit the same problem with
>> +preventing guest execution that level-triggered interrupts do.  One
>> +option is to not use the HW bit at all, and inject edge-triggered interrupts
>> +from a physical device as pure virtual interrupts.  But that would
>> +potentially slow down handling of the interrupt in the guest, because a
>> +physical interrupt occurring in the middle of the guest ISR would
>> +preempt the guest for the host to handle the interrupt.  Additionally,
>> +if you configure the system to handle interrupts on a separate physical
>> +core from that running your VCPU, you still have to interrupt the VCPU
>> +to queue the pending state onto the LR, even though the guest won't use
>> +this information until the guest ISR completes.  Therefore, the HW
>> +bit should always be set for forwarded edge-triggered interrupts.  With
>> +the HW bit set, the virtual interrupt is injected and additional
>> +physical interrupts occurring before the guest deactivates the interrupt
>> +simply mark the state on the physical distributor as Pending+Active.  As
>> +soon as the guest deactivates the interrupt, the host takes another
>> +interrupt if and only if there was a physical interrupt between
>> +injecting the forwarded interrupt to the guest and the guest deactivating
>> +the interrupt.
>> +
>> +Consequently, whenever we schedule a VCPU with one or more LRs with the
>> +HW bit set, the interrupt must also be active on the physical
>> +distributor.
>> +
>> +
>> +Forwarded LPIs
>> +--------------
>> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
>> +active state.  They become pending when a device signals them, and as
>> +soon as they are acked by the CPU, they are inactive again.
>> +
>> +It therefore doesn't make sense, and is not supported, to set the HW bit
>> +for physical LPIs that are forwarded to a VM as virtual interrupts,
>> +typically virtual SPIs.
>> +
>> +For LPIs, there is no other choice than to preempt the VCPU thread if
>> +necessary, and queue the pending state onto the LR.
>> +
>> +
>> +Putting It Together: The Architected Timer
>> +------------------------------------------
>> +The architected timer is a device that signals interrupts with level
>> +triggered semantics.  The timer hardware is directly accessed by VCPUs
>> +which program the timer to fire at some point in time.  Each VCPU on a
>> +system programs the timer to fire at different times, and therefore the
>> +hardware is multiplexed between multiple VCPUs.  This is implemented by
>> +context-switching the timer state along with each VCPU thread.
>> +
>> +However, this means that a scenario like the following is entirely
>> +possible, and in fact, typical:
>> +
>> +1.  KVM runs the VCPU
>> +2.  The guest programs the timer to fire at T+100
>> +3.  The guest is idle and calls WFI (wait-for-interrupts)
>> +4.  The hardware traps to the host
>> +5.  KVM stores the timer state to memory and disables the hardware timer
>> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
>> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
>> +8.  The soft timer fires, waking up the VCPU thread
>> +9.  KVM reprograms the timer hardware with the VCPU's values
>> +10. KVM marks the timer interrupt as active on the physical distributor
>> +11. KVM injects a forwarded physical interrupt to the guest
>> +12. KVM runs the VCPU
>> +
>> +Notice that KVM injects a forwarded physical interrupt in step 11 without
>> +the corresponding interrupt having actually fired on the host.  That is
>> +exactly why we mark the timer interrupt as active in step 10, because
>> +the active state on the physical distributor is part of the state
>> +belonging to the timer hardware, which is context-switched along with
>> +the VCPU thread.
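
Step 6 in the list above boils down to a small piece of arithmetic,
sketched here. The function name and the use of plain counter ticks are
assumptions for illustration; the real code has to convert to host time
units before arming a soft timer.

```c
#include <stdint.h>

/*
 * Ticks left until the guest's programmed compare value is reached,
 * or 0 if the deadline has already passed by the time we block.
 */
static uint64_t soft_timer_delay(uint64_t cval, uint64_t now)
{
	return cval > now ? cval - now : 0;
}
```

For example, if the guest programmed T+100 at T=1000 and traps with WFI
at T=1030, the soft timer is armed for the remaining 70 ticks.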
>> +
>> +If the guest does not idle because it is busy, the flow looks like this
>> +instead:
>> +
>> +1.  KVM runs the VCPU
>> +2.  The guest programs the timer to fire at T+100
>> +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
>> +5.  With interrupts disabled on the CPU, KVM looks at the timer state
>> +    and injects a forwarded physical interrupt because it concludes the
>> +    timer has expired.
>> +6.  KVM marks the timer interrupt as active on the physical distributor
>> +7.  KVM runs the VCPU
>> +
>> +Notice that again the forwarded physical interrupt is injected to the
>> +guest without having actually been handled on the host.  In this case it
>> +is because the physical interrupt is forwarded to the guest before KVM
>> +enables physical interrupts on the CPU after exiting the guest.
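
The "KVM looks at the timer state" step in the flow above could be
sketched as a check on the saved timer registers rather than on any
interrupt actually taken by the host. This is an assumption-laden
sketch: the control bit positions follow the architectural timer
control register (ENABLE bit 0, IMASK bit 1), and the function name is
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMER_CTL_ENABLE (1u << 0)
#define TIMER_CTL_IMASK  (1u << 1)

/*
 * The timer counts as expired if it is enabled, not masked, and the
 * compare value has been reached by the current counter value.
 */
static bool timer_has_expired(uint32_t ctl, uint64_t cval, uint64_t now)
{
	return (ctl & TIMER_CTL_ENABLE) && !(ctl & TIMER_CTL_IMASK) &&
	       cval <= now;
}
```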
>>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
@ 2015-09-08  8:43       ` Eric Auger
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-08  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andre,
On 09/07/2015 01:25 PM, Andre Przywara wrote:
> Hi,
> 
> firstly: this text is really great, thanks for coming up with that.
> See below for some information I got from tracing the host which I
> cannot make sense of....
> 
> 
> On 04/09/15 20:40, Christoffer Dall wrote:
>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>> way we deal with them is not apparently easy to understand by reading
>> various specs.
>>
>> Therefore, add a proper documentation file explaining the flow and
>> rationale of the behavior of the vgic.
>>
>> Some of this text was contributed by Marc Zyngier and edited by me.
>> Omissions and errors are all mine.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>  1 file changed, 181 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>
>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> new file mode 100644
>> index 0000000..24b6f28
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> @@ -0,0 +1,181 @@
>> +KVM/ARM VGIC Forwarded Physical Interrupts
>> +==========================================
>> +
>> +The KVM/ARM code implements software support for the ARM Generic
>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>> +allowing software to inject virtual interrupts to a VM, which the guest
>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>> +
>> +Some of these virtual interrupts, however, correspond to physical
>> +interrupts from real physical devices.  One example could be the
>> +architected timer, which itself supports virtualization, and therefore
>> +lets a guest OS program the hardware device directly to raise an
>> +interrupt at some point in time.  When such an interrupt is raised, the
>> +host OS initially handles the interrupt and must somehow signal this
>> +event as a virtual interrupt to the guest.  Another example could be a
>> +passthrough device, where the physical interrupts are initially handled
>> +by the host, but the device driver for the device lives in the guest OS
>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>> +the physical one to the guest OS.
>> +
>> +These virtual interrupts corresponding to a physical interrupt on the
>> +host are called forwarded physical interrupts, but are also sometimes
>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>> +
>> +Forwarded physical interrupts are handled slightly differently compared
>> +to virtual interrupts generated purely by a software emulated device.
>> +
>> +
>> +The HW bit
>> +----------
>> +Virtual interrupts are signalled to the guest by programming the List
>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>> +with the virtual IRQ number and the state of the interrupt (Pending,
>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>> +interrupt, the LR state moves from Pending to Active, and finally to
>> +inactive.
>> +
>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>> +KVM must also program an additional field in the LR, the physical IRQ
>> +number, to link the virtual with the physical IRQ.
>> +
>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>> +bit, never both at the same time.
>> +
>> +Setting the HW bit causes the hardware to deactivate the physical
>> +interrupt on the physical distributor when the guest deactivates the
>> +corresponding virtual interrupt.
>> +
>> +
>> +Forwarded Physical Interrupts Life Cycle
>> +----------------------------------------
>> +
>> +The state of forwarded physical interrupts is managed in the following way:
>> +
>> +  - The physical interrupt is acked by the host, and becomes active on
>> +    the physical distributor (*).
>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>> +    interface is going to present it to the guest.
>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>> +    expected.
>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>> +    but the LR.Active is left untouched (set).
> 
> I tried hard in the last week, but couldn't confirm this. Tracing shows
> the following pattern over and over (case 1):
> (This is the kvm/kvm.git:queue branch from last week, so including the
> mapped timer IRQ code. Tests were done on Juno and Midway)
> 
> ...
> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
> ....
> 
> My hunch is that the following happens (please correct me if needed!):
> First there is an unrelated trap (line 1), then later the guest exits
> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
> The host injects the timer IRQ (not shown here) and returns to the
> guest. On the next trap (line 3, due to a stage 2 page fault),
> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
> GIC actually did deactivate both the LR (state=8, which is inactive,
> just the HW bit is still set) _and_ the state on the physical
> distributor (dist active=0). This trace_printk is just after entering
> the function, so before the code there performs these steps redundantly.
> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
> point of view this virtual IRQ cycle is finished.
> 
> The other sequence I see is this one (case 2):
> 
> ....
> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
> ...
> 
> In line 1 the timer fires, the host injects the timer IRQ into the
> guest, which exits again in line 2 due to a page fault (may have IRQs
> disabled?). The LR dump in line 3 shows that the timer IRQ is still
> pending in the LR (state=9) and active on the physical distributor. Now
> the code in vgic_sync_hwirq() clears the active state in the physical
> distributor (by calling irq_set_irqchip_state()), but leaves the LR
> alone (by returning 0 to the caller).
> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
> (line 5), only that the physical dist state is now inactive (due to us
> clearing that explicitly during the last exit).
Normally the physical dist state was set active on the previous flush,
right (done for all mapped IRQs)? So are you sure the IRQ was not
actually completed by the guest? As Christoffer mentions, the LR active
state can remain even if the IRQ was completed.

Did I misunderstand the problem you are trying to shed light on?

Cheers

Eric

> Now vgic_sync_hwirq()
> returns 1, leading to the LR being cleaned up in the caller.
> So to me it looks like we kill that IRQ before the guest had the chance
> to handle it (presumably because it has IRQs off).
> 
> The distribution of those patterns in my particular snapshot is (all
> with timer IRQ 27):
>  7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
>   331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
>    68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1
> 
> So for the majority of exits with the timer having been injected before
> we redundantly clean the LR (case 1 above). Also there is quite a number
> of cases where we "kill" the IRQ (case 2 above). The active state case
> (state: 10 in the last two lines) seems to be a variation of case 2,
> just with the guest exiting from within the IRQ handler (after
> activation, before EOI).
> 
> I'd appreciate if someone could shed some light on this and show me
> where I am wrong here or what is going on instead.
> 
> Cheers,
> Andre.
> 
>> +  - KVM clears the LR on VM exits when the physical distributor
>> +    active state has been cleared.
>> +
>> +(*): The host handling is slightly more complicated.  For some devices
>> +(shared), KVM directly sets the active state on the physical distributor
>> +before entering the guest, and for some devices (non-shared) the host
>> +configures the GIC such that it does not deactivate the interrupt on
>> +host EOIs, but only performs a priority drop allowing the GIC to receive
>> +other interrupts and leaves the interrupt in the active state on the
>> +physical distributor.
>> +
>> +
>> +Forwarded Edge and Level Triggered PPIs and SPIs
>> +------------------------------------------------
>> +Forwarded physical interrupts should always be active on the
>> +physical distributor when injected to a guest.
>> +
>> +Level-triggered interrupts will keep the interrupt line to the GIC
>> +asserted, typically until the guest programs the device to deassert the
>> +line.  This means that the interrupt will remain pending on the physical
>> +distributor until the guest has reprogrammed the device.  Since we
>> +always run the VM with interrupts enabled on the CPU, a pending
>> +interrupt will exit the guest as soon as we switch into the guest,
>> +preventing the guest from ever making progress as the process repeats
>> +over and over.  Therefore, the active state on the physical distributor
>> +must be set when entering the guest, preventing the GIC from forwarding
>> +the pending interrupt to the CPU.  As soon as the guest deactivates
>> +(EOIs) the interrupt, the physical line is sampled by the hardware again
>> +and the host takes a new interrupt if and only if the physical line is
>> +still asserted.
>> +
>> +Edge-triggered interrupts do not exhibit the same problem with
>> +preventing guest execution that level-triggered interrupts do.  One
>> +option is to not use the HW bit at all, and inject edge-triggered interrupts
>> +from a physical device as pure virtual interrupts.  But that would
>> +potentially slow down handling of the interrupt in the guest, because a
>> +physical interrupt occurring in the middle of the guest ISR would
>> +preempt the guest for the host to handle the interrupt.  Additionally,
>> +if you configure the system to handle interrupts on a separate physical
>> +core from that running your VCPU, you still have to interrupt the VCPU
>> +to queue the pending state onto the LR, even though the guest won't use
>> +this information until the guest ISR completes.  Therefore, the HW
>> +bit should always be set for forwarded edge-triggered interrupts.  With
>> +the HW bit set, the virtual interrupt is injected and additional
>> +physical interrupts occurring before the guest deactivates the interrupt
>> +simply mark the state on the physical distributor as Pending+Active.  As
>> +soon as the guest deactivates the interrupt, the host takes another
>> +interrupt if and only if there was a physical interrupt between
>> +injecting the forwarded interrupt to the guest and the guest deactivating
>> +the interrupt.
>> +
>> +Consequently, whenever we schedule a VCPU with one or more LRs with the
>> +HW bit set, the interrupt must also be active on the physical
>> +distributor.
>> +
>> +
>> +Forwarded LPIs
>> +--------------
>> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
>> +active state.  They become pending when a device signals them, and as
>> +soon as they are acked by the CPU, they are inactive again.
>> +
>> +It therefore doesn't make sense, and is not supported, to set the HW bit
>> +for physical LPIs that are forwarded to a VM as virtual interrupts,
>> +typically virtual SPIs.
>> +
>> +For LPIs, there is no other choice than to preempt the VCPU thread if
>> +necessary, and queue the pending state onto the LR.
>> +
>> +
>> +Putting It Together: The Architected Timer
>> +------------------------------------------
>> +The architected timer is a device that signals interrupts with level
>> +triggered semantics.  The timer hardware is directly accessed by VCPUs
>> +which program the timer to fire at some point in time.  Each VCPU on a
>> +system programs the timer to fire at different times, and therefore the
>> +hardware is multiplexed between multiple VCPUs.  This is implemented by
>> +context-switching the timer state along with each VCPU thread.
>> +
>> +However, this means that a scenario like the following is entirely
>> +possible, and in fact, typical:
>> +
>> +1.  KVM runs the VCPU
>> +2.  The guest programs the timer to fire in T+100
>> +3.  The guest is idle and calls WFI (wait-for-interrupts)
>> +4.  The hardware traps to the host
>> +5.  KVM stores the timer state to memory and disables the hardware timer
>> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
>> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
>> +8.  The soft timer fires, waking up the VCPU thread
>> +9.  KVM reprograms the timer hardware with the VCPU's values
>> +10. KVM marks the timer interrupt as active on the physical distributor
>> +11. KVM injects a forwarded physical interrupt to the guest
>> +12. KVM runs the VCPU
>> +
>> +Notice that KVM injects a forwarded physical interrupt in step 11 without
>> +the corresponding interrupt having actually fired on the host.  That is
>> +exactly why we mark the timer interrupt as active in step 10, because
>> +the active state on the physical distributor is part of the state
>> +belonging to the timer hardware, which is context-switched along with
>> +the VCPU thread.
>> +
>> +If the guest does not idle because it is busy, the flow looks like this
>> +instead:
>> +
>> +1.  KVM runs the VCPU
>> +2.  The guest programs the timer to fire in T+100
>> +3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
>> +4.  With interrupts disabled on the CPU, KVM looks at the timer state
>> +    and injects a forwarded physical interrupt because it concludes the
>> +    timer has expired.
>> +5.  KVM marks the timer interrupt as active on the physical distributor
>> +6.  KVM runs the VCPU
>> +
>> +Notice that again the forwarded physical interrupt is injected to the
>> +guest without having actually been handled on the host.  In this case it
>> +is because the physical interrupt is forwarded to the guest before KVM
>> +enables physical interrupts on the CPU after exiting the guest.
>>
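The two timer flows above rest on two small computations: how long the soft timer should run once the VCPU blocks, and whether the saved timer state says the timer has expired. The following is a minimal sketch of both, not the actual KVM code. The struct, the SKETCH_* constants and the function names are hypothetical; the enable and mask bits are modelled on the timer control register, and time is a plain counter value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control bits, modelled on the timer control register. */
#define SKETCH_TIMER_ENABLE (1u << 0)
#define SKETCH_TIMER_IMASK  (1u << 1)

/* Hypothetical per-VCPU timer context saved when the VCPU is switched out. */
struct sketch_timer_ctx {
	uint32_t ctl;   /* control bits as last programmed by the guest */
	uint64_t cval;  /* counter value at which the timer should fire */
};

/* True if the saved state says the timer output is asserted at time 'now'. */
bool sketch_timer_should_fire(const struct sketch_timer_ctx *t, uint64_t now)
{
	if (!(t->ctl & SKETCH_TIMER_ENABLE) || (t->ctl & SKETCH_TIMER_IMASK))
		return false;
	return t->cval <= now;
}

/* Remaining ticks to arm the soft timer with when the VCPU blocks. */
uint64_t sketch_soft_timer_delay(const struct sketch_timer_ctx *t, uint64_t now)
{
	return t->cval > now ? t->cval - now : 0;
}
```

With a guest that programmed the timer for counter value 100 and blocked at 30, sketch_soft_timer_delay() yields 70, matching the "100 - time since the guest programmed the timer" expression in the idle flow.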

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-07 11:25     ` Andre Przywara
@ 2015-09-08 14:18       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-08 14:18 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

On Mon, Sep 07, 2015 at 12:25:27PM +0100, Andre Przywara wrote:
> Hi,
> 
> firstly: this text is really great, thanks for coming up with that.
> See below for some information I got from tracing the host which I
> cannot make sense of....
> 
> 
> On 04/09/15 20:40, Christoffer Dall wrote:
> > Forwarded physical interrupts on arm/arm64 is a tricky concept and the
> > way we deal with them is not apparently easy to understand by reading
> > various specs.
> > 
> > Therefore, add a proper documentation file explaining the flow and
> > rationale of the behavior of the vgic.
> > 
> > Some of this text was contributed by Marc Zyngier and edited by me.
> > Omissions and errors are all mine.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > 
> > diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > new file mode 100644
> > index 0000000..24b6f28
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > @@ -0,0 +1,181 @@
> > +KVM/ARM VGIC Forwarded Physical Interrupts
> > +==========================================
> > +
> > +The KVM/ARM code implements software support for the ARM Generic
> > +Interrupt Controller's (GIC's) hardware support for virtualization by
> > +allowing software to inject virtual interrupts to a VM, which the guest
> > +OS sees as regular interrupts.  The code is famously known as the VGIC.
> > +
> > +Some of these virtual interrupts, however, correspond to physical
> > +interrupts from real physical devices.  One example could be the
> > +architected timer, which itself supports virtualization, and therefore
> > +lets a guest OS program the hardware device directly to raise an
> > +interrupt at some point in time.  When such an interrupt is raised, the
> > +host OS initially handles the interrupt and must somehow signal this
> > +event as a virtual interrupt to the guest.  Another example could be a
> > +passthrough device, where the physical interrupts are initially handled
> > +by the host, but the device driver for the device lives in the guest OS
> > +and KVM must therefore somehow inject a virtual interrupt on behalf of
> > +the physical one to the guest OS.
> > +
> > +These virtual interrupts corresponding to a physical interrupt on the
> > +host are called forwarded physical interrupts, but are also sometimes
> > +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> > +
> > +Forwarded physical interrupts are handled slightly differently compared
> > +to virtual interrupts generated purely by a software emulated device.
> > +
> > +
> > +The HW bit
> > +----------
> > +Virtual interrupts are signalled to the guest by programming the List
> > +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> > +with the virtual IRQ number and the state of the interrupt (Pending,
> > +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> > +interrupt, the LR state moves from Pending to Active, and finally to
> > +inactive.
> > +
> > +The LRs include an extra bit, called the HW bit.  When this bit is set,
> > +KVM must also program an additional field in the LR, the physical IRQ
> > +number, to link the virtual with the physical IRQ.
> > +
> > +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> > +bit, never both at the same time.
> > +
> > +Setting the HW bit causes the hardware to deactivate the physical
> > +interrupt on the physical distributor when the guest deactivates the
> > +corresponding virtual interrupt.
> > +
> > +
> > +Forwarded Physical Interrupts Life Cycle
> > +----------------------------------------
> > +
> > +The state of forwarded physical interrupts is managed in the following way:
> > +
> > +  - The physical interrupt is acked by the host, and becomes active on
> > +    the physical distributor (*).
> > +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> > +    interface is going to present it to the guest.
> > +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> > +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> > +    expected.
> > +  - On guest EOI, the *physical distributor* active bit gets cleared,
> > +    but the LR.Active is left untouched (set).
> 
> I tried hard in the last week, but couldn't confirm this. Tracing shows
> the following pattern over and over (case 1):
> (This is the kvm/kvm.git:queue branch from last week, so including the
> mapped timer IRQ code. Tests were done on Juno and Midway)
> 
> ...
> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
> ....
> 
> My hunch is that the following happens (please correct me if needed!):
> First there is an unrelated trap (line 1), then later the guest exits
> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
> The host injects the timer IRQ (not shown here) and returns to the
> guest. On the next trap (line 3, due to a stage 2 page fault),
> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
> GIC actually did deactivate both the LR (state=8, which is inactive,
> just the HW bit is still set) _and_ the state on the physical
> distributor (dist active=0). This trace_printk is just after entering
> the function, so before the code there performs these steps redundantly.
> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
> point of view this virtual IRQ cycle is finished.
> 
> The other sequence I see is this one (case 2):
> 
> ....
> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
> ...
> 
> In line 1 the timer fires, the host injects the timer IRQ into the
> guest, which exits again in line 2 due to a page fault (may have IRQs
> disabled?). The LR dump in line 3 shows that the timer IRQ is still
> pending in the LR (state=9) and active on the physical distributor. Now
> the code in vgic_sync_hwirq() clears the active state in the physical
> distributor (by calling irq_set_irqchip_state()), but leaves the LR
> alone (by returning 0 to the caller).
> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
> (line 5), only that the physical dist state is now inactive (due to us
> clearing that explicitly during the last exit). Now vgic_sync_hwirq()
> returns 1, leading to the LR being cleaned up in the caller.
> So to me it looks like we kill that IRQ before the guest had the chance
> to handle it (presumably because it has IRQs off).
> 
> The distribution of those patterns in my particular snapshot are (all
> with timer IRQ 27):
>  7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
>  1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
>   331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
>    68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1
> 
> So for the majority of exits with the timer having been injected before
> we redundantly clean the LR (case 1 above). Also there is quite a number
> of cases where we "kill" the IRQ (case 2 above). The active state case
> (state: 10 in the last two lines) seems to be a variation of case 2,
> just with the guest exiting from within the IRQ handler (after
> activation, before EOI).
> 
> I'd appreciate if someone could shed some light on this and show me
> where I am wrong here or what is going on instead.
> 
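For readers following the trace output quoted above: the LR.state values 8, 9 and 10 are described there as "inactive with the HW bit still set", "pending" and "active". A small decoder makes that reading explicit. This is a sketch only; the bit positions are inferred from that description (pending in bit 0, active in bit 1, HW in bit 3), and the TRACE_* names and functions are hypothetical rather than taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit layout inferred from the trace discussion (an assumption). */
#define TRACE_LR_PENDING (1u << 0)
#define TRACE_LR_ACTIVE  (1u << 1)
#define TRACE_LR_HW      (1u << 3)

enum trace_lr_state {
	TRACE_LR_STATE_INACTIVE,
	TRACE_LR_STATE_PENDING,
	TRACE_LR_STATE_ACTIVE,
	TRACE_LR_STATE_PENDING_ACTIVE,
};

/* Classify the pending/active bits of a traced LR.state value. */
enum trace_lr_state trace_lr_decode(unsigned int state)
{
	bool pending = (state & TRACE_LR_PENDING) != 0;
	bool active = (state & TRACE_LR_ACTIVE) != 0;

	if (pending && active)
		return TRACE_LR_STATE_PENDING_ACTIVE;
	if (pending)
		return TRACE_LR_STATE_PENDING;
	if (active)
		return TRACE_LR_STATE_ACTIVE;
	return TRACE_LR_STATE_INACTIVE;
}

/* True if the traced LR.state value has the HW bit set. */
bool trace_lr_is_hw(unsigned int state)
{
	return (state & TRACE_LR_HW) != 0;
}
```

Under this reading, every row of the distribution table has the HW bit set, and the rows differ only in whether the LR is inactive (8), pending (9) or active (10).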
Hi Andre,

From your write-up it's a bit unclear exactly where you feel the flow
breaks down compared to your trace.

However, I think the case where we kill the IRQ is the thing fixed in
the other commit "arm/arm64: KVM: vgic: Move active state handling to
flush_hwstate", which I sent recently.

Can you summarize what exactly your concerns are?

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-08  8:43       ` Eric Auger
@ 2015-09-08 16:57         ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-08 16:57 UTC (permalink / raw)
  To: Eric Auger, Christoffer Dall, Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

Hi Eric,

thanks for your answer.

On 08/09/15 09:43, Eric Auger wrote:
> Hi Andre,
> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>> Hi,
>>
>> firstly: this text is really great, thanks for coming up with that.
>> See below for some information I got from tracing the host which I
>> cannot make sense of....
>>
>>
>> On 04/09/15 20:40, Christoffer Dall wrote:
>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>> way we deal with them is not apparently easy to understand by reading
>>> various specs.
>>>
>>> Therefore, add a proper documentation file explaining the flow and
>>> rationale of the behavior of the vgic.
>>>
>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>> Omissions and errors are all mine.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>  1 file changed, 181 insertions(+)
>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>
>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> new file mode 100644
>>> index 0000000..24b6f28
>>> --- /dev/null
>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>> @@ -0,0 +1,181 @@
>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>> +==========================================
>>> +
>>> +The KVM/ARM code implements software support for the ARM Generic
>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>> +
>>> +Some of these virtual interrupts, however, correspond to physical
>>> +interrupts from real physical devices.  One example could be the
>>> +architected timer, which itself supports virtualization, and therefore
>>> +lets a guest OS program the hardware device directly to raise an
>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>> +host OS initially handles the interrupt and must somehow signal this
>>> +event as a virtual interrupt to the guest.  Another example could be a
>>> +passthrough device, where the physical interrupts are initially handled
>>> +by the host, but the device driver for the device lives in the guest OS
>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>> +the physical one to the guest OS.
>>> +
>>> +These virtual interrupts corresponding to a physical interrupt on the
>>> +host are called forwarded physical interrupts, but are also sometimes
>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>> +
>>> +Forwarded physical interrupts are handled slightly differently compared
>>> +to virtual interrupts generated purely by a software emulated device.
>>> +
>>> +
>>> +The HW bit
>>> +----------
>>> +Virtual interrupts are signalled to the guest by programming the List
>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>> +inactive.
>>> +
>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>> +KVM must also program an additional field in the LR, the physical IRQ
>>> +number, to link the virtual with the physical IRQ.
>>> +
>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>> +bit, never both at the same time.
>>> +
>>> +Setting the HW bit causes the hardware to deactivate the physical
>>> +interrupt on the physical distributor when the guest deactivates the
>>> +corresponding virtual interrupt.
>>> +
>>> +
>>> +Forwarded Physical Interrupts Life Cycle
>>> +----------------------------------------
>>> +
>>> +The state of forwarded physical interrupts is managed in the following way:
>>> +
>>> +  - The physical interrupt is acked by the host, and becomes active on
>>> +    the physical distributor (*).
>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>>> +    interface is going to present it to the guest.
>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>>> +    expected.
>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>>> +    but the LR.Active is left untouched (set).
>>
>> I tried hard in the last week, but couldn't confirm this. Tracing shows
>> the following pattern over and over (case 1):
>> (This is the kvm/kvm.git:queue branch from last week, so including the
>> mapped timer IRQ code. Tests were done on Juno and Midway)
>>
>> ...
>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
>> ....
>>
>> My hunch is that the following happens (please correct me if needed!):
>> First there is an unrelated trap (line 1), then later the guest exits
>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
>> The host injects the timer IRQ (not shown here) and returns to the
>> guest. On the next trap (line 3, due to a stage 2 page fault),
>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
>> GIC actually did deactivate both the LR (state=8, which is inactive,
>> just the HW bit is still set) _and_ the state on the physical
>> distributor (dist active=0). This trace_printk is just after entering
>> the function, so before the code there performs these steps redundantly.
>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
>> point of view this virtual IRQ cycle is finished.
>>
>> The other sequence I see is this one (case 2):
>>
>> ....
>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
>> ...
>>
>> In line 1 the timer fires, the host injects the timer IRQ into the
>> guest, which exits again in line 2 due to a page fault (may have IRQs
>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
>> pending in the LR (state=9) and active on the physical distributor. Now
>> the code in vgic_sync_hwirq() clears the active state in the physical
>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
>> alone (by returning 0 to the caller).
>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
>> (line 5), only that the physical dist state is now inactive (due to us
>> clearing that explicitly during the last exit).
> Normally the physical dist state was set active on previous flush, right
> (done for all mapped IRQs)?

Where is this done? I see that the physical dist state is altered on the
actual IRQ forwarding, but not on later exits/entries? Do you mean
kvm_vgic_flush_hwstate() with "flush"?

> So are you sure the IRQ was not actually
> completed by the guest? As Christoffer mentions the LR active state can
> remain even if the IRQ was completed.

I was wondering where this behaviour Christoffer mentioned comes from?
Is this an observation, an implementation bug or is this mentioned in
the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
awkward to me.
I will try to add more tracing to see what is actually happening, trying
to trace a timer IRQ life cycle more accurately to see what's going on here.

Cheers,
Andre.

> Did I misunderstand the problem you try to shed the light on?
> 
> Cheers
> 
> Eric
> 
>  Now vgic_sync_hwirq()
>> returns 1, leading to the LR being cleaned up in the caller.
>> So to me it looks like we kill that IRQ before the guest had the chance
>> to handle it (presumably because it has IRQs off).
> 
>>
>> The distribution of those patterns in my particular snapshot are (all
>> with timer IRQ 27):
>>  7107  LR.state:  8, ELRSR: 1, dist active: 0, log. active: 1
>>  1629  LR.state:  9, ELRSR: 0, dist active: 0, log. active: 1
>>  1629  LR.state:  9, ELRSR: 0, dist active: 1, log. active: 1
>>   331  LR.state: 10, ELRSR: 0, dist active: 1, log. active: 1
>>    68  LR.state: 10, ELRSR: 0, dist active: 0, log. active: 1
>>
>> So for the majority of exits with the timer having been injected before
>> we redundantly clean the LR (case 1 above). Also there is quite a number
>> of cases where we "kill" the IRQ (case 2 above). The active state case
>> (state: 10 in the last two lines) seems to be a variation of case 2,
>> just with the guest exiting from within the IRQ handler (after
>> activation, before EOI).
>>
>> I'd appreciate if someone could shed some light on this and show me
>> where I am wrong here or what is going on instead.
>>
>> Cheers,
>> Andre.
>>
>>> +  - KVM clears the LR on VM exits when the physical distributor
>>> +    active state has been cleared.
>>> +
>>> +(*): The host handling is slightly more complicated.  For some devices
>>> +(shared), KVM directly sets the active state on the physical distributor
>>> +before entering the guest, and for some devices (non-shared) the host
>>> +configures the GIC such that it does not deactivate the interrupt on
>>> +host EOIs, but only performs a priority drop allowing the GIC to receive
>>> +other interrupts and leaves the interrupt in the active state on the
>>> +physical distributor.
>>> +
>>> +
>>> +Forwarded Edge and Level Triggered PPIs and SPIs
>>> +------------------------------------------------
>>> +Forwarded physical interrupts injected should always be active on the
>>> +physical distributor when injected to a guest.
>>> +
>>> +Level-triggered interrupts will keep the interrupt line to the GIC
>>> +asserted, typically until the guest programs the device to deassert the
>>> +line.  This means that the interrupt will remain pending on the physical
>>> +distributor until the guest has reprogrammed the device.  Since we
>>> +always run the VM with interrupts enabled on the CPU, a pending
>>> +interrupt will exit the guest as soon as we switch into the guest,
>>> +preventing the guest from ever making progress as the process repeats
>>> +over and over.  Therefore, the active state on the physical distributor
>>> +must be set when entering the guest, preventing the GIC from forwarding
>>> +the pending interrupt to the CPU.  As soon as the guest deactivates
>>> +(EOIs) the interrupt, the physical line is sampled by the hardware again
>>> +and the host takes a new interrupt if and only if the physical line is
>>> +still asserted.
>>> +
>>> +Edge-triggered interrupts do not exhibit the same problem with
>>> +preventing guest execution that level-triggered interrupts do.  One
>>> +option is to not use the HW bit at all, and inject edge-triggered interrupts
>>> +from a physical device as pure virtual interrupts.  But that would
>>> +potentially slow down handling of the interrupt in the guest, because a
>>> +physical interrupt occurring in the middle of the guest ISR would
>>> +preempt the guest for the host to handle the interrupt.  Additionally,
>>> +if you configure the system to handle interrupts on a separate physical
>>> +core from that running your VCPU, you still have to interrupt the VCPU
>>> +to queue the pending state onto the LR, even though the guest won't use
>>> +this information until the guest ISR completes.  Therefore, the HW
>>> +bit should always be set for forwarded edge-triggered interrupts.  With
>>> +the HW bit set, the virtual interrupt is injected and additional
>>> +physical interrupts occurring before the guest deactivates the interrupt
>>> +simply mark the state on the physical distributor as Pending+Active.  As
>>> +soon as the guest deactivates the interrupt, the host takes another
>>> +interrupt if and only if there was a physical interrupt between
>>> +injecting the forwarded interrupt to the guest and the guest deactivating
>>> +the interrupt.
>>> +
>>> +Consequently, whenever we schedule a VCPU with one or more LRs with the
>>> +HW bit set, the interrupt must also be active on the physical
>>> +distributor.
>>> +
>>> +
>>> +Forwarded LPIs
>>> +--------------
>>> +LPIs, introduced in GICv3, are always edge-triggered and do not have an
>>> +active state.  They become pending when a device signals them, and as
>>> +soon as they are acked by the CPU, they are inactive again.
>>> +
>>> +It therefore doesn't make sense, and is not supported, to set the HW bit
>>> +for physical LPIs that are forwarded to a VM as virtual interrupts,
>>> +typically virtual SPIs.
>>> +
>>> +For LPIs, there is no other choice than to preempt the VCPU thread if
>>> +necessary, and queue the pending state onto the LR.
>>> +
>>> +
>>> +Putting It Together: The Architected Timer
>>> +------------------------------------------
>>> +The architected timer is a device that signals interrupts with level
>>> +triggered semantics.  The timer hardware is directly accessed by VCPUs
>>> +which program the timer to fire at some point in time.  Each VCPU on a
>>> +system programs the timer to fire at different times, and therefore the
>>> +hardware is multiplexed between multiple VCPUs.  This is implemented by
>>> +context-switching the timer state along with each VCPU thread.
>>> +
>>> +However, this means that a scenario like the following is entirely
>>> +possible, and in fact, typical:
>>> +
>>> +1.  KVM runs the VCPU
>>> +2.  The guest programs the timer to fire in T+100
>>> +3.  The guest is idle and calls WFI (wait-for-interrupts)
>>> +4.  The hardware traps to the host
>>> +5.  KVM stores the timer state to memory and disables the hardware timer
>>> +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
>>> +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
>>> +8.  The soft timer fires, waking up the VCPU thread
>>> +9.  KVM reprograms the timer hardware with the VCPU's values
>>> +10. KVM marks the timer interrupt as active on the physical distributor
>>> +11. KVM injects a forwarded physical interrupt to the guest
>>> +12. KVM runs the VCPU
>>> +
>>> +Notice that KVM injects a forwarded physical interrupt in step 11 without
>>> +the corresponding interrupt having actually fired on the host.  That is
>>> +exactly why we mark the timer interrupt as active in step 10, because
>>> +the active state on the physical distributor is part of the state
>>> +belonging to the timer hardware, which is context-switched along with
>>> +the VCPU thread.
>>> +
>>> +If the guest does not idle because it is busy, the flow looks like this
>>> +instead:
>>> +
>>> +1.  KVM runs the VCPU
>>> +2.  The guest programs the timer to fire in T+100
>>> +3.  At T+100 the timer fires and a physical IRQ causes the VM to exit
>>> +4.  With interrupts disabled on the CPU, KVM looks at the timer state
>>> +    and injects a forwarded physical interrupt because it concludes the
>>> +    timer has expired.
>>> +5.  KVM marks the timer interrupt as active on the physical distributor
>>> +6.  KVM runs the VCPU
>>> +
>>> +Notice that again the forwarded physical interrupt is injected to the
>>> +guest without having actually been handled on the host.  In this case it
>>> +is because the physical interrupt is forwarded to the guest before KVM
>>> +enables physical interrupts on the CPU after exiting the guest.
>>>
>> _______________________________________________
>> kvmarm mailing list
>> kvmarm@lists.cs.columbia.edu
>> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
>>
> 
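
As a reading aid for the traces quoted above, the LR.state values can be
treated as a small bitfield.  The following is only a sketch: the bit
assignments (1 = pending, 2 = active, 8 = HW) are inferred from the
descriptions in this mail (state=8 is "inactive, just the HW bit",
state=9 is pending, state=10 is active), not taken from the tracepoint
source, and the names decode_lr_state(), summarize() and SNAPSHOT are
made up for illustration.

```python
# Assumed bit layout of the "LR.state" trace field, inferred from the mail.
LR_PENDING = 1 << 0
LR_ACTIVE = 1 << 1
LR_HW = 1 << 3

# (count, LR.state, ELRSR, dist active), copied from the quoted snapshot.
SNAPSHOT = [
    (7107, 8, 1, 0),
    (1629, 9, 0, 0),
    (1629, 9, 0, 1),
    (331, 10, 0, 1),
    (68, 10, 0, 0),
]


def decode_lr_state(state):
    """Turn an LR.state trace value into a readable flag string."""
    flags = []
    if state & LR_HW:
        flags.append("HW")
    if state & LR_PENDING:
        flags.append("pending")
    if state & LR_ACTIVE:
        flags.append("active")
    if not state & (LR_PENDING | LR_ACTIVE):
        flags.append("inactive")
    return "+".join(flags)


def summarize(rows):
    """Print each pattern with its share of all recorded exits."""
    total = sum(row[0] for row in rows)
    for count, state, elrsr, dist_active in rows:
        share = 100.0 * count / total
        print(f"{count:5d} ({share:5.1f}%)  LR={decode_lr_state(state)}, "
              f"ELRSR={elrsr}, dist active={dist_active}")


if __name__ == "__main__":
    summarize(SNAPSHOT)
```

Read this way, the first row is case 1 (LR already empty), the two
state-9 rows are the two halves of case 2, and the state-10 rows are the
same situation with the guest inside its IRQ handler.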

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-08 16:57         ` Andre Przywara
@ 2015-09-09  8:49           ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-09  8:49 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Eric Auger, Marc Zyngier, kvmarm, linux-arm-kernel, kvm

On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara@arm.com> wrote:
> Hi Eric,
>
> thanks for your answer.
>
> On 08/09/15 09:43, Eric Auger wrote:
>> Hi Andre,
>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>>> Hi,
>>>
>>> firstly: this text is really great, thanks for coming up with that.
>>> See below for some information I got from tracing the host which I
>>> cannot make sense of....
>>>
>>>
>>> On 04/09/15 20:40, Christoffer Dall wrote:
>>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>>> way we deal with them is not apparently easy to understand by reading
>>>> various specs.
>>>>
>>>> Therefore, add a proper documentation file explaining the flow and
>>>> rationale of the behavior of the vgic.
>>>>
>>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>>> Omissions and errors are all mine.
>>>>
>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>> ---
>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>>  1 file changed, 181 insertions(+)
>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>
>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>> new file mode 100644
>>>> index 0000000..24b6f28
>>>> --- /dev/null
>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>> @@ -0,0 +1,181 @@
>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>>> +==========================================
>>>> +
>>>> +The KVM/ARM code implements software support for the ARM Generic
>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>>> +
>>>> +Some of these virtual interrupts, however, correspond to physical
>>>> +interrupts from real physical devices.  One example could be the
>>>> +architected timer, which itself supports virtualization, and therefore
>>>> +lets a guest OS program the hardware device directly to raise an
>>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>>> +host OS initially handles the interrupt and must somehow signal this
>>>> +event as a virtual interrupt to the guest.  Another example could be a
>>>> +passthrough device, where the physical interrupts are initially handled
>>>> +by the host, but the device driver for the device lives in the guest OS
>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>>> +the physical one to the guest OS.
>>>> +
>>>> +These virtual interrupts corresponding to a physical interrupt on the
>>>> +host are called forwarded physical interrupts, but are also sometimes
>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>>> +
>>>> +Forwarded physical interrupts are handled slightly differently compared
>>>> +to virtual interrupts generated purely by a software emulated device.
>>>> +
>>>> +
>>>> +The HW bit
>>>> +----------
>>>> +Virtual interrupts are signalled to the guest by programming the List
>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>>> +inactive.
>>>> +
>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>>> +KVM must also program an additional field in the LR, the physical IRQ
>>>> +number, to link the virtual with the physical IRQ.
>>>> +
>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>>> +bit, never both at the same time.
>>>> +
>>>> +Setting the HW bit causes the hardware to deactivate the physical
>>>> +interrupt on the physical distributor when the guest deactivates the
>>>> +corresponding virtual interrupt.
>>>> +
>>>> +
>>>> +Forwarded Physical Interrupts Life Cycle
>>>> +----------------------------------------
>>>> +
>>>> +The state of forwarded physical interrupts is managed in the following way:
>>>> +
>>>> +  - The physical interrupt is acked by the host, and becomes active on
>>>> +    the physical distributor (*).
>>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>>>> +    interface is going to present it to the guest.
>>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>>>> +    expected.
>>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>>>> +    but the LR.Active is left untouched (set).
>>>
>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
>>> the following pattern over and over (case 1):
>>> (This is the kvm/kvm.git:queue branch from last week, so including the
>>> mapped timer IRQ code. Tests were done on Juno and Midway)
>>>
>>> ...
>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
>>> ....
>>>
>>> My hunch is that the following happens (please correct me if needed!):
>>> First there is an unrelated trap (line 1), then later the guest exits
>>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
>>> The host injects the timer IRQ (not shown here) and returns to the
>>> guest. On the next trap (line 3, due to a stage 2 page fault),
>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
>>> GIC actually did deactivate both the LR (state=8, which is inactive,
>>> just the HW bit is still set) _and_ the state on the physical
>>> distributor (dist active=0). This trace_printk is just after entering
>>> the function, so before the code there performs these steps redundantly.
>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
>>> point of view this virtual IRQ cycle is finished.
>>>
>>> The other sequence I see is this one (case 2):
>>>
>>> ....
>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
>>> ...
>>>
>>> In line 1 the timer fires, the host injects the timer IRQ into the
>>> guest, which exits again in line 2 due to a page fault (may have IRQs
>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
>>> pending in the LR (state=9) and active on the physical distributor. Now
>>> the code in vgic_sync_hwirq() clears the active state in the physical
>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
>>> alone (by returning 0 to the caller).
>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
>>> (line 5), only that the physical dist state is now inactive (due to us
>>> clearing that explicitly during the last exit).
>> Normally the physical dist state was set active on previous flush, right
>> (done for all mapped IRQs)?
>
> Where is this done? I see that the physical dist state is altered on the
> actual IRQ forwarding, but not on later exits/entries? Do you mean
> kvm_vgic_flush_hwstate() with "flush"?

This is a bug and should be fixed in the 'fixes' patches I sent last
week.  We should set the active state on the physical distributor on
every entry to the guest for IRQs with the HW bit set in either pending
or active state.
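
To make that rule concrete, here is a minimal sketch, written in Python
rather than kernel C.  It is a toy model and not the vgic code: the
ListRegister class, the flag values and flush_hw_active() are made-up
names for illustration, and a plain dict stands in for the active state
on the physical distributor (which the real code would change through
something like irq_set_irqchip_state()).

```python
from dataclasses import dataclass

# Assumed flag values for the modelled LR state; illustrative only.
LR_PENDING = 1 << 0
LR_ACTIVE = 1 << 1
LR_HW = 1 << 3


@dataclass
class ListRegister:
    virq: int    # virtual IRQ number presented to the guest
    hwirq: int   # linked physical IRQ number (meaningful only with LR_HW)
    state: int   # combination of the LR_* flags above


def flush_hw_active(lrs, dist_active):
    """Run before every guest entry.

    For each LR that has the HW bit set and is still pending or active,
    mark the linked physical IRQ as active in the modelled distributor.
    """
    for lr in lrs:
        in_flight = lr.state & (LR_PENDING | LR_ACTIVE)
        if lr.state & LR_HW and in_flight:
            dist_active[lr.hwirq] = True


if __name__ == "__main__":
    dist = {27: False}  # cleared on the previous exit, as in case 2
    flush_hw_active([ListRegister(27, 27, LR_HW | LR_PENDING)], dist)
    print(dist)
```

With this in place, an LR in state 9 would not enter the guest with the
distributor state inactive, which is the situation in the second half of
case 2 in the traces.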

>
>> So are you sure the IRQ was not actually
>> completed by the guest? As Christoffer mentions the LR active state can
>> remain even if the IRQ was completed.
>
> I was wondering where this behaviour Christoffer mentioned comes from?

From how I understand the architecture and from talking to Marc.

> Is this an observation, an implementation bug or is this mentioned in
> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
> awkward to me.

What do you mean?  How are we spoon-feeding the VGIC?

> I will try to add more tracing to see what is actually happening, trying
> to trace a timer IRQ life cycle more accurately to see what's going on here.
>

By all means, trace through the thing; it would be great to get others
to look at this.  However, I recommend applying both the fixes I sent and
this v2 timer rework series before doing so, because otherwise things
don't work as I outline in this document.

-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-09  8:49           ` Christoffer Dall
@ 2015-09-09  8:57             ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-09  8:57 UTC (permalink / raw)
  To: Christoffer Dall, Andre Przywara
  Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

Hi Andre,
On 09/09/2015 10:49 AM, Christoffer Dall wrote:
> On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara@arm.com> wrote:
>> Hi Eric,
>>
>> thanks for your answer.
>>
>> On 08/09/15 09:43, Eric Auger wrote:
>>> Hi Andre,
>>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> firstly: this text is really great, thanks for coming up with that.
>>>> See below for some information I got from tracing the host which I
>>>> cannot make sense of....
>>>>
>>>>
>>>> On 04/09/15 20:40, Christoffer Dall wrote:
>>>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>>>> way we deal with them is not apparently easy to understand by reading
>>>>> various specs.
>>>>>
>>>>> Therefore, add a proper documentation file explaining the flow and
>>>>> rationale of the behavior of the vgic.
>>>>>
>>>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>>>> Omissions and errors are all mine.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>>>  1 file changed, 181 insertions(+)
>>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>>
>>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> new file mode 100644
>>>>> index 0000000..24b6f28
>>>>> --- /dev/null
>>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> @@ -0,0 +1,181 @@
>>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>>>> +==========================================
>>>>> +
>>>>> +The KVM/ARM code implements software support for the ARM Generic
>>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>>>> +
>>>>> +Some of these virtual interrupts, however, correspond to physical
>>>>> +interrupts from real physical devices.  One example could be the
>>>>> +architected timer, which itself supports virtualization, and therefore
>>>>> +lets a guest OS program the hardware device directly to raise an
>>>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>>>> +host OS initially handles the interrupt and must somehow signal this
>>>>> +event as a virtual interrupt to the guest.  Another example could be a
>>>>> +passthrough device, where the physical interrupts are initially handled
>>>>> +by the host, but the device driver for the device lives in the guest OS
>>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>>>> +the physical one to the guest OS.
>>>>> +
>>>>> +These virtual interrupts corresponding to a physical interrupt on the
>>>>> +host are called forwarded physical interrupts, but are also sometimes
>>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>>>> +
>>>>> +Forwarded physical interrupts are handled slightly differently compared
>>>>> +to virtual interrupts generated purely by a software emulated device.
>>>>> +
>>>>> +
>>>>> +The HW bit
>>>>> +----------
>>>>> +Virtual interrupts are signalled to the guest by programming the List
>>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>>>> +inactive.
>>>>> +
>>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>>>> +KVM must also program an additional field in the LR, the physical IRQ
>>>>> +number, to link the virtual with the physical IRQ.
>>>>> +
>>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>>>> +bit, never both at the same time.
>>>>> +
>>>>> +Setting the HW bit causes the hardware to deactivate the physical
>>>>> +interrupt on the physical distributor when the guest deactivates the
>>>>> +corresponding virtual interrupt.
>>>>> +
>>>>> +
>>>>> +Forwarded Physical Interrupts Life Cycle
>>>>> +----------------------------------------
>>>>> +
>>>>> +The state of forwarded physical interrupts is managed in the following way:
>>>>> +
>>>>> +  - The physical interrupt is acked by the host, and becomes active on
>>>>> +    the physical distributor (*).
>>>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>>>>> +    interface is going to present it to the guest.
>>>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>>>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>>>>> +    expected.
>>>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>>>>> +    but the LR.Active is left untouched (set).
>>>>
>>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
>>>> the following pattern over and over (case 1):
>>>> (This is the kvm/kvm.git:queue branch from last week, so including the
>>>> mapped timer IRQ code. Tests were done on Juno and Midway)
>>>>
>>>> ...
>>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
>>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
>>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
>>>> 0xffffffc0004089d8
>>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8,
>>>> ELRSR: 1, dist active: 0, log. active: 1
>>>> ....
>>>>
>>>> My hunch is that the following happens (please correct me if needed!):
>>>> First there is an unrelated trap (line 1), then later the guest exits
>>>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
>>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
>>>> The host injects the timer IRQ (not shown here) and returns to the
>>>> guest. On the next trap (line 3, due to a stage 2 page fault),
>>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
>>>> GIC actually did deactivate both the LR (state=8, which is inactive,
>>>> just the HW bit is still set) _and_ the state on the physical
>>>> distributor (dist active=0). This trace_printk is just after entering
>>>> the function, so before the code there performs these steps redundantly.
>>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
>>>> point of view this virtual IRQ cycle is finished.
>>>>
>>>> The other sequence I see is this one (case 2):
>>>>
>>>> ....
>>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
>>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
>>>> 0xffffffc0004089d8
>>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
>>>> ELRSR: 0, dist active: 1, log. active: 1
>>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
>>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
>>>> ELRSR: 0, dist active: 0, log. active: 1
>>>> ...
>>>>
>>>> In line 1 the timer fires, the host injects the timer IRQ into the
>>>> guest, which exits again in line 2 due to a page fault (may have IRQs
>>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
>>>> pending in the LR (state=9) and active on the physical distributor. Now
>>>> the code in vgic_sync_hwirq() clears the active state in the physical
>>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
>>>> alone (by returning 0 to the caller).
>>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
>>>> (line 5), only that the physical dist state is now inactive (due to us
>>>> clearing that explicitly during the last exit).
>>> Normally the physical dist state was set active on previous flush, right
>>> (done for all mapped IRQs)?
>>
>> Where is this done? I see that the physical dist state is altered on the
>> actual IRQ forwarding, but not on later exits/entries? Do you mean
>> kvm_vgic_flush_hwstate() with "flush"?
Yes, by "flush" I mean kvm_vgic_flush_hwstate().
See Christoffer's patch "arm/arm64: KVM: vgic: Move active state handling to
flush_hwstate".

Cheers

Eric
> 
> this is a bug and should be fixed in the 'fixes' patches I sent last
> week.  We should set active state on every entry to the guest for IRQs
> with the HW bit set in either pending or active state.
> 
>>
>>> So are you sure the IRQ was not actually
>>> completed by the guest? As Christoffer mentions the LR active state can
>>> remain even if the IRQ was completed.
>>
>> I was wondering where this behaviour Christoffer mentioned comes from?
> 
> From how I understand the architecture and from talking to Marc.
> 
>> Is this an observation, an implementation bug or is this mentioned in
>> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
>> awkward to me.
> 
> What do you mean?  How are we spoon-feeding the VGIC?
> 
>> I will try to add more tracing to see what is actually happening, trying
>> to trace a timer IRQ life cycle more accurately to see what's going on here.
>>
> 
> By all means, trace through the thing, it would be great to get others
> to look at this, but I recommend applying both the fixes I sent and
> this v2 timer rework series before doing so, because otherwise things
> don't work as I outline in this document.
> 
> -Christoffer
> 


* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-09  8:49           ` Christoffer Dall
@ 2015-09-11 11:21             ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-11 11:21 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Marc Zyngier, kvm, kvmarm, linux-arm-kernel

Hi Christoffer,

(actually you are not supposed to reply during your holidays!)

On 09/09/15 09:49, Christoffer Dall wrote:
> On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara@arm.com> wrote:
>> Hi Eric,
>>
>> thanks for your answer.
>>
>> On 08/09/15 09:43, Eric Auger wrote:
>>> Hi Andre,
>>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
>>>> Hi,
>>>>
>>>> firstly: this text is really great, thanks for coming up with that.
>>>> See below for some information I got from tracing the host which I
>>>> cannot make sense of....
>>>>
>>>>
>>>> On 04/09/15 20:40, Christoffer Dall wrote:
>>>>> Forwarded physical interrupts on arm/arm64 is a tricky concept and the
>>>>> way we deal with them is not apparently easy to understand by reading
>>>>> various specs.
>>>>>
>>>>> Therefore, add a proper documentation file explaining the flow and
>>>>> rationale of the behavior of the vgic.
>>>>>
>>>>> Some of this text was contributed by Marc Zyngier and edited by me.
>>>>> Omissions and errors are all mine.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
>>>>>  1 file changed, 181 insertions(+)
>>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>>
>>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> new file mode 100644
>>>>> index 0000000..24b6f28
>>>>> --- /dev/null
>>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>>>> @@ -0,0 +1,181 @@
>>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
>>>>> +==========================================
>>>>> +
>>>>> +The KVM/ARM code implements software support for the ARM Generic
>>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
>>>>> +allowing software to inject virtual interrupts to a VM, which the guest
>>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
>>>>> +
>>>>> +Some of these virtual interrupts, however, correspond to physical
>>>>> +interrupts from real physical devices.  One example could be the
>>>>> +architected timer, which itself supports virtualization, and therefore
>>>>> +lets a guest OS program the hardware device directly to raise an
>>>>> +interrupt at some point in time.  When such an interrupt is raised, the
>>>>> +host OS initially handles the interrupt and must somehow signal this
>>>>> +event as a virtual interrupt to the guest.  Another example could be a
>>>>> +passthrough device, where the physical interrupts are initially handled
>>>>> +by the host, but the device driver for the device lives in the guest OS
>>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
>>>>> +the physical one to the guest OS.
>>>>> +
>>>>> +These virtual interrupts corresponding to a physical interrupt on the
>>>>> +host are called forwarded physical interrupts, but are also sometimes
>>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
>>>>> +
>>>>> +Forwarded physical interrupts are handled slightly differently compared
>>>>> +to virtual interrupts generated purely by a software emulated device.
>>>>> +
>>>>> +
>>>>> +The HW bit
>>>>> +----------
>>>>> +Virtual interrupts are signalled to the guest by programming the List
>>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
>>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
>>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
>>>>> +interrupt, the LR state moves from Pending to Active, and finally to
>>>>> +inactive.
>>>>> +
>>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
>>>>> +KVM must also program an additional field in the LR, the physical IRQ
>>>>> +number, to link the virtual with the physical IRQ.
>>>>> +
>>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
>>>>> +bit, never both at the same time.
>>>>> +
>>>>> +Setting the HW bit causes the hardware to deactivate the physical
>>>>> +interrupt on the physical distributor when the guest deactivates the
>>>>> +corresponding virtual interrupt.
>>>>> +
>>>>> +
>>>>> +Forwarded Physical Interrupts Life Cycle
>>>>> +----------------------------------------
>>>>> +
>>>>> +The state of forwarded physical interrupts is managed in the following way:
>>>>> +
>>>>> +  - The physical interrupt is acked by the host, and becomes active on
>>>>> +    the physical distributor (*).
>>>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
>>>>> +    interface is going to present it to the guest.
>>>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>>>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
>>>>> +    expected.
>>>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>>>>> +    but the LR.Active is left untouched (set).
>>>>
>>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
>>>> the following pattern over and over (case 1):
>>>> (This is the kvm/kvm.git:queue branch from last week, so including the
>>>> mapped timer IRQ code. Tests were done on Juno and Midway)
>>>>
>>>> ...
>>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
>>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
>>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
>>>> ....
>>>>
>>>> My hunch is that the following happens (please correct me if needed!):
>>>> First there is an unrelated trap (line 1), then later the guest exits
>>>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
>>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
>>>> The host injects the timer IRQ (not shown here) and returns to the
>>>> guest. On the next trap (line 3, due to a stage 2 page fault),
>>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
>>>> GIC actually did deactivate both the LR (state=8, which is inactive,
>>>> just the HW bit is still set) _and_ the state on the physical
>>>> distributor (dist active=0). This trace_printk is just after entering
>>>> the function, so before the code there performs these steps redundantly.
>>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
>>>> point of view this virtual IRQ cycle is finished.
>>>>
>>>> The other sequence I see is this one (case 2):
>>>>
>>>> ....
>>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
>>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
>>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
>>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
>>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
>>>> ...
>>>>
>>>> In line 1 the timer fires, the host injects the timer IRQ into the
>>>> guest, which exits again in line 2 due to a page fault (may have IRQs
>>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
>>>> pending in the LR (state=9) and active on the physical distributor. Now
>>>> the code in vgic_sync_hwirq() clears the active state in the physical
>>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
>>>> alone (by returning 0 to the caller).
>>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
>>>> (line 5), only that the physical dist state is now inactive (due to us
>>>> clearing that explicitly during the last exit).
>>> Normally the physical dist state was set active on previous flush, right
>>> (done for all mapped IRQs)?
>>
>> Where is this done? I see that the physical dist state is altered on the
>> actual IRQ forwarding, but not on later exits/entries? Do you mean
>> kvm_vgic_flush_hwstate() with "flush"?
> 
> this is a bug and should be fixed in the 'fixes' patches I sent last
> week.  We should set active state on every entry to the guest for IRQs
> with the HW bit set in either pending or active state.

OK, sorry, I missed that one patch, I was looking at what should become
-rc1 soon (because that's what I want to rebase my ITS emulation patches
on). That patch wasn't in queue at the time I started looking at it.

So I updated to the latest queue containing those two fixes and also
applied your v2 series. Indeed this series addresses some of the things
I was wondering about the last time, but the main thing still persists:
- Every time the physical dist state is active we have the virtual state
still at pending or active.
- If the physical dist state is non-active, the virtual state is
inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
(LR empty).
(I was tracing every HW mapped LR in vgic_sync_hwirq() for this)

So that contradicts:

+  - On guest EOI, the *physical distributor* active bit gets cleared,
+    but the LR.Active is left untouched (set).

This is the main point I was actually wondering about: I cannot confirm
this statement. In my tests the LR state and the physical dist state
always correspond, as expected from reading the spec.

I reckon that these observations are mostly independent from the actual
KVM code, as I try to observe hardware state (physical distributor and
LRs) before KVM tinkers with them.
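
To make the LR.state values in the traces easier to read, here is a
minimal decoding sketch in C.  It assumes that the traced LR.state is
the top nibble (bits [31:28]) of a GICv2 GICH_LR value, which matches
the 8 and 9 readings quoted above.  The macro, struct and function
names are made up for this illustration and are not taken from the KVM
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: GICv2 GICH_LR bits [31:28], shifted down to a nibble. */
#define LR_NIBBLE_PENDING  (1u << 0)	/* GICH_LR bit 28 */
#define LR_NIBBLE_ACTIVE   (1u << 1)	/* GICH_LR bit 29 */
#define LR_NIBBLE_GROUP1   (1u << 2)	/* GICH_LR bit 30 */
#define LR_NIBBLE_HW       (1u << 3)	/* GICH_LR bit 31 */

/* Hypothetical helper type: the three flags discussed in this thread. */
struct lr_decoded {
	bool hw;
	bool pending;
	bool active;
};

/* Split a traced LR.state nibble into its HW, pending and active flags. */
static struct lr_decoded lr_decode(uint32_t nibble)
{
	struct lr_decoded d;

	assert(nibble <= 0xf);
	d.hw      = (nibble & LR_NIBBLE_HW) != 0;
	d.pending = (nibble & LR_NIBBLE_PENDING) != 0;
	d.active  = (nibble & LR_NIBBLE_ACTIVE) != 0;
	return d;
}

/* Human-readable name of the pending/active part of the nibble. */
static const char *lr_state_name(uint32_t nibble)
{
	switch (nibble & (LR_NIBBLE_PENDING | LR_NIBBLE_ACTIVE)) {
	case 0:
		return "inactive";
	case LR_NIBBLE_PENDING:
		return "pending";
	case LR_NIBBLE_ACTIVE:
		return "active";
	default:
		return "pending+active";
	}
}
```

Under that assumption, LR.state 8 decodes to "HW bit set, inactive" and
LR.state 9 to "HW bit set, pending", which is how the two values are
read in the traces above.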

...

> 
>> Is this an observation, an implementation bug or is this mentioned in
>> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
>> awkward to me.
> 
> What do you mean?  How are we spoon-feeding the VGIC?

By looking at the physical dist state and all LRs and clearing the LR we
do what the GIC is actually supposed to do for us - and what it actually
does according to my observations.
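
The correspondence described here can be written down as a small
predicate.  This is only a sketch, assuming that the traced LR.state is
the top nibble of a GICv2 GICH_LR value; hw_lr_matches_dist() is a
hypothetical helper, not an existing KVM function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: GICv2 GICH_LR bits [31:28], shifted down to a nibble. */
#define LR_NIBBLE_PENDING  (1u << 0)
#define LR_NIBBLE_ACTIVE   (1u << 1)
#define LR_NIBBLE_HW       (1u << 3)

/*
 * For a HW-mapped LR, report whether the LR and the physical
 * distributor agree: the physical interrupt is active exactly when the
 * LR is still pending or active.
 */
static bool hw_lr_matches_dist(uint32_t nibble, bool dist_active)
{
	bool lr_busy;

	/* Only meaningful for LRs that have the HW bit set. */
	assert(nibble & LR_NIBBLE_HW);

	lr_busy = (nibble & (LR_NIBBLE_PENDING | LR_NIBBLE_ACTIVE)) != 0;
	return lr_busy == dist_active;
}
```

Fed with the trace samples quoted earlier, the predicate holds for
(LR.state 8, dist active 0) and (LR.state 9, dist active 1).  It fails
for the second sample of case 2 (LR.state 9, dist active 0), which was
taken on the older queue branch after KVM had cleared the distributor
state itself.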

The point is that patch 1 in my ITS emulation series is reworking the LR
handling and this patch was based on assumptions that seem to be no
longer true (i.e. we don't care about inactive LRs except for our LR
mapping code). So I want to be sure that I fully get what is going on
here, and I am struggling with this at the moment due to the above statement.

What are the plans regarding your "v2: Rework architected timer..."
series? Will this be queued for 4.4? I want to do the
rebasing^Wrewriting of my series only once if possible ;-)

Cheers,
Andre.

>> I will try to add more tracing to see what is actually happening, trying
>> to trace a timer IRQ life cycle more accurately to see what's going on here.
>>
> 
> By all means, trace through the thing, it would be great to get others
> to look at this, but I recommend applying both the fixes I sent and
> this v2 timer rework series before doing so, because otherwise things
> don't work as I outline in this document.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-07 15:01     ` Eric Auger
@ 2015-09-13 15:56       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-13 15:56 UTC (permalink / raw)
  To: Eric Auger; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, kvm

On Mon, Sep 07, 2015 at 05:01:59PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > We currently schedule a soft timer every time we exit the guest if the
> > timer did not expire while running the guest.  This is really not
> > necessary, because the only work we do in the timer work function is to
> > kick the vcpu.
> > 
> > Kicking the vcpu does two things:
> > (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
> > from the waitqueue.
> > (2) If the vcpu is running on a different physical CPU from the one
> > doing the kick, it sends a reschedule IPI.
> > 
> > The second case cannot happen, because the soft timer is only ever
> > scheduled when the vcpu is not running.  The first case is only relevant
> > when the vcpu thread is on a waitqueue, which is only the case when the
> > vcpu thread has called kvm_vcpu_block().
> > 
> > Therefore, we only need to make sure a timer is scheduled for
> > kvm_vcpu_block(), which we do by encapsulating all calls to
> > kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> > 
> > Additionally, we only schedule a soft timer if the timer is enabled and
> > unmasked, since it is useless otherwise.
> > 
> > Note that theoretically userspace can use the SET_ONE_REG interface to
> > change registers that should cause the timer to fire, even if the vcpu
> > is blocked without a scheduled timer, but this case was not supported
> > before this patch and we leave it for future work for now.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  3 --
> >  arch/arm/kvm/arm.c                | 10 +++++
> >  arch/arm64/include/asm/kvm_host.h |  3 --
> >  include/kvm/arm_arch_timer.h      |  2 +
> >  virt/kvm/arm/arch_timer.c         | 91 ++++++++++++++++++++++++++-------------
> >  5 files changed, 72 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 86fcf6e..dcba0fa 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM_KVM_HOST_H__ */
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index ce404a5..bdf8871 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> >  	return kvm_timer_should_fire(vcpu);
> >  }
> >  
> > +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_schedule(vcpu);
> > +}
> > +
> > +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_unschedule(vcpu);
> > +}
> > +
> >  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >  {
> >  	/* Force users to call KVM_ARM_VCPU_INIT */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index dd143f5..415938d 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index e1e4d7c..ef14cc1 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >  
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >  
> >  #endif
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 48c6e1a..7991537 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> > +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> > +		!kvm_vgic_get_phys_irq_active(timer->map);
> kvm_vgic_get_phys_irq_active(timer->map) checks a logical state and not
> the actual HW state. What is the exact aim of that check? in case the
> PPI already is active, ie. timer hit, no use to schedule anything?

I struggled myself with making sense of this, which was one of the
inspirations for creating this series.

The point is to ensure we kick the vgic at the right time, because we
currently don't sample the line state on mapped interrupts, so the timer
does that job for us, when we know from this variable that the guest is
done processing a previous interrupt.

Nevertheless, that is not what this patch is about: this patch does
something else and just preserves existing functionality.  You'll notice
that I get rid of this stuff later on.
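
For reference, the condition being preserved here can be modelled
outside the kernel roughly as follows.  This is a simplified sketch, not
the patch itself: struct timer_model and its fields stand in for struct
arch_timer_cpu and for the result of kvm_vgic_get_phys_irq_active(), the
counter value is passed in instead of being read with
kvm_phys_timer_read(), and the cycles-to-nanoseconds conversion is left
out.  The two control bit values are assumed to match the kernel's
ARCH_TIMER_CTRL_ENABLE and ARCH_TIMER_CTRL_IT_MASK definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's CNTV_CTL bit definitions. */
#define ARCH_TIMER_CTRL_ENABLE   (1u << 0)
#define ARCH_TIMER_CTRL_IT_MASK  (1u << 1)

/* Hypothetical stand-in for the per-vcpu timer state. */
struct timer_model {
	uint32_t cntv_ctl;	/* guest's virtual timer control register */
	uint64_t cntv_cval;	/* guest's compare value */
	uint64_t cntvoff;	/* virtual counter offset */
	bool phys_irq_active;	/* logical active state of the mapped IRQ */
};

/* The timer can raise an interrupt only if enabled, unmasked and idle. */
static bool timer_irq_can_fire(const struct timer_model *t)
{
	return !(t->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
	       (t->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
	       !t->phys_irq_active;
}

/* The timer should fire once the virtual counter reaches cntv_cval. */
static bool timer_should_fire(const struct timer_model *t, uint64_t phys_now)
{
	if (!timer_irq_can_fire(t))
		return false;
	return t->cntv_cval <= phys_now - t->cntvoff;
}

/*
 * Decide whether a background timer is needed before blocking.  On
 * success, *cycles_left holds the remaining virtual counter cycles.
 */
static bool timer_needs_background(const struct timer_model *t,
				   uint64_t phys_now, uint64_t *cycles_left)
{
	assert(cycles_left != NULL);

	/* Already expired: blocking would return immediately anyway. */
	if (timer_should_fire(t, phys_now))
		return false;
	/* Disabled, masked or still being handled: nothing to wait for. */
	if (!timer_irq_can_fire(t))
		return false;

	*cycles_left = t->cntv_cval - (phys_now - t->cntvoff);
	return true;
}
```

The order of the two early returns follows kvm_timer_schedule() in the
quoted patch: first the "already expired" case, then the "cannot fire
at all" case.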

> 
> > +}
> > +
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	cycle_t cval, now;
> >  
> > -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> > -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> > -	    kvm_vgic_get_phys_irq_active(timer->map))
> > +	if (!kvm_timer_irq_can_fire(vcpu))
> >  		return false;
> >  
> >  	cval = timer->cntv_cval;
> > @@ -127,24 +134,61 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  	return cval <= now;
> >  }
> >  
> > -/**
> > - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > - * @vcpu: The vcpu pointer
> > - *
> > - * Disarm any pending soft timers, since the world-switch code will write the
> > - * virtual timer state back to the physical CPU.
> > +/*
> > + * Schedule the background timer before calling kvm_vcpu_block, so that this
> > + * thread is removed from its waitqueue and made runnable when there's a timer
> > + * interrupt to handle.
> >   */
> > -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	u64 ns;
> > +	cycle_t cval, now;
> > +
> > +	BUG_ON(timer_is_armed(timer));
> > +
> > +	/*
> > +	 * No need to schedule a background timer if the guest timer has
> > +	 * already expired, because kvm_vcpu_block will return before putting
> > +	 * the thread to sleep.
> > +	 */
> > +	if (kvm_timer_should_fire(vcpu))
> > +		return;
> >  
> >  	/*
> > -	 * We're about to run this vcpu again, so there is no need to
> > -	 * keep the background timer running, as we're about to
> > -	 * populate the CPU timer again.
> > +	 * If the timer is either not capable of raising interrupts (disabled
> > +	 * or masked) or if we already have a background timer, then there's
> > +	 * no more work for us to do.
> I don't understand the comment about "if we already have a background
> timer", related to the above comment...

Yeah, I got rid of that logic, so that part of the comment is stale and
should be removed.

> >  	 */
> > +	if (!kvm_timer_irq_can_fire(vcpu))
> > +		return;
> > +
> > +	/*  The timer has not yet expired, schedule a background timer */
> > +	cval = timer->cntv_cval;
> > +	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > +
> > +	ns = cyclecounter_cyc2ns(timecounter->cc,
> > +				 cval - now,
> > +				 timecounter->mask,
> > +				 &timecounter->frac);
> > +	timer_arm(timer, ns);
> > +}
> > +
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	timer_disarm(timer);
> > +}
> >  
> > +/**
> > + * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > + * @vcpu: The vcpu pointer
> > + *
> > + * Check if the virtual timer has expired while we were running in the host,
> > + * and inject an interrupt if that was the case.
> > + */
> > +void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +{
> >  	/*
> >  	 * If the timer expired while we were not scheduled, now is the time
> >  	 * to inject it.
> above comment seems duplicated now?

possibly, we can get rid of it.

> > @@ -157,32 +201,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >   * kvm_timer_sync_hwstate - sync timer state from cpu
> >   * @vcpu: The vcpu pointer
> >   *
> > - * Check if the virtual timer was armed and either schedule a corresponding
> > - * soft timer or inject directly if already expired.
> > + * Check if the virtual timer has expired while we were running in the guest,
> > + * and inject an interrupt if that was the case.
> >   */
> >  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	cycle_t cval, now;
> > -	u64 ns;
> >  
> >  	BUG_ON(timer_is_armed(timer));
> >  
> > -	if (kvm_timer_should_fire(vcpu)) {
> > -		/*
> > -		 * Timer has already expired while we were not
> > -		 * looking. Inject the interrupt and carry on.
> > -		 */
> > +	if (kvm_timer_should_fire(vcpu))
> >  		kvm_timer_inject_irq(vcpu);
> > -		return;
> > -	}
> > -
> > -	cval = timer->cntv_cval;
> > -	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > -
> > -	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
> > -				 &timecounter->frac);
> > -	timer_arm(timer, ns);
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> > 
> 

-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
@ 2015-09-13 15:56       ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-13 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 07, 2015 at 05:01:59PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > We currently schedule a soft timer every time we exit the guest if the
> > timer did not expire while running the guest.  This is really not
> > necessary, because the only work we do in the timer work function is to
> > kick the vcpu.
> > 
> > Kicking the vcpu does two things:
> > (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
> > from the waitqueue.
> > (2) If the vcpu is running on a different physical CPU from the one
> > doing the kick, it sends a reschedule IPI.
> > 
> > The second case cannot happen, because the soft timer is only ever
> > scheduled when the vcpu is not running.  The first case is only relevant
> > when the vcpu thread is on a waitqueue, which is only the case when the
> > vcpu thread has called kvm_vcpu_block().
> > 
> > Therefore, we only need to make sure a timer is scheduled for
> > kvm_vcpu_block(), which we do by encapsulating all calls to
> > kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> > 
> > Additionally, we only schedule a soft timer if the timer is enabled and
> > unmasked, since it is useless otherwise.
> > 
> > Note that theoretically userspace can use the SET_ONE_REG interface to
> > change registers that should cause the timer to fire, even if the vcpu
> > is blocked without a scheduled timer, but this case was not supported
> > before this patch and we leave it for future work for now.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  3 --
> >  arch/arm/kvm/arm.c                | 10 +++++
> >  arch/arm64/include/asm/kvm_host.h |  3 --
> >  include/kvm/arm_arch_timer.h      |  2 +
> >  virt/kvm/arm/arch_timer.c         | 91 ++++++++++++++++++++++++++-------------
> >  5 files changed, 72 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 86fcf6e..dcba0fa 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM_KVM_HOST_H__ */
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index ce404a5..bdf8871 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> >  	return kvm_timer_should_fire(vcpu);
> >  }
> >  
> > +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_schedule(vcpu);
> > +}
> > +
> > +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_unschedule(vcpu);
> > +}
> > +
> >  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >  {
> >  	/* Force users to call KVM_ARM_VCPU_INIT */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index dd143f5..415938d 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index e1e4d7c..ef14cc1 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >  
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >  
> >  #endif
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 48c6e1a..7991537 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> > +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> > +		!kvm_vgic_get_phys_irq_active(timer->map);
> kvm_vgic_get_phys_irq_active(timer->map) checks a logical state and not
> the actual HW state. What is the exact aim of that check? in case the
> PPI already is active, ie. timer hit, no use to schedule anything?

I struggled to make sense of this myself, which was one of the
motivations for creating this series.

The point is to ensure we kick the vgic at the right time. We currently
don't sample the line state on mapped interrupts, so the timer does that
job for us once this variable tells us that the guest is done processing
a previous interrupt.

That said, this is not what this patch is about. This patch does
something else and merely preserves the existing functionality; you'll
notice that I get rid of this check later in the series.
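
For readers following along, here is a rough user-space model of the
check being discussed. It is only a sketch: struct model_timer and the
model_* names are hypothetical stand-ins for the kernel structures, and
map_active stands in for what kvm_vgic_get_phys_irq_active() returns.
The two control bits follow the architected CNTV_CTL layout (ENABLE is
bit 0, IMASK is bit 1).

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL control bits (architected layout). */
#define TIMER_CTRL_ENABLE  (1u << 0)
#define TIMER_CTRL_IT_MASK (1u << 1)

/* Hypothetical stand-in for the per-vcpu timer state. */
struct model_timer {
	uint32_t cntv_ctl;   /* guest view of the control register */
	uint64_t cntv_cval;  /* guest compare value, in counter cycles */
	bool     map_active; /* logical (not HW) active state of the mapped IRQ */
};

/* The timer can raise an interrupt only if it is enabled, unmasked,
 * and the guest is not still processing a previous interrupt. */
static bool model_timer_irq_can_fire(const struct model_timer *t)
{
	return !(t->cntv_ctl & TIMER_CTRL_IT_MASK) &&
	       (t->cntv_ctl & TIMER_CTRL_ENABLE) &&
	       !t->map_active;
}

/* The timer should fire if it can fire and the compare value has
 * been reached by the (offset-adjusted) counter value 'now'. */
static bool model_timer_should_fire(const struct model_timer *t, uint64_t now)
{
	if (!model_timer_irq_can_fire(t))
		return false;

	return t->cntv_cval <= now;
}
```

The map_active term is the part that only reflects a logical state; the
rest is a plain function of the guest-visible timer registers.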

> 
> > +}
> > +
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	cycle_t cval, now;
> >  
> > -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> > -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> > -	    kvm_vgic_get_phys_irq_active(timer->map))
> > +	if (!kvm_timer_irq_can_fire(vcpu))
> >  		return false;
> >  
> >  	cval = timer->cntv_cval;
> > @@ -127,24 +134,61 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  	return cval <= now;
> >  }
> >  
> > -/**
> > - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > - * @vcpu: The vcpu pointer
> > - *
> > - * Disarm any pending soft timers, since the world-switch code will write the
> > - * virtual timer state back to the physical CPU.
> > +/*
> > + * Schedule the background timer before calling kvm_vcpu_block, so that this
> > + * thread is removed from its waitqueue and made runnable when there's a timer
> > + * interrupt to handle.
> >   */
> > -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	u64 ns;
> > +	cycle_t cval, now;
> > +
> > +	BUG_ON(timer_is_armed(timer));
> > +
> > +	/*
> > +	 * No need to schedule a background timer if the guest timer has
> > +	 * already expired, because kvm_vcpu_block will return before putting
> > +	 * the thread to sleep.
> > +	 */
> > +	if (kvm_timer_should_fire(vcpu))
> > +		return;
> >  
> >  	/*
> > -	 * We're about to run this vcpu again, so there is no need to
> > -	 * keep the background timer running, as we're about to
> > -	 * populate the CPU timer again.
> > +	 * If the timer is either not capable of raising interrupts (disabled
> > +	 * or masked) or if we already have a background timer, then there's
> > +	 * no more work for us to do.
> I don't understand the comment about "if we already have a background
> timer", related to the above comment...

Yeah, I got rid of the check for an existing background timer, so that
part of the comment is stale and should be removed.
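
To make the remaining control flow explicit, here is a small sketch of
the decision kvm_timer_schedule() is left with once the stale part of
the comment is gone. It is a simplified user-space model, not the kernel
code: the enum, struct and function names are made up for illustration,
the mapped-IRQ active check is left out, and the cycles-to-nanoseconds
conversion is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the scheduling decision. */
enum sched_action {
	SCHED_NONE_EXPIRED,     /* already expired, block returns at once */
	SCHED_NONE_CANNOT_FIRE, /* disabled or masked, nothing to wait for */
	SCHED_ARM,              /* arm a background timer */
};

struct sched_decision {
	enum sched_action action;
	uint64_t delta_cycles;  /* valid only when action == SCHED_ARM */
};

static struct sched_decision model_timer_schedule(bool enabled, bool masked,
						   uint64_t cval, uint64_t now)
{
	struct sched_decision d = { SCHED_NONE_CANNOT_FIRE, 0 };
	bool can_fire = enabled && !masked;

	/* Same order as the patch: first the "should fire" test... */
	if (can_fire && cval <= now) {
		d.action = SCHED_NONE_EXPIRED;
		return d;
	}

	/* ...then the "can fire at all" test. */
	if (!can_fire)
		return d;

	/* Not yet expired: a background timer is needed. */
	d.action = SCHED_ARM;
	d.delta_cycles = cval - now;
	return d;
}
```

Only the last case needs a soft timer, which is why the comment no
longer has to mention an existing background timer.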

> >  	 */
> > +	if (!kvm_timer_irq_can_fire(vcpu))
> > +		return;
> > +
> > +	/*  The timer has not yet expired, schedule a background timer */
> > +	cval = timer->cntv_cval;
> > +	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > +
> > +	ns = cyclecounter_cyc2ns(timecounter->cc,
> > +				 cval - now,
> > +				 timecounter->mask,
> > +				 &timecounter->frac);
> > +	timer_arm(timer, ns);
> > +}
> > +
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	timer_disarm(timer);
> > +}
> >  
> > +/**
> > + * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > + * @vcpu: The vcpu pointer
> > + *
> > + * Check if the virtual timer has expired while we were running in the host,
> > + * and inject an interrupt if that was the case.
> > + */
> > +void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +{
> >  	/*
> >  	 * If the timer expired while we were not scheduled, now is the time
> >  	 * to inject it.
> above comment seems duplicated now?

Possibly; we can get rid of it.

> > @@ -157,32 +201,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >   * kvm_timer_sync_hwstate - sync timer state from cpu
> >   * @vcpu: The vcpu pointer
> >   *
> > - * Check if the virtual timer was armed and either schedule a corresponding
> > - * soft timer or inject directly if already expired.
> > + * Check if the virtual timer has expired while we were running in the guest,
> > + * and inject an interrupt if that was the case.
> >   */
> >  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	cycle_t cval, now;
> > -	u64 ns;
> >  
> >  	BUG_ON(timer_is_armed(timer));
> >  
> > -	if (kvm_timer_should_fire(vcpu)) {
> > -		/*
> > -		 * Timer has already expired while we were not
> > -		 * looking. Inject the interrupt and carry on.
> > -		 */
> > +	if (kvm_timer_should_fire(vcpu))
> >  		kvm_timer_inject_irq(vcpu);
> > -		return;
> > -	}
> > -
> > -	cval = timer->cntv_cval;
> > -	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > -
> > -	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
> > -				 &timecounter->frac);
> > -	timer_arm(timer, ns);
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> > 
> 

-Christoffer


* Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-14  9:29     ` Eric Auger
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-14  9:29 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: kvm, Marc Zyngier

On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> The arch timer currently uses edge-triggered semantics in the sense that
> the line is never sampled by the vgic and lowering the line from the
> timer to the vgic doesn't have any affect on the pending state of
s/affect/effect
> virtual interrupts in the vgic.  This means that we do not support a
> guest with the otherwise valid behavior of (1) disable interrupts (2)
> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> guest would validly not expect to see any interrupts on real hardware,
> but will see interrupts on KVM.
> 
> This patches fixes this shortcoming through the following series of
> changes.
> 
> First, we change the flow of the timer/vgic sync/flush operations.  Now
> the timer is always flushed/synced before the vgic,
For the flush, this was already the case.
> because the vgic
> samples the state of the timer output.  This has the implication that we
> move the timer operations in to non-preempible sections, but that is
> fine after the previous commit getting rid of hrtimer schedules on every
> entry/exit.
> 
> Second, we change the internal behavior of the timer, letting the timer
> keep track of its previous output state, and only lower/raise the line
> to the vgic when the state changes.  Note that in theory this could have
> been accomplished more simply by signalling the vgic every time the
> state *potentially* changed, but we don't want to be hitting the vgic
> more often than necessary.
> 
> Third, we get rid of the use of the map->active field in the vgic and
> instead simply set the interrupt as active on the physical distributor
> whenever we signal a mapped interrupt to the guest, and we reset the
> active state when we sync back the HW state from the vgic.
> 
> Fourth, and finally, we now initialize the timer PPIs (and all the other
> unused PPIs for now), to be level-triggered, and modify the sync code to
> sample the line state on HW sync and re-inject a new interrupt if it is
> still pending at that time.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/kvm/arm.c           | 11 ++++++--
>  include/kvm/arm_arch_timer.h |  2 +-
>  include/kvm/arm_vgic.h       |  3 --
>  virt/kvm/arm/arch_timer.c    | 65 +++++++++++++++++++++++++++++-------------
>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++-----------------------------
>  5 files changed, 78 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index bdf8871..102a4aa 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			preempt_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_exit();
>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>  
> +		/*
> +		 * We must sync the timer state before the vgic state so that
> +		 * the vgic can properly sample the updated state of the
> +		 * interrupt line.
> +		 */
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		kvm_vgic_sync_hwstate(vcpu);
>  
>  		preempt_enable();
>  
> -		kvm_timer_sync_hwstate(vcpu);
> -
>  		ret = handle_exit(vcpu, run, ret);
>  	}
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index ef14cc1..1800227 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>  	bool				armed;
>  
>  	/* Timer IRQ */
> -	const struct kvm_irq_level	*irq;
> +	struct kvm_irq_level		irq;
>  
>  	/* VGIC mapping */
>  	struct irq_phys_map		*map;
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index d901f1a..99011a0 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -163,7 +163,6 @@ struct irq_phys_map {
>  	u32			virt_irq;
>  	u32			phys_irq;
>  	u32			irq;
> -	bool			active;
>  };
>  
>  struct irq_phys_map_entry {
> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  					   int virt_irq, int irq);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7991537..0cdd092 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>  	}
>  }
>  
> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> -{
> -	int ret;
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -
> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> -					 timer->map,
> -					 timer->irq->level);
> -	WARN_ON(ret);
> -}
> -
>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>  {
>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> -		!kvm_vgic_get_phys_irq_active(timer->map);
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>  }
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> @@ -134,6 +121,41 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
> +{
> +	int ret;
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> +
> +	timer->irq.level = new_level;
> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> +					 timer->map,
> +					 timer->irq.level);
> +	WARN_ON(ret);
> +}
> +
> +/*
> + * Check if there was a change in the timer state (should we raise or lower
> + * the line level to the GIC).
> + */
> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	/*
> +	 * If userspace modified the timer registers via SET_ONE_REG before
> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> +	 * because the guest would never see the interrupt.  Instead wait
> +	 * until we call this funciton from kvm_timer_flush_hwstate.
s/funciton/function
> +	 */
> +	if (!vgic_initialized(vcpu->kvm))
> +	    return;
> +
> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
> +}
> +
>  /*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
> @@ -193,8 +215,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
>  	 */
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  /**
> @@ -210,8 +231,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	/*
> +	 * The guest could have modified the timer registers or the timer
> +	 * could have expired, update the timer state.
> +	 */
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> @@ -226,7 +250,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * kvm_vcpu_set_target(). To handle this, we determine
>  	 * vcpu timer irq number when the vcpu is reset.
>  	 */
> -	timer->irq = irq;
> +	timer->irq.irq = irq->irq;
>  
>  	/*
>  	 * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> @@ -235,6 +259,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * the ARMv7 architecture.
>  	 */
>  	timer->cntv_ctl = 0;
> +	kvm_timer_update_state(vcpu);
>  
>  	/*
>  	 * Tell the VGIC that the virtual interrupt is tied to a
> @@ -279,6 +304,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	default:
>  		return -1;
>  	}
> +
> +	kvm_timer_update_state(vcpu);
>  	return 0;
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> +	return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> -			/*
> -			 * So this is a HW interrupt that the guest
> -			 * EOI-ed. Clean the LR state and allow the
> -			 * interrupt to be sampled again.
> -			 */
> -			vlr.state = 0;
> -			vlr.hwirq = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> -			set_bit(lr, elrsr_ptr);
> -		}
> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> +			level_pending = true;
>  
>  		if (!test_bit(lr, elrsr_ptr))
>  			continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> -	BUG_ON(!map);
> -	return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> -	BUG_ON(!map);
> -	map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>  			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
> -			if (i < VGIC_NR_PRIVATE_IRQS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>  							vcpu->vcpu_id, i,
>  							VGIC_CFG_EDGE);
> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> +							vcpu->vcpu_id, i,
> +							VGIC_CFG_LEVEL);
nit: use the same if block for enable & cfg?
>  		}
>  
>  		vgic_enable(vcpu);
> 
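
To illustrate the nit above about sharing one if block for enable and
cfg, something along these lines is what I have in mind. This is only a
user-space sketch: the MODEL_* constants, the struct and the function
are hypothetical, the numeric values are illustrative, and plain arrays
stand in for the vgic_bitmap_set_irq_val() calls on dist->irq_enabled
and dist->irq_cfg.

```c
#include <stdbool.h>

/* Illustrative values: SGIs come first, then the PPIs. */
#define MODEL_NR_SGIS         16
#define MODEL_NR_PRIVATE_IRQS 32

enum model_cfg { MODEL_CFG_LEVEL = 0, MODEL_CFG_EDGE = 1 };

/* Hypothetical stand-in for the per-vcpu private IRQ state. */
struct model_private_irqs {
	bool enabled[MODEL_NR_PRIVATE_IRQS];
	enum model_cfg cfg[MODEL_NR_PRIVATE_IRQS];
};

/* Expects *p to be zero-initialized by the caller. */
static void model_init_private_irqs(struct model_private_irqs *p)
{
	for (int i = 0; i < MODEL_NR_PRIVATE_IRQS; i++) {
		if (i < MODEL_NR_SGIS) {
			/* SGIs: enabled by default and edge-triggered. */
			p->enabled[i] = true;
			p->cfg[i] = MODEL_CFG_EDGE;
		} else {
			/* PPIs: left disabled, level-triggered. */
			p->cfg[i] = MODEL_CFG_LEVEL;
		}
	}
}
```

Since both conditions test against the SGI limit, the enable and the cfg
setup for SGIs fit in one branch and the PPI case becomes the else.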



* [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
@ 2015-09-14  9:29     ` Eric Auger
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Auger @ 2015-09-14  9:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> The arch timer currently uses edge-triggered semantics in the sense that
> the line is never sampled by the vgic and lowering the line from the
> timer to the vgic doesn't have any affect on the pending state of
s/affect/effect
> virtual interrupts in the vgic.  This means that we do not support a
> guest with the otherwise valid behavior of (1) disable interrupts (2)
> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> guest would validly not expect to see any interrupts on real hardware,
> but will see interrupts on KVM.
> 
> This patches fixes this shortcoming through the following series of
> changes.
> 
> First, we change the flow of the timer/vgic sync/flush operations.  Now
> the timer is always flushed/synced before the vgic,
For the flush, this was already the case.
> because the vgic
> samples the state of the timer output.  This has the implication that we
> move the timer operations in to non-preempible sections, but that is
> fine after the previous commit getting rid of hrtimer schedules on every
> entry/exit.
> 
> Second, we change the internal behavior of the timer, letting the timer
> keep track of its previous output state, and only lower/raise the line
> to the vgic when the state changes.  Note that in theory this could have
> been accomplished more simply by signalling the vgic every time the
> state *potentially* changed, but we don't want to be hitting the vgic
> more often than necessary.
> 
> Third, we get rid of the use of the map->active field in the vgic and
> instead simply set the interrupt as active on the physical distributor
> whenever we signal a mapped interrupt to the guest, and we reset the
> active state when we sync back the HW state from the vgic.
> 
> Fourth, and finally, we now initialize the timer PPIs (and all the other
> unused PPIs for now), to be level-triggered, and modify the sync code to
> sample the line state on HW sync and re-inject a new interrupt if it is
> still pending at that time.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/kvm/arm.c           | 11 ++++++--
>  include/kvm/arm_arch_timer.h |  2 +-
>  include/kvm/arm_vgic.h       |  3 --
>  virt/kvm/arm/arch_timer.c    | 65 +++++++++++++++++++++++++++++-------------
>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++-----------------------------
>  5 files changed, 78 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index bdf8871..102a4aa 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			preempt_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_exit();
>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>  
> +		/*
> +		 * We must sync the timer state before the vgic state so that
> +		 * the vgic can properly sample the updated state of the
> +		 * interrupt line.
> +		 */
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		kvm_vgic_sync_hwstate(vcpu);
>  
>  		preempt_enable();
>  
> -		kvm_timer_sync_hwstate(vcpu);
> -
>  		ret = handle_exit(vcpu, run, ret);
>  	}
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index ef14cc1..1800227 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>  	bool				armed;
>  
>  	/* Timer IRQ */
> -	const struct kvm_irq_level	*irq;
> +	struct kvm_irq_level		irq;
>  
>  	/* VGIC mapping */
>  	struct irq_phys_map		*map;
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index d901f1a..99011a0 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -163,7 +163,6 @@ struct irq_phys_map {
>  	u32			virt_irq;
>  	u32			phys_irq;
>  	u32			irq;
> -	bool			active;
>  };
>  
>  struct irq_phys_map_entry {
> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  					   int virt_irq, int irq);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 7991537..0cdd092 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>  	}
>  }
>  
> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> -{
> -	int ret;
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -
> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> -					 timer->map,
> -					 timer->irq->level);
> -	WARN_ON(ret);
> -}
> -
>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>  {
>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> -		!kvm_vgic_get_phys_irq_active(timer->map);
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>  }
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> @@ -134,6 +121,41 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
> +{
> +	int ret;
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> +
> +	timer->irq.level = new_level;
> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> +					 timer->map,
> +					 timer->irq.level);
> +	WARN_ON(ret);
> +}
> +
> +/*
> + * Check if there was a change in the timer state (should we raise or lower
> + * the line level to the GIC).
> + */
> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	/*
> +	 * If userspace modified the timer registers via SET_ONE_REG before
> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> +	 * because the guest would never see the interrupt.  Instead wait
> +	 * until we call this funciton from kvm_timer_flush_hwstate.
s/funciton/function
> +	 */
> +	if (!vgic_initialized(vcpu->kvm))
> +	    return;
> +
> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
> +}
> +
>  /*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
> @@ -193,8 +215,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
>  	 */
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  /**
> @@ -210,8 +231,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	/*
> +	 * The guest could have modified the timer registers or the timer
> +	 * could have expired, update the timer state.
> +	 */
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> @@ -226,7 +250,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * kvm_vcpu_set_target(). To handle this, we determine
>  	 * vcpu timer irq number when the vcpu is reset.
>  	 */
> -	timer->irq = irq;
> +	timer->irq.irq = irq->irq;
>  
>  	/*
>  	 * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> @@ -235,6 +259,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * the ARMv7 architecture.
>  	 */
>  	timer->cntv_ctl = 0;
> +	kvm_timer_update_state(vcpu);
>  
>  	/*
>  	 * Tell the VGIC that the virtual interrupt is tied to a
> @@ -279,6 +304,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	default:
>  		return -1;
>  	}
> +
> +	kvm_timer_update_state(vcpu);
>  	return 0;
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> +	return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> -			/*
> -			 * So this is a HW interrupt that the guest
> -			 * EOI-ed. Clean the LR state and allow the
> -			 * interrupt to be sampled again.
> -			 */
> -			vlr.state = 0;
> -			vlr.hwirq = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> -			set_bit(lr, elrsr_ptr);
> -		}
> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> +			level_pending = true;
>  
>  		if (!test_bit(lr, elrsr_ptr))
>  			continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> -	BUG_ON(!map);
> -	return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> -	BUG_ON(!map);
> -	map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>  			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
> -			if (i < VGIC_NR_PRIVATE_IRQS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>  							vcpu->vcpu_id, i,
>  							VGIC_CFG_EDGE);
> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> +							vcpu->vcpu_id, i,
> +							VGIC_CFG_LEVEL);
nit: use the same if block for enable & cfg?
>  		}
>  
>  		vgic_enable(vcpu);
> 
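
One way to read the nit above is to fold the enable and configuration
defaults into a single if/else chain.  The following is a stand-alone
sketch, not the kernel code: plain arrays stand in for
vgic_bitmap_set_irq_val(), struct private_irq_defaults is hypothetical, and
the constant values (16 SGIs, 32 private IRQs, EDGE = 1, LEVEL = 0) are
assumptions made for the model.  It also clears the PPI enable explicitly,
whereas the quoted code simply leaves it untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values, for this model only. */
#define VGIC_NR_SGIS          16
#define VGIC_NR_PRIVATE_IRQS  32
#define VGIC_CFG_LEVEL        0
#define VGIC_CFG_EDGE         1

/* Hypothetical stand-in for the per-VCPU private IRQ bitmaps. */
struct private_irq_defaults {
	bool enabled[VGIC_NR_PRIVATE_IRQS];
	int  cfg[VGIC_NR_PRIVATE_IRQS];
};

static void init_private_irq_defaults(struct private_irq_defaults *d)
{
	assert(d != NULL);

	for (int i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
		if (i < VGIC_NR_SGIS) {
			/* SGIs: enabled by default, edge-triggered. */
			d->enabled[i] = true;
			d->cfg[i] = VGIC_CFG_EDGE;
		} else {
			/* PPIs: not enabled by default, level-triggered. */
			d->enabled[i] = false;
			d->cfg[i] = VGIC_CFG_LEVEL;
		}
	}
}
```

With a single chain, the SGI/PPI boundary is tested once and both defaults
for a given interrupt class sit next to each other.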


* Re: [PATCH v2 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  2015-09-07 15:32     ` Eric Auger
@ 2015-09-14 11:31       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:31 UTC (permalink / raw)
  To: Eric Auger; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, kvm

On Mon, Sep 07, 2015 at 05:32:35PM +0200, Eric Auger wrote:
> 
> 
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > Currently vgic_process_maintenance() directly handles the processing of
> > a completed level-triggered interrupt, but we are soon going to reuse
> > this logic for level-triggered mapped interrupts with the HW bit set, so
> > move this logic into a separate static function.
> > 
> > Probably the scariest part of this commit is convincing yourself that
> > the current flow is safe compared to the old one.  In the following I
> > try to list the changes and why they are harmless:
> > 
> >   Move vgic_irq_clear_queued after kvm_notify_acked_irq:
> >     Harmless because the effect of clearing the queued flag wrt.
> >     kvm_set_irq is only that vgic_update_irq_pending does not set the
> >     pending bit on the emulated CPU interface or in the pending_on_cpu
> >     bitmask,
> Well, actually the notifier calls vgic_update_irq_pending with level == 0,
> so it does not reach the can_sample check.
> >     but we set this in __kvm_vgic_sync_hwstate later on if the

Can the notifier never go through a flow where it calls the function
with level = 1?  For example, if the interrupt hits in between?

In any case, it should still be functionally correct.

Thanks for the RB.

-Christoffer

> >     level is stil high.
> still
> 
> Reviewed-by: Eric Auger <eric.auger@linaro.org>
> 
> Eric
> > 
> >   Move vgic_set_lr before kvm_notify_acked_irq:
> >     Also harmless because the LR accesses are cpu-local operations and
> >     kvm_notify_acked_irq only affects the dist.
> > 
> >   Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
> >     Also harmless because it's just a bit which is cleared and altering
> >     the line state does not affect this bit.
> > 
> > Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic.c | 88 ++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 50 insertions(+), 38 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 6bd1c9b..fe0e5db 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1322,12 +1322,56 @@ epilog:
> >  	}
> >  }
> >  
> > +static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> > +{
> > +	int level_pending = 0;
> > +
> > +	vlr.state = 0;
> > +	vlr.hwirq = 0;
> > +	vgic_set_lr(vcpu, lr, vlr);
> > +
> > +	/*
> > +	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
> > +	 * went from active to non-active (called from vgic_sync_hwirq) it was
> > +	 * also ACKed and we we therefore assume we can clear the soft pending
> > +	 * state (should it had been set) for this interrupt.
> > +	 *
> > +	 * Note: if the IRQ soft pending state was set after the IRQ was
> > +	 * acked, it actually shouldn't be cleared, but we have no way of
> > +	 * knowing that unless we start trapping ACKs when the soft-pending
> > +	 * state is set.
> > +	 */
> > +	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
> > +
> > +	/*
> > +	 * Tell the gic to start sampling the line of this interrupt again.
> > +	 */
> > +	vgic_irq_clear_queued(vcpu, vlr.irq);
> > +
> > +	/* Any additional pending interrupt? */
> > +	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> > +		vgic_cpu_irq_set(vcpu, vlr.irq);
> > +		level_pending = 1;
> > +	} else {
> > +		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> > +		vgic_cpu_irq_clear(vcpu, vlr.irq);
> > +	}
> > +
> > +	/*
> > +	 * Despite being EOIed, the LR may not have
> > +	 * been marked as empty.
> > +	 */
> > +	vgic_sync_lr_elrsr(vcpu, lr, vlr);
> > +
> > +	return level_pending;
> > +}
> > +
> >  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >  {
> >  	u32 status = vgic_get_interrupt_status(vcpu);
> >  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> > -	bool level_pending = false;
> >  	struct kvm *kvm = vcpu->kvm;
> > +	int level_pending = 0;
> >  
> >  	kvm_debug("STATUS = %08x\n", status);
> >  
> > @@ -1342,54 +1386,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >  
> >  		for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
> >  			struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
> > -			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> >  
> > -			spin_lock(&dist->lock);
> > -			vgic_irq_clear_queued(vcpu, vlr.irq);
> > +			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> >  			WARN_ON(vlr.state & LR_STATE_MASK);
> > -			vlr.state = 0;
> > -			vgic_set_lr(vcpu, lr, vlr);
> >  
> > -			/*
> > -			 * If the IRQ was EOIed it was also ACKed and we we
> > -			 * therefore assume we can clear the soft pending
> > -			 * state (should it had been set) for this interrupt.
> > -			 *
> > -			 * Note: if the IRQ soft pending state was set after
> > -			 * the IRQ was acked, it actually shouldn't be
> > -			 * cleared, but we have no way of knowing that unless
> > -			 * we start trapping ACKs when the soft-pending state
> > -			 * is set.
> > -			 */
> > -			vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
> >  
> >  			/*
> >  			 * kvm_notify_acked_irq calls kvm_set_irq()
> > -			 * to reset the IRQ level. Need to release the
> > -			 * lock for kvm_set_irq to grab it.
> > +			 * to reset the IRQ level, which grabs the dist->lock
> > +			 * so we call this before taking the dist->lock.
> >  			 */
> > -			spin_unlock(&dist->lock);
> > -
> >  			kvm_notify_acked_irq(kvm, 0,
> >  					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
> > -			spin_lock(&dist->lock);
> > -
> > -			/* Any additional pending interrupt? */
> > -			if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
> > -				vgic_cpu_irq_set(vcpu, vlr.irq);
> > -				level_pending = true;
> > -			} else {
> > -				vgic_dist_irq_clear_pending(vcpu, vlr.irq);
> > -				vgic_cpu_irq_clear(vcpu, vlr.irq);
> > -			}
> >  
> > +			spin_lock(&dist->lock);
> > +			level_pending |= process_level_irq(vcpu, lr, vlr);
> >  			spin_unlock(&dist->lock);
> > -
> > -			/*
> > -			 * Despite being EOIed, the LR may not have
> > -			 * been marked as empty.
> > -			 */
> > -			vgic_sync_lr_elrsr(vcpu, lr, vlr);
> >  		}
> >  	}
> >  
> > 
> 



* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-11 11:21             ` Andre Przywara
@ 2015-09-14 11:42               ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:42 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Eric Auger, Marc Zyngier, kvmarm, linux-arm-kernel, kvm

Hi Andre,

On Fri, Sep 11, 2015 at 12:21:22PM +0100, Andre Przywara wrote:
> Hi Christoffer,
> 
> (actually you are not supposed to reply during your holidays!)

yeah, I know, but I couldn't help myself here.

> 
> On 09/09/15 09:49, Christoffer Dall wrote:
> > On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara@arm.com> wrote:
> >> Hi Eric,
> >>
> >> thanks for your answer.
> >>
> >> On 08/09/15 09:43, Eric Auger wrote:
> >>> Hi Andre,
> >>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
> >>>> Hi,
> >>>>
> >>>> firstly: this text is really great, thanks for coming up with that.
> >>>> See below for some information I got from tracing the host which I
> >>>> cannot make sense of....
> >>>>
> >>>>
> >>>> On 04/09/15 20:40, Christoffer Dall wrote:
> >>>>> Forwarded physical interrupts on arm/arm64 are a tricky concept and the
> >>>>> way we deal with them is apparently not easy to understand by reading
> >>>>> various specs.
> >>>>>
> >>>>> Therefore, add a proper documentation file explaining the flow and
> >>>>> rationale of the behavior of the vgic.
> >>>>>
> >>>>> Some of this text was contributed by Marc Zyngier and edited by me.
> >>>>> Omissions and errors are all mine.
> >>>>>
> >>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>>> ---
> >>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
> >>>>>  1 file changed, 181 insertions(+)
> >>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>>
> >>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>> new file mode 100644
> >>>>> index 0000000..24b6f28
> >>>>> --- /dev/null
> >>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>> @@ -0,0 +1,181 @@
> >>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
> >>>>> +==========================================
> >>>>> +
> >>>>> +The KVM/ARM code implements software support for the ARM Generic
> >>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
> >>>>> +allowing software to inject virtual interrupts to a VM, which the guest
> >>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> >>>>> +
> >>>>> +Some of these virtual interrupts, however, correspond to physical
> >>>>> +interrupts from real physical devices.  One example could be the
> >>>>> +architected timer, which itself supports virtualization, and therefore
> >>>>> +lets a guest OS program the hardware device directly to raise an
> >>>>> +interrupt at some point in time.  When such an interrupt is raised, the
> >>>>> +host OS initially handles the interrupt and must somehow signal this
> >>>>> +event as a virtual interrupt to the guest.  Another example could be a
> >>>>> +passthrough device, where the physical interrupts are initially handled
> >>>>> +by the host, but the device driver for the device lives in the guest OS
> >>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> >>>>> +the physical one to the guest OS.
> >>>>> +
> >>>>> +These virtual interrupts corresponding to a physical interrupt on the
> >>>>> +host are called forwarded physical interrupts, but are also sometimes
> >>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> >>>>> +
> >>>>> +Forwarded physical interrupts are handled slightly differently compared
> >>>>> +to virtual interrupts generated purely by a software emulated device.
> >>>>> +
> >>>>> +
> >>>>> +The HW bit
> >>>>> +----------
> >>>>> +Virtual interrupts are signalled to the guest by programming the List
> >>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> >>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
> >>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> >>>>> +interrupt, the LR state moves from Pending to Active, and finally to
> >>>>> +inactive.
> >>>>> +
> >>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> >>>>> +KVM must also program an additional field in the LR, the physical IRQ
> >>>>> +number, to link the virtual with the physical IRQ.
> >>>>> +
> >>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> >>>>> +bit, never both at the same time.
> >>>>> +
> >>>>> +Setting the HW bit causes the hardware to deactivate the physical
> >>>>> +interrupt on the physical distributor when the guest deactivates the
> >>>>> +corresponding virtual interrupt.
> >>>>> +
> >>>>> +
> >>>>> +Forwarded Physical Interrupts Life Cycle
> >>>>> +----------------------------------------
> >>>>> +
> >>>>> +The state of forwarded physical interrupts is managed in the following way:
> >>>>> +
> >>>>> +  - The physical interrupt is acked by the host, and becomes active on
> >>>>> +    the physical distributor (*).
> >>>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> >>>>> +    interface is going to present it to the guest.
> >>>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> >>>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> >>>>> +    expected.
> >>>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> >>>>> +    but the LR.Active is left untouched (set).
> >>>>
> >>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
> >>>> the following pattern over and over (case 1):
> >>>> (This is the kvm/kvm.git:queue branch from last week, so including the
> >>>> mapped timer IRQ code. Tests were done on Juno and Midway)
> >>>>
> >>>> ...
> >>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
> >>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
> >>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> >>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8, ELRSR: 1, dist active: 0, log. active: 1
> >>>> ....
> >>>>
> >>>> My hunch is that the following happens (please correct me if needed!):
> >>>> First there is an unrelated trap (line 1), then later the guest exits
> >>>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
> >>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
> >>>> The host injects the timer IRQ (not shown here) and returns to the
> >>>> guest. On the next trap (line 3, due to a stage 2 page fault),
> >>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
> >>>> GIC actually did deactivate both the LR (state=8, which is inactive,
> >>>> just the HW bit is still set) _and_ the state on the physical
> >>>> distributor (dist active=0). This trace_printk is just after entering
> >>>> the function, so before the code there performs these steps redundantly.
> >>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
> >>>> point of view this virtual IRQ cycle is finished.
> >>>>
> >>>> The other sequence I see is this one (case 2):
> >>>>
> >>>> ....
> >>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
> >>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089d8
> >>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 1, log. active: 1
> >>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
> >>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9, ELRSR: 0, dist active: 0, log. active: 1
> >>>> ...
> >>>>
> >>>> In line 1 the timer fires, the host injects the timer IRQ into the
> >>>> guest, which exits again in line 2 due to a page fault (may have IRQs
> >>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
> >>>> pending in the LR (state=9) and active on the physical distributor. Now
> >>>> the code in vgic_sync_hwirq() clears the active state in the physical
> >>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
> >>>> alone (by returning 0 to the caller).
> >>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
> >>>> (line 5), only that the physical dist state is now inactive (due to us
> >>>> clearing that explicitly during the last exit).
> >>> Normally the physical dist state was set active on previous flush, right
> >>> (done for all mapped IRQs)?
> >>
> >> Where is this done? I see that the physical dist state is altered on the
> >> actual IRQ forwarding, but not on later exits/entries? Do you mean
> >> kvm_vgic_flush_hwstate() with "flush"?
> > 
> > this is a bug and should be fixed in the 'fixes' patches I sent last
> > week.  We should set active state on every entry to the guest for IRQs
> > with the HW bit set in either pending or active state.
> 
> OK, sorry, I missed that one patch, I was looking at what should become
> -rc1 soon (because that's what I want to rebase my ITS emulation patches
> on). That patch wasn't in queue at the time I started looking at it.
> 
> So I updated to the latest queue containing those two fixes and also
> applied your v2 series. Indeed this series addresses some of the things
> I was wondering about the last time, but the main thing still persists:
> - Every time the physical dist state is active we have the virtual state
> still at pending or active.

For the arch timer, yes.

For a passthrough device, there should be a situation where the physical
dist state is active but we didn't see the virtual state updated at the
vgic yet (after physical IRQ fires and before the VFIO ISR calls
kvm_set_irq).

> - If the physical dist state is non-active, the virtual state is
> inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
> (LR empty).
> (I was tracing every HW mapped LR in vgic_sync_hwirq() for this)
> 
> So that contradicts:
> 
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +    but the LR.Active is left untouched (set).
> 
> This is the main point I was actually wondering about: I cannot confirm
> this statement. In my tests the LR state and the physical dist state
> always correspond, as expected from reading the spec.
> 
> I reckon that these observations are mostly independent from the actual
> KVM code, as I try to observe hardware state (physical distributor and
> LRs) before KVM tinkers with them.

OK, I got this paragraph from Marc, so we really need to ask him.  Which
hardware are you seeing this behavior on?  Perhaps implementations vary
on this point?

I have no objection to removing this point from the doc, though; I'm just
relaying information on this one.
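
For reading the traces quoted in this thread, it may help to decode
LR.state.  A minimal sketch, assuming the bit layout implied by the
discussion (bit 0 pending, bit 1 active, bit 3 HW, so that 8 means "HW bit
only" and 9 means "HW + pending"); the macro values and both helper names
are assumptions for illustration and should be checked against the tree:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed bit layout of the traced LR.state value. */
#define LR_STATE_PENDING (1u << 0)
#define LR_STATE_ACTIVE  (1u << 1)
#define LR_STATE_MASK    (3u << 0)
#define LR_HW            (1u << 3)

/* Hypothetical helper: true if a HW-mapped LR no longer holds an interrupt. */
static bool lr_hw_is_done(unsigned int state)
{
	return (state & LR_HW) && !(state & LR_STATE_MASK);
}

/* Hypothetical helper: print a traced LR.state value in readable form. */
static void lr_state_print(unsigned int state)
{
	printf("LR.state %u:%s%s%s\n", state,
	       (state & LR_HW) ? " HW" : "",
	       (state & LR_STATE_PENDING) ? " pending" : "",
	       (state & LR_STATE_ACTIVE) ? " active" : "");
}
```

Under these assumptions lr_hw_is_done(8) is true (case 1 in the traces) and
lr_hw_is_done(9) is false (case 2, still pending in the LR).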

> 
> ...
> 
> > 
> >> Is this an observation, an implementation bug or is this mentioned in
> >> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
> >> awkward to me.
> > 
> > What do you mean?  How are we spoon-feeding the VGIC?
> 
> By looking at the physical dist state and all LRs and clearing the LR we
> do what the GIC is actually supposed to do for us - and what it actually
> does according to my observations.
> 
> The point is that patch 1 in my ITS emulation series is reworking the LR
> handling and this patch was based on assumptions that seem to be no
> longer true (i.e. we don't care about inactive LRs except for our LR
> mapping code). So I want to be sure that I fully get what is going on
> here and I struggle at this at the moment due to the above statement.
> 
> What are the plans regarding your "v2: Rework architected timer..."
> series? Will this be queued for 4.4? I want to do the
> rebasing^Wrewriting of my series only once if possible ;-)
> 
I think we should settle on this series ASAP and base your ITS stuff on
top of it.  What do you think?

-Christoffer


* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
@ 2015-09-14 11:42               ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Andre,

On Fri, Sep 11, 2015 at 12:21:22PM +0100, Andre Przywara wrote:
> Hi Christoffer,
> 
> (actually you are not supposed to reply during your holidays!)

yeah, I know, but I couldn't help myself here.

> 
> On 09/09/15 09:49, Christoffer Dall wrote:
> > On Tue, Sep 8, 2015 at 6:57 PM, Andre Przywara <andre.przywara@arm.com> wrote:
> >> Hi Eric,
> >>
> >> thanks for your answer.
> >>
> >> On 08/09/15 09:43, Eric Auger wrote:
> >>> Hi Andre,
> >>> On 09/07/2015 01:25 PM, Andre Przywara wrote:
> >>>> Hi,
> >>>>
> >>>> firstly: this text is really great, thanks for coming up with that.
> >>>> See below for some information I got from tracing the host which I
> >>>> cannot make sense of....
> >>>>
> >>>>
> >>>> On 04/09/15 20:40, Christoffer Dall wrote:
> >>>>> Forwarded physical interrupts on arm/arm64 are a tricky concept and the
> >>>>> way we deal with them is apparently not easy to understand by reading
> >>>>> various specs.
> >>>>>
> >>>>> Therefore, add a proper documentation file explaining the flow and
> >>>>> rationale of the behavior of the vgic.
> >>>>>
> >>>>> Some of this text was contributed by Marc Zyngier and edited by me.
> >>>>> Omissions and errors are all mine.
> >>>>>
> >>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>>>> ---
> >>>>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
> >>>>>  1 file changed, 181 insertions(+)
> >>>>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>>
> >>>>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>> new file mode 100644
> >>>>> index 0000000..24b6f28
> >>>>> --- /dev/null
> >>>>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>>>> @@ -0,0 +1,181 @@
> >>>>> +KVM/ARM VGIC Forwarded Physical Interrupts
> >>>>> +==========================================
> >>>>> +
> >>>>> +The KVM/ARM code implements software support for the ARM Generic
> >>>>> +Interrupt Controller's (GIC's) hardware support for virtualization by
> >>>>> +allowing software to inject virtual interrupts to a VM, which the guest
> >>>>> +OS sees as regular interrupts.  The code is famously known as the VGIC.
> >>>>> +
> >>>>> +Some of these virtual interrupts, however, correspond to physical
> >>>>> +interrupts from real physical devices.  One example could be the
> >>>>> +architected timer, which itself supports virtualization, and therefore
> >>>>> +lets a guest OS program the hardware device directly to raise an
> >>>>> +interrupt at some point in time.  When such an interrupt is raised, the
> >>>>> +host OS initially handles the interrupt and must somehow signal this
> >>>>> +event as a virtual interrupt to the guest.  Another example could be a
> >>>>> +passthrough device, where the physical interrupts are initially handled
> >>>>> +by the host, but the device driver for the device lives in the guest OS
> >>>>> +and KVM must therefore somehow inject a virtual interrupt on behalf of
> >>>>> +the physical one to the guest OS.
> >>>>> +
> >>>>> +These virtual interrupts corresponding to a physical interrupt on the
> >>>>> +host are called forwarded physical interrupts, but are also sometimes
> >>>>> +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> >>>>> +
> >>>>> +Forwarded physical interrupts are handled slightly differently compared
> >>>>> +to virtual interrupts generated purely by a software emulated device.
> >>>>> +
> >>>>> +
> >>>>> +The HW bit
> >>>>> +----------
> >>>>> +Virtual interrupts are signalled to the guest by programming the List
> >>>>> +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> >>>>> +with the virtual IRQ number and the state of the interrupt (Pending,
> >>>>> +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> >>>>> +interrupt, the LR state moves from Pending to Active, and finally to
> >>>>> +inactive.
> >>>>> +
> >>>>> +The LRs include an extra bit, called the HW bit.  When this bit is set,
> >>>>> +KVM must also program an additional field in the LR, the physical IRQ
> >>>>> +number, to link the virtual with the physical IRQ.
> >>>>> +
> >>>>> +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> >>>>> +bit, never both at the same time.
> >>>>> +
> >>>>> +Setting the HW bit causes the hardware to deactivate the physical
> >>>>> +interrupt on the physical distributor when the guest deactivates the
> >>>>> +corresponding virtual interrupt.
> >>>>> +
> >>>>> +
> >>>>> +Forwarded Physical Interrupts Life Cycle
> >>>>> +----------------------------------------
> >>>>> +
> >>>>> +The state of forwarded physical interrupts is managed in the following way:
> >>>>> +
> >>>>> +  - The physical interrupt is acked by the host, and becomes active on
> >>>>> +    the physical distributor (*).
> >>>>> +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> >>>>> +    interface is going to present it to the guest.
> >>>>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> >>>>> +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> >>>>> +    expected.
> >>>>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> >>>>> +    but the LR.Active is left untouched (set).
> >>>>
> >>>> I tried hard in the last week, but couldn't confirm this. Tracing shows
> >>>> the following pattern over and over (case 1):
> >>>> (This is the kvm/kvm.git:queue branch from last week, so including the
> >>>> mapped timer IRQ code. Tests were done on Juno and Midway)
> >>>>
> >>>> ...
> >>>> 229.340171: kvm_exit: TRAP: HSR_EC: 0x0001 (WFx), PC: 0xffffffc000098a64
> >>>> 229.340324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0001c63a0
> >>>> 229.340428: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
> >>>> 0xffffffc0004089d8
> >>>> 229.340430: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 8,
> >>>> ELRSR: 1, dist active: 0, log. active: 1
> >>>> ....
> >>>>
> >>>> My hunch is that the following happens (please correct me if needed!):
> >>>> First there is an unrelated trap (line 1), then later the guest exits
> >>>> due to an IRQ (line 2, presumably the timer, the WFx is a red herring
> >>>> here since ESR_EL2.EC is not valid on IRQ triggered exceptions).
> >>>> The host injects the timer IRQ (not shown here) and returns to the
> >>>> guest. On the next trap (line 3, due to a stage 2 page fault),
> >>>> vgic_sync_hwirq() will be called on the LR (line 4) and shows that the
> >>>> GIC actually did deactivate both the LR (state=8, which is inactive,
> >>>> just the HW bit is still set) _and_ the state on the physical
> >>>> distributor (dist active=0). This trace_printk is just after entering
> >>>> the function, so before the code there performs these steps redundantly.
> >>>> Also it shows that the ELRSR bit is set to 1 (empty), so from the GIC
> >>>> point of view this virtual IRQ cycle is finished.
> >>>>
> >>>> The other sequence I see is this one (case 2):
> >>>>
> >>>> ....
> >>>> 231.055324: kvm_exit: IRQ: HSR_EC: 0x0001 (WFx), PC: 0xffffffc0000f0e70
> >>>> 231.055329: kvm_exit: TRAP: HSR_EC: 0x0024 (DABT_LOW), PC:
> >>>> 0xffffffc0004089d8
> >>>> 231.055331: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
> >>>> ELRSR: 0, dist active: 1, log. active: 1
> >>>> 231.055338: kvm_exit: IRQ: HSR_EC: 0x0024 (DABT_LOW), PC: 0xffffffc0004089dc
> >>>> 231.055340: kvm_vgic_sync_hwstate: LR0 vIRQ: 27, HWIRQ: 27, LR.state: 9,
> >>>> ELRSR: 0, dist active: 0, log. active: 1
> >>>> ...
> >>>>
> >>>> In line 1 the timer fires, the host injects the timer IRQ into the
> >>>> guest, which exits again in line 2 due to a page fault (may have IRQs
> >>>> disabled?). The LR dump in line 3 shows that the timer IRQ is still
> >>>> pending in the LR (state=9) and active on the physical distributor. Now
> >>>> the code in vgic_sync_hwirq() clears the active state in the physical
> >>>> distributor (by calling irq_set_irqchip_state()), but leaves the LR
> >>>> alone (by returning 0 to the caller).
> >>>> On the next exit (line 4, due to some HW IRQ?) the LR is still the same
> >>>> (line 5), only that the physical dist state is now inactive (due to us
> >>>> clearing that explicitly during the last exit).
> >>> Normally the physical dist state was set active on previous flush, right
> >>> (done for all mapped IRQs)?
> >>
> >> Where is this done? I see that the physical dist state is altered on the
> >> actual IRQ forwarding, but not on later exits/entries? Do you mean
> >> kvm_vgic_flush_hwstate() with "flush"?
> > 
> > this is a bug and should be fixed in the 'fixes' patches I sent last
> > week.  We should set active state on every entry to the guest for IRQs
> > with the HW bit set in either pending or active state.
> 
> OK, sorry, I missed that one patch, I was looking at what should become
> -rc1 soon (because that's what I want to rebase my ITS emulation patches
> on). That patch wasn't in queue at the time I started looking at it.
> 
> So I updated to the latest queue containing those two fixes and also
> applied your v2 series. Indeed this series addresses some of the things
> I was wondering about the last time, but the main thing still persists:
> - Every time the physical dist state is active we have the virtual state
> still at pending or active.

For the arch timer, yes.

For a passthrough device, there should be a situation where the physical
dist state is active but we didn't see the virtual state updated at the
vgic yet (after physical IRQ fires and before the VFIO ISR calls
kvm_set_irq).

> - If the physical dist state is non-active, the virtual state is
> inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
> (LR empty).
> (I was tracing every HW mapped LR in vgic_sync_hwirq() for this)
> 
> So that contradicts:
> 
> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> +    but the LR.Active is left untouched (set).
> 
> This is the main point I was actually wondering about: I cannot confirm
> this statement. In my tests the LR state and the physical dist state
> always correspond, as expected from reading the spec.
> 
> I reckon that these observations are mostly independent from the actual
> KVM code, as I try to observe hardware state (physical distributor and
> LRs) before KVM tinkers with them.

ok, I got this paragraph from Marc, so we really need to ask him?  Which
hardware are you seeing this behavior on?  Perhaps implementations vary
on this point?

I have no objection to removing this point from the doc though, I'm just
relaying information on this one.
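
For anyone following the LR.state values in the traces, they can be
decoded with a small helper.  This is only a sketch: it assumes the trace
prints the top nibble of a GICv2 GICH_LR (bit 3 = HW, bit 2 = Grp1,
bit 1 = Active, bit 0 = Pending), which is consistent with the readings
of 8 and 9 given above.  The macro and function names are made up for
illustration and are not kernel identifiers.

```c
#include <stdbool.h>

/* Assumed layout of the traced "LR.state" nibble (GICH_LR bits [31:28]). */
#define LR_NIBBLE_PENDING 0x1u
#define LR_NIBBLE_ACTIVE  0x2u
#define LR_NIBBLE_GRP1    0x4u
#define LR_NIBBLE_HW      0x8u

/* Hypothetical helper: name the interrupt state held in an LR nibble. */
static const char *lr_state_name(unsigned int nibble)
{
	switch (nibble & (LR_NIBBLE_PENDING | LR_NIBBLE_ACTIVE)) {
	case 0:
		return "inactive";
	case LR_NIBBLE_PENDING:
		return "pending";
	case LR_NIBBLE_ACTIVE:
		return "active";
	default:
		return "pending+active";
	}
}

/* Hypothetical helper: true if the LR is linked to a physical interrupt. */
static bool lr_is_hw(unsigned int nibble)
{
	return (nibble & LR_NIBBLE_HW) != 0;
}
```

Under that assumption, 0x8 decodes to "inactive" with the HW bit set
(case 1 in the traces) and 0x9 decodes to "pending" with the HW bit set
(case 2).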

> 
> ...
> 
> > 
> >> Is this an observation, an implementation bug or is this mentioned in
> >> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
> >> awkward to me.
> > 
> > What do you mean?  How are we spoon-feeding the VGIC?
> 
> By looking at the physical dist state and all LRs and clearing the LR we
> do what the GIC is actually supposed to do for us - and what it actually
> does according to my observations.
> 
> The point is that patch 1 in my ITS emulation series is reworking the LR
> handling and this patch was based on assumptions that seem to be no
> longer true (i.e. we don't care about inactive LRs except for our LR
> mapping code). So I want to be sure that I fully get what is going on
> here and I struggle with this at the moment due to the above statement.
> 
> What are the plans regarding your "v2: Rework architected timer..."
> series? Will this be queued for 4.4? I want to do the
> rebasing^Wrewriting of my series only once if possible ;-)
> 
I think we should settle on this series ASAP and base your ITS stuff on
top of it.  What do you think?

-Christoffer

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-07 16:45     ` Eric Auger
@ 2015-09-14 11:46       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:46 UTC (permalink / raw)
  To: Eric Auger; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

On Mon, Sep 07, 2015 at 06:45:42PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > Forwarded physical interrupts on arm/arm64 is a tricky concept and the
> > way we deal with them is not apparently easy to understand by reading
> > various specs.
> > 
> > Therefore, add a proper documentation file explaining the flow and
> > rationale of the behavior of the vgic.
> > 
> > Some of this text was contributed by Marc Zyngier and edited by me.
> > Omissions and errors are all mine.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > 
> > diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > new file mode 100644
> > index 0000000..24b6f28
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > @@ -0,0 +1,181 @@
> > +KVM/ARM VGIC Forwarded Physical Interrupts
> > +==========================================
> > +
> > +The KVM/ARM code implements software support for the ARM Generic
> > +Interrupt Controller's (GIC's) hardware support for virtualization by
> > +allowing software to inject virtual interrupts to a VM, which the guest
> > +OS sees as regular interrupts.  The code is famously known as the VGIC.
> > +
> > +Some of these virtual interrupts, however, correspond to physical
> > +interrupts from real physical devices.  One example could be the
> > +architected timer, which itself supports virtualization, and therefore
> > +lets a guest OS program the hardware device directly to raise an
> > +interrupt at some point in time.  When such an interrupt is raised, the
> > +host OS initially handles the interrupt and must somehow signal this
> > +event as a virtual interrupt to the guest.  Another example could be a
> > +passthrough device, where the physical interrupts are initially handled
> > +by the host, but the device driver for the device lives in the guest OS
> > +and KVM must therefore somehow inject a virtual interrupt on behalf of
> > +the physical one to the guest OS.
> > +
> > +These virtual interrupts corresponding to a physical interrupt on the
> > +host are called forwarded physical interrupts, but are also sometimes
> > +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> > +
> > +Forwarded physical interrupts are handled slightly differently compared
> > +to virtual interrupts generated purely by a software emulated device.
> > +
> > +
> > +The HW bit
> > +----------
> > +Virtual interrupts are signalled to the guest by programming the List
> > +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> > +with the virtual IRQ number and the state of the interrupt (Pending,
> > +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> > +interrupt, the LR state moves from Pending to Active, and finally to
> > +inactive.
> > +
> > +The LRs include an extra bit, called the HW bit.  When this bit is set,
> > +KVM must also program an additional field in the LR, the physical IRQ
> > +number, to link the virtual with the physical IRQ.
> > +
> > +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> > +bit, never both at the same time.
> > +
> > +Setting the HW bit causes the hardware to deactivate the physical
> > +interrupt on the physical distributor when the guest deactivates the
> > +corresponding virtual interrupt.
> > +
> > +
> > +Forwarded Physical Interrupts Life Cycle
> > +----------------------------------------
> > +
> > +The state of forwarded physical interrupts is managed in the following way:
> > +
> > +  - The physical interrupt is acked by the host, and becomes active on
> > +    the physical distributor (*).
> > +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> > +    interface is going to present it to the guest.
> > +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> > +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> > +    expected.
> > +  - On guest EOI, the *physical distributor* active bit gets cleared,
> > +    but the LR.Active is left untouched (set).
> > +  - KVM clears the LR when on VM exits when the physical distributor
> s/when//?
> > +    active state has been cleared.
> > +
> > +(*): The host handling is slightly more complicated.  For some devices
> > +(shared), KVM directly sets the active state on the physical distributor
> > +before entering the guest, and for some devices (non-shared) the host
> > +configures the GIC such that it does not deactivate the interrupt on
> > +host EOIs, but only performs a priority drop allowing the GIC to receive
> > +other interrupts and leaves the interrupt in the active state on the
> > +physical distributor.
> EOIMode == 1 is set globally and impacts all forwarded SPIs/PPIs, shared
> or not shared, I think. Reading the above lines, I have the impression
> that this is programmed per device.

true, this was me remembering incorrectly.

> 
> My understanding is that for the timer the physical distributor state
> must be set manually because 1) the HW (GIC) does not do it and 2) we
> need to context switch it depending on the vCPU. For a non-shared device
> the GIC sets the physical distributor state and the state is fully
> maintained by HW until the guest deactivation.

true, I'll try to rework this slightly.

> 
> > +
> > +
> > +Forwarded Edge and Level Triggered PPIs and SPIs
> > +------------------------------------------------
> > +Forwarded physical interrupts injected should always be active on the
> > +physical distributor when injected to a guest.
> > +
> > +Level-triggered interrupts will keep the interrupt line to the GIC
> > +asserted, typically until the guest programs the device to deassert the
> > +line.  This means that the interrupt will remain pending on the physical
> > +distributor until the guest has reprogrammed the device.  Since we
> > +always run the VM with interrupts enabled on the CPU, a pending
> > +interrupt will exit the guest as soon as we switch into the guest,
> > +preventing the guest from ever making progress as the process repeats
> > +over and over.  Therefore, the active state on the physical distributor
> > +must be set when entering the guest, preventing the GIC from forwarding
> > +the pending interrupt to the CPU.  As soon as the guest deactivates
> > +(EOIs) the interrupt, the physical line is sampled by the hardware again
> I think you can remove "(EOIs)". This depends on the EOI mode setting on
> the guest side. It can be 2-in-1 EOI or EOI+DIR.

right, fair enough.

> > +and the host takes a new interrupt if and only if the physical line is
> > +still asserted.
> > +
> > +Edge-triggered interrupts do not exhibit the same problem with
> > +preventing guest execution that level-triggered interrupts do.  One
> > +option is to not use HW bit at all, and inject edge-triggered interrupts
> > +from a physical device as pure virtual interrupts.  But that would
> > +potentially slow down handling of the interrupt in the guest, because a
> > +physical interrupt occurring in the middle of the guest ISR would
> > +preempt the guest for the host to handle the interrupt.  Additionally,
> > +if you configure the system to handle interrupts on a separate physical
> > +core from that running your VCPU, you still have to interrupt the VCPU
> > +to queue the pending state onto the LR, even though the guest won't use
> > +this information until the guest ISR completes.  Therefore, the HW
> > +bit should always be set for forwarded edge-triggered interrupts.  With
> > +the HW bit set, the virtual interrupt is injected and additional
> > +physical interrupts occurring before the guest deactivates the interrupt
> > +simply mark the state on the physical distributor as Pending+Active.  As
> > +soon as the guest deactivates the interrupt, the host takes another
> > +interrupt if and only if there was a physical interrupt between
> > +injecting the forwarded interrupt to the guest
> missing and?

yes, thanks.

>  the guest deactivating
> > +the interrupt.
> > +
> > +Consequently, whenever we schedule a VCPU with one or more LRs with the
> > +HW bit set, the interrupt must also be active on the physical
> > +distributor.
> > +
> > +
> > +Forwarded LPIs
> > +--------------
> > +LPIs, introduced in GICv3, are always edge-triggered and do not have an
> > +active state.  They become pending when a device signals them, and as
> > +soon as they are acked by the CPU, they are inactive again.
> > +
> > +It therefore doesn't make sense, and is not supported, to set the HW bit
> > +for physical LPIs that are forwarded to a VM as virtual interrupts,
> > +typically virtual SPIs.
> > +
> > +For LPIs, there is no other choice than to preempt the VCPU thread if
> > +necessary, and queue the pending state onto the LR.
> > +
> > +
> > +Putting It Together: The Architected Timer
> > +------------------------------------------
> > +The architected timer is a device that signals interrupts with level
> > +triggered semantics.  The timer hardware is directly accessed by VCPUs
> > +which program the timer to fire at some point in time.  Each VCPU on a
> > +system programs the timer to fire at different times, and therefore the
> > +hardware is multiplexed between multiple VCPUs.  This is implemented by
> > +context-switching the timer state along with each VCPU thread.
> > +
> > +However, this means that a scenario like the following is entirely
> > +possible, and in fact, typical:
> > +
> > +1.  KVM runs the VCPU
> > +2.  The guest programs the timer to fire in T+100
> > +3.  The guest is idle and calls WFI (wait-for-interrupts)
> > +4.  The hardware traps to the host
> > +5.  KVM stores the timer state to memory and disables the hardware timer
> > +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
> > +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
> > +8.  The soft timer fires, waking up the VCPU thread
> > +9.  KVM reprograms the timer hardware with the VCPU's values
> > +10. KVM marks the timer interrupt as active on the physical distributor
> > +11. KVM injects a forwarded physical interrupt to the guest
> > +12. KVM runs the VCPU
> > +
> > +Notice that KVM injects a forwarded physical interrupt in step 11 without
> > +the corresponding interrupt having actually fired on the host.  That is
> > +exactly why we mark the timer interrupt as active in step 10, because
> > +the active state on the physical distributor is part of the state
> > +belonging to the timer hardware, which is context-switched along with
> > +the VCPU thread.
> > +
> > +If the guest does not idle because it is busy, flow looks like this
> > +instead:
> > +
> > +1.  KVM runs the VCPU
> > +2.  The guest programs the timer to fire in T+100
> > +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
> > +5.  With interrupts disabled on the CPU, KVM looks at the timer state
> > +    and injects a forwarded physical interrupt because it concludes the
> > +    timer has expired.
> I don't get how we can trap without the virtual timer PPI handler being
> entered on host side. Please can you elaborate on this?

As Marc pointed out, we trap with interrupts disabled, disable the
virtual timer, and then re-enable interrupts.  Since the timer output is
now disabled, the interrupt is not taken.

Recall that the way we deal with physical interrupts is basically: (1)
trap to EL2 with interrupts disabled, (2) switch all state, (3) re-enable
interrupts on the CPU, (4) the interrupt hits again and then the host ISR
runs.
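
To make the difference between the two cases concrete, here is a toy
model of the sequence.  It is not KVM code: the struct and function
names are hypothetical, and the level-triggered line is reduced to
"timer enabled and expired".

```c
#include <stdbool.h>

/* Toy model of a level-triggered timer line; all names are hypothetical. */
struct timer_model {
	bool enabled;	/* timer enable bit, as programmed by the guest */
	bool expired;	/* compare value has been reached */
};

struct cpu_model {
	bool irqs_masked;	/* interrupts masked on the CPU */
	int host_isr_runs;	/* number of times the host ISR ran */
};

/* A level-triggered line is asserted only while the device asserts it. */
static bool timer_line_asserted(const struct timer_model *t)
{
	return t->enabled && t->expired;
}

/* Unmask interrupts; the host ISR runs only if the line is still asserted. */
static void cpu_unmask_irqs(struct cpu_model *c, const struct timer_model *t)
{
	c->irqs_masked = false;
	if (timer_line_asserted(t))
		c->host_isr_runs++;
}

/*
 * Guest exit caused by the timer.  Returns true if the timer state says
 * a forwarded interrupt should be injected into the guest.
 */
static bool guest_exit_on_timer(struct cpu_model *c, struct timer_model *t)
{
	bool inject;

	c->irqs_masked = true;			/* (1) trap with interrupts disabled */
	inject = timer_line_asserted(t);	/* inspect the guest's timer state */
	t->enabled = false;			/* (2) state switch disables the timer */
	cpu_unmask_irqs(c, t);			/* (3) re-enable interrupts */
	return inject;
}
```

In this model the timer exit reports an expired timer but never reaches
the host ISR, because the line has already dropped by the time
interrupts are unmasked.  A device whose line stays asserted would hit
the host ISR in step (4), as for any other physical interrupt.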

-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
@ 2015-09-14 11:46       ` Christoffer Dall
  0 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Sep 07, 2015 at 06:45:42PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > Forwarded physical interrupts on arm/arm64 is a tricky concept and the
> > way we deal with them is not apparently easy to understand by reading
> > various specs.
> > 
> > Therefore, add a proper documentation file explaining the flow and
> > rationale of the behavior of the vgic.
> > 
> > Some of this text was contributed by Marc Zyngier and edited by me.
> > Omissions and errors are all mine.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 181 +++++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > 
> > diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > new file mode 100644
> > index 0000000..24b6f28
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > @@ -0,0 +1,181 @@
> > +KVM/ARM VGIC Forwarded Physical Interrupts
> > +==========================================
> > +
> > +The KVM/ARM code implements software support for the ARM Generic
> > +Interrupt Controller's (GIC's) hardware support for virtualization by
> > +allowing software to inject virtual interrupts to a VM, which the guest
> > +OS sees as regular interrupts.  The code is famously known as the VGIC.
> > +
> > +Some of these virtual interrupts, however, correspond to physical
> > +interrupts from real physical devices.  One example could be the
> > +architected timer, which itself supports virtualization, and therefore
> > +lets a guest OS program the hardware device directly to raise an
> > +interrupt at some point in time.  When such an interrupt is raised, the
> > +host OS initially handles the interrupt and must somehow signal this
> > +event as a virtual interrupt to the guest.  Another example could be a
> > +passthrough device, where the physical interrupts are initially handled
> > +by the host, but the device driver for the device lives in the guest OS
> > +and KVM must therefore somehow inject a virtual interrupt on behalf of
> > +the physical one to the guest OS.
> > +
> > +These virtual interrupts corresponding to a physical interrupt on the
> > +host are called forwarded physical interrupts, but are also sometimes
> > +referred to as 'virtualized physical interrupts' and 'mapped interrupts'.
> > +
> > +Forwarded physical interrupts are handled slightly differently compared
> > +to virtual interrupts generated purely by a software emulated device.
> > +
> > +
> > +The HW bit
> > +----------
> > +Virtual interrupts are signalled to the guest by programming the List
> > +Registers (LRs) on the GIC before running a VCPU.  The LR is programmed
> > +with the virtual IRQ number and the state of the interrupt (Pending,
> > +Active, or Pending+Active).  When the guest ACKs and EOIs a virtual
> > +interrupt, the LR state moves from Pending to Active, and finally to
> > +inactive.
> > +
> > +The LRs include an extra bit, called the HW bit.  When this bit is set,
> > +KVM must also program an additional field in the LR, the physical IRQ
> > +number, to link the virtual with the physical IRQ.
> > +
> > +When the HW bit is set, KVM must EITHER set the Pending OR the Active
> > +bit, never both at the same time.
> > +
> > +Setting the HW bit causes the hardware to deactivate the physical
> > +interrupt on the physical distributor when the guest deactivates the
> > +corresponding virtual interrupt.
> > +
> > +
> > +Forwarded Physical Interrupts Life Cycle
> > +----------------------------------------
> > +
> > +The state of forwarded physical interrupts is managed in the following way:
> > +
> > +  - The physical interrupt is acked by the host, and becomes active on
> > +    the physical distributor (*).
> > +  - KVM sets the LR.Pending bit, because this is the only way the GICV
> > +    interface is going to present it to the guest.
> > +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> > +  - LR.Pending transitions to LR.Active on the guest read of the IAR, as
> > +    expected.
> > +  - On guest EOI, the *physical distributor* active bit gets cleared,
> > +    but the LR.Active is left untouched (set).
> > +  - KVM clears the LR when on VM exits when the physical distributor
> s/when//?
> > +    active state has been cleared.
> > +
> > +(*): The host handling is slightly more complicated.  For some devices
> > +(shared), KVM directly sets the active state on the physical distributor
> > +before entering the guest, and for some devices (non-shared) the host
> > +configures the GIC such that it does not deactivate the interrupt on
> > +host EOIs, but only performs a priority drop allowing the GIC to receive
> > +other interrupts and leaves the interrupt in the active state on the
> > +physical distributor.
> EOIMode == 1 is set globally and impacts all forwarded SPI/PPIs, shared
> or not shared I think. reading the above lines I have the impression
> this is a per-device programming.

true, this was me remembering incorrectly.

> 
> My understanding is for the timer it is needed to manually set the
> physical distributor state because 1) the HW (GIC) does not do it and 2)
> we need to context switch depending on the vCPU. For non shared device
> the GIC sets the physical distributor state and the state is fully
> maintained by HW until the guest deactivation.

true, I'll try to rework this slightly.

> 
> > +
> > +
> > +Forwarded Edge and Level Triggered PPIs and SPIs
> > +------------------------------------------------
> > +Forwarded physical interrupts injected should always be active on the
> > +physical distributor when injected to a guest.
> > +
> > +Level-triggered interrupts will keep the interrupt line to the GIC
> > +asserted, typically until the guest programs the device to deassert the
> > +line.  This means that the interrupt will remain pending on the physical
> > +distributor until the guest has reprogrammed the device.  Since we
> > +always run the VM with interrupts enabled on the CPU, a pending
> > +interrupt will exit the guest as soon as we switch into the guest,
> > +preventing the guest from ever making progress as the process repeats
> > +over and over.  Therefore, the active state on the physical distributor
> > +must be set when entering the guest, preventing the GIC from forwarding
> > +the pending interrupt to the CPU.  As soon as the guest deactivates
> > +(EOIs) the interrupt, the physical line is sampled by the hardware again
> I think you can remove "(EOI)". This depends on EOI mode setting on
> guest side. it can be 2-in-1 EOI or EOI+DIR.

right, fair enough.

> > +and the host takes a new interrupt if and only if the physical line is
> > +still asserted.
> > +
> > +Edge-triggered interrupts do not exhibit the same problem with
> > +preventing guest execution that level-triggered interrupts do.  One
> > +option is to not use HW bit at all, and inject edge-triggered interrupts
> > +from a physical device as pure virtual interrupts.  But that would
> > +potentially slow down handling of the interrupt in the guest, because a
> > +physical interrupt occurring in the middle of the guest ISR would
> > +preempt the guest for the host to handle the interrupt.  Additionally,
> > +if you configure the system to handle interrupts on a separate physical
> > +core from that running your VCPU, you still have to interrupt the VCPU
> > +to queue the pending state onto the LR, even though the guest won't use
> > +this information until the guest ISR completes.  Therefore, the HW
> > +bit should always be set for forwarded edge-triggered interrupts.  With
> > +the HW bit set, the virtual interrupt is injected and additional
> > +physical interrupts occurring before the guest deactivates the interrupt
> > +simply mark the state on the physical distributor as Pending+Active.  As
> > +soon as the guest deactivates the interrupt, the host takes another
> > +interrupt if and only if there was a physical interrupt between
> > +injecting the forwarded interrupt to the guest
> missing and?

yes, thanks.

>  the guest deactivating
> > +the interrupt.
> > +
> > +Consequently, whenever we schedule a VCPU with one or more LRs with the
> > +HW bit set, the interrupt must also be active on the physical
> > +distributor.
> > +
> > +
> > +Forwarded LPIs
> > +--------------
> > +LPIs, introduced in GICv3, are always edge-triggered and do not have an
> > +active state.  They become pending when a device signal them, and as
> > +soon as they are acked by the CPU, they are inactive again.
> > +
> > +It therefore doesn't make sense, and is not supported, to set the HW bit
> > +for physical LPIs that are forwarded to a VM as virtual interrupts,
> > +typically virtual SPIs.
> > +
> > +For LPIs, there is no other choice than to preempt the VCPU thread if
> > +necessary, and queue the pending state onto the LR.
> > +
> > +
> > +Putting It Together: The Architected Timer
> > +------------------------------------------
> > +The architected timer is a device that signals interrupts with level
> > +triggered semantics.  The timer hardware is directly accessed by VCPUs
> > +which program the timer to fire at some point in time.  Each VCPU on a
> > +system programs the timer to fire at different times, and therefore the
> > +hardware is multiplexed between multiple VCPUs.  This is implemented by
> > +context-switching the timer state along with each VCPU thread.
> > +
> > +However, this means that a scenario like the following is entirely
> > +possible, and in fact, typical:
> > +
> > +1.  KVM runs the VCPU
> > +2.  The guest programs the time to fire in T+100
> > +3.  The guest is idle and calls WFI (wait-for-interrupts)
> > +4.  The hardware traps to the host
> > +5.  KVM stores the timer state to memory and disables the hardware timer
> > +6.  KVM schedules a soft timer to fire in T+(100 - time since step 2)
> > +7.  KVM puts the VCPU thread to sleep (on a waitqueue)
> > +8.  The soft timer fires, waking up the VCPU thread
> > +9.  KVM reprograms the timer hardware with the VCPU's values
> > +10. KVM marks the timer interrupt as active on the physical distributor
> > +11. KVM injects a forwarded physical interrupt to the guest
> > +12. KVM runs the VCPU
> > +
> > +Notice that KVM injects a forwarded physical interrupt in step 11 without
> > +the corresponding interrupt having actually fired on the host.  That is
> > +exactly why we mark the timer interrupt as active in step 10, because
> > +the active state on the physical distributor is part of the state
> > +belonging to the timer hardware, which is context-switched along with
> > +the VCPU thread.
> > +
> > +If the guest does not idle because it is busy, flow looks like this
> > +instead:
> > +
> > +1.  KVM runs the VCPU
> > +2.  The guest programs the time to fire in T+100
> > +4.  At T+100 the timer fires and a physical IRQ causes the VM to exit
> > +5.  With interrupts disabled on the CPU, KVM looks at the timer state
> > +    and injects a forwarded physical interrupt because it concludes the
> > +    timer has expired.
> I don't get how we can trap without the virtual timer PPI handler being
> entered on host side. Please can you elaborate on this?

As Marc pointed out, we trap with interrupts disabled, disable the
virtual timer, then re-enable interrupts and since the interrupt is now
disabled the interrupt is not taken.

Recall that the way we deal with physical interrupts is basically (1)
trap to EL2 with interrupts disabled (2) switch all state (3) re-enable
interrupts on the CPU (4) the interrupt hits again and then the host ISR
runs.

-Christoffer
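
The four-step flow above can be sketched as a small stand-alone model.
This is not kernel code: struct cpu_model, cpu_check_irq() and
guest_exit_on_irq() are hypothetical names made up for illustration, and
the model assumes a level-triggered line that the host ISR quiesces.  It
only shows why a generic device interrupt reaches the host ISR in step
(4) while the guest's virtual timer interrupt does not.

```c
#include <stdbool.h>

/* Hypothetical model of one physical CPU; not a kernel structure. */
struct cpu_model {
	bool irqs_enabled;	/* CPU-level interrupt mask is open */
	bool line_asserted;	/* level-triggered line towards the CPU */
	int host_isr_runs;	/* how often the host ISR was entered */
};

/* A level-triggered IRQ is taken only if it is asserted and unmasked. */
static void cpu_check_irq(struct cpu_model *cpu)
{
	if (cpu->irqs_enabled && cpu->line_asserted) {
		cpu->host_isr_runs++;
		cpu->line_asserted = false;	/* the ISR quiesces the source */
	}
}

/* Returns how many times the host ISR runs for one guest exit. */
static int guest_exit_on_irq(bool is_guest_vtimer)
{
	struct cpu_model cpu = {
		.irqs_enabled = false,	/* (1) trap with interrupts disabled */
		.line_asserted = true,	/* the IRQ that caused the exit */
		.host_isr_runs = 0,
	};

	cpu_check_irq(&cpu);		/* masked, so nothing is taken yet */

	/*
	 * (2) switch all state.  Saving the guest's timer state disables
	 * the virtual timer, which lowers its output line.
	 */
	if (is_guest_vtimer)
		cpu.line_asserted = false;

	cpu.irqs_enabled = true;	/* (3) re-enable interrupts */
	cpu_check_irq(&cpu);		/* (4) taken only if still asserted */

	return cpu.host_isr_runs;
}
```

In this model guest_exit_on_irq(false) returns 1, because the device
line is still asserted when interrupts are re-enabled, and
guest_exit_on_irq(true) returns 0, because the timer output was lowered
during the state switch.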

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-14  9:29     ` Eric Auger
@ 2015-09-14 11:48       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-14 11:48 UTC (permalink / raw)
  To: Eric Auger; +Cc: kvmarm, linux-arm-kernel, kvm, Marc Zyngier

On Mon, Sep 14, 2015 at 11:29:53AM +0200, Eric Auger wrote:
> On 09/04/2015 09:40 PM, Christoffer Dall wrote:
> > The arch timer currently uses edge-triggered semantics in the sense that
> > the line is never sampled by the vgic and lowering the line from the
> > timer to the vgic doesn't have any affect on the pending state of
> s/affect/effect
> > virtual interrupts in the vgic.  This means that we do not support a
> > guest with the otherwise valid behavior of (1) disable interrupts (2)
> > enable the timer (3) disable the timer (4) enable interrupts.  Such a
> > guest would validly not expect to see any interrupts on real hardware,
> > but will see interrupts on KVM.
> > 
> > This patches fixes this shortcoming through the following series of
> > changes.
> > 
> > First, we change the flow of the timer/vgic sync/flush operations.  Now
> > the timer is always flushed/synced before the vgic,
> for the flush it was already the case
>  because the vgic
> > samples the state of the timer output.  This has the implication that we
> > move the timer operations in to non-preempible sections, but that is
> > fine after the previous commit getting rid of hrtimer schedules on every
> > entry/exit.
> > 
> > Second, we change the internal behavior of the timer, letting the timer
> > keep track of its previous output state, and only lower/raise the line
> > to the vgic when the state changes.  Note that in theory this could have
> > been accomplished more simply by signalling the vgic every time the
> > state *potentially* changed, but we don't want to be hitting the vgic
> > more often than necessary.
> > 
> > Third, we get rid of the use of the map->active field in the vgic and
> > instead simply set the interrupt as active on the physical distributor
> > whenever we signal a mapped interrupt to the guest, and we reset the
> > active state when we sync back the HW state from the vgic.
> > 
> > Fourth, and finally, we now initialize the timer PPIs (and all the other
> > unused PPIs for now), to be level-triggered, and modify the sync code to
> > sample the line state on HW sync and re-inject a new interrupt if it is
> > still pending at that time.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/kvm/arm.c           | 11 ++++++--
> >  include/kvm/arm_arch_timer.h |  2 +-
> >  include/kvm/arm_vgic.h       |  3 --
> >  virt/kvm/arm/arch_timer.c    | 65 +++++++++++++++++++++++++++++-------------
> >  virt/kvm/arm/vgic.c          | 67 +++++++++++++++-----------------------------
> >  5 files changed, 78 insertions(+), 70 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index bdf8871..102a4aa 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			preempt_enable();
> > -			kvm_timer_sync_hwstate(vcpu);
> >  			continue;
> >  		}
> >  
> > @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_guest_exit();
> >  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >  
> > +		/*
> > +		 * We must sync the timer state before the vgic state so that
> > +		 * the vgic can properly sample the updated state of the
> > +		 * interrupt line.
> > +		 */
> > +		kvm_timer_sync_hwstate(vcpu);
> > +
> >  		kvm_vgic_sync_hwstate(vcpu);
> >  
> >  		preempt_enable();
> >  
> > -		kvm_timer_sync_hwstate(vcpu);
> > -
> >  		ret = handle_exit(vcpu, run, ret);
> >  	}
> >  
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index ef14cc1..1800227 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -51,7 +51,7 @@ struct arch_timer_cpu {
> >  	bool				armed;
> >  
> >  	/* Timer IRQ */
> > -	const struct kvm_irq_level	*irq;
> > +	struct kvm_irq_level		irq;
> >  
> >  	/* VGIC mapping */
> >  	struct irq_phys_map		*map;
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index d901f1a..99011a0 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -163,7 +163,6 @@ struct irq_phys_map {
> >  	u32			virt_irq;
> >  	u32			phys_irq;
> >  	u32			irq;
> > -	bool			active;
> >  };
> >  
> >  struct irq_phys_map_entry {
> > @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> >  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> >  					   int virt_irq, int irq);
> >  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> > -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> > -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> >  
> >  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
> >  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 7991537..0cdd092 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
> >  	}
> >  }
> >  
> > -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> > -{
> > -	int ret;
> > -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -
> > -	kvm_vgic_set_phys_irq_active(timer->map, true);
> > -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> > -					 timer->map,
> > -					 timer->irq->level);
> > -	WARN_ON(ret);
> > -}
> > -
> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >  {
> >  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > @@ -116,8 +104,7 @@ static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  
> >  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> > -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> > -		!kvm_vgic_get_phys_irq_active(timer->map);
> > +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
> >  }
> >  
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> > @@ -134,6 +121,41 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  	return cval <= now;
> >  }
> >  
> > +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level)
> > +{
> > +	int ret;
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	BUG_ON(!vgic_initialized(vcpu->kvm));
> > +
> > +	timer->irq.level = new_level;
> > +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> > +					 timer->map,
> > +					 timer->irq.level);
> > +	WARN_ON(ret);
> > +}
> > +
> > +/*
> > + * Check if there was a change in the timer state (should we raise or lower
> > + * the line level to the GIC).
> > + */
> > +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	/*
> > +	 * If userspace modified the timer registers via SET_ONE_REG before
> > +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> > +	 * because the guest would never see the interrupt.  Instead wait
> > +	 * until we call this funciton from kvm_timer_flush_hwstate.
> s/funciton/function
> > +	 */
> > +	if (!vgic_initialized(vcpu->kvm))
> > +	    return;
> > +
> > +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
> > +		kvm_timer_update_irq(vcpu, !timer->irq.level);
> > +}
> > +
> >  /*
> >   * Schedule the background timer before calling kvm_vcpu_block, so that this
> >   * thread is removed from its waitqueue and made runnable when there's a timer
> > @@ -193,8 +215,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	 * If the timer expired while we were not scheduled, now is the time
> >  	 * to inject it.
> >  	 */
> > -	if (kvm_timer_should_fire(vcpu))
> > -		kvm_timer_inject_irq(vcpu);
> > +	kvm_timer_update_state(vcpu);
> >  }
> >  
> >  /**
> > @@ -210,8 +231,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  
> >  	BUG_ON(timer_is_armed(timer));
> >  
> > -	if (kvm_timer_should_fire(vcpu))
> > -		kvm_timer_inject_irq(vcpu);
> > +	/*
> > +	 * The guest could have modified the timer registers or the timer
> > +	 * could have expired, update the timer state.
> > +	 */
> > +	kvm_timer_update_state(vcpu);
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> > @@ -226,7 +250,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> >  	 * kvm_vcpu_set_target(). To handle this, we determine
> >  	 * vcpu timer irq number when the vcpu is reset.
> >  	 */
> > -	timer->irq = irq;
> > +	timer->irq.irq = irq->irq;
> >  
> >  	/*
> >  	 * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> > @@ -235,6 +259,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> >  	 * the ARMv7 architecture.
> >  	 */
> >  	timer->cntv_ctl = 0;
> > +	kvm_timer_update_state(vcpu);
> >  
> >  	/*
> >  	 * Tell the VGIC that the virtual interrupt is tied to a
> > @@ -279,6 +304,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  	default:
> >  		return -1;
> >  	}
> > +
> > +	kvm_timer_update_state(vcpu);
> >  	return 0;
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 9ed8d53..f4ea950 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >  /*
> >   * Save the physical active state, and reset it to inactive.
> >   *
> > - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> > + * Return true if there's a pending level triggered interrupt line to queue.
> >   */
> > -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> > +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> >  {
> >  	struct irq_phys_map *map;
> > +	bool phys_active;
> >  	int ret;
> >  
> >  	if (!(vlr.state & LR_HW))
> >  		return 0;
> >  
> >  	map = vgic_irq_map_search(vcpu, vlr.irq);
> > -	BUG_ON(!map || !map->active);
> > +	BUG_ON(!map);
> >  
> >  	ret = irq_get_irqchip_state(map->irq,
> >  				    IRQCHIP_STATE_ACTIVE,
> > -				    &map->active);
> > +				    &phys_active);
> >  
> >  	WARN_ON(ret);
> >  
> > -	if (map->active) {
> > +	if (phys_active) {
> > +		/*
> > +		 * Interrupt still marked as active on the physical
> > +		 * distributor, so guest did not EOI it yet.  Reset to
> > +		 * non-active so that other VMs can see interrupts from this
> > +		 * device.
> > +		 */
> >  		ret = irq_set_irqchip_state(map->irq,
> >  					    IRQCHIP_STATE_ACTIVE,
> >  					    false);
> >  		WARN_ON(ret);
> > -		return 0;
> > +		return false;
> >  	}
> >  
> > -	return 1;
> > +	/* Mapped edge-triggered interrupts not yet supported. */
> > +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> > +	return process_level_irq(vcpu, lr, vlr);
> >  }
> >  
> >  /* Sync back the VGIC state after a guest run */
> > @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
> >  			continue;
> >  
> >  		vlr = vgic_get_lr(vcpu, lr);
> > -		if (vgic_sync_hwirq(vcpu, vlr)) {
> > -			/*
> > -			 * So this is a HW interrupt that the guest
> > -			 * EOI-ed. Clean the LR state and allow the
> > -			 * interrupt to be sampled again.
> > -			 */
> > -			vlr.state = 0;
> > -			vlr.hwirq = 0;
> > -			vgic_set_lr(vcpu, lr, vlr);
> > -			vgic_irq_clear_queued(vcpu, vlr.irq);
> > -			set_bit(lr, elrsr_ptr);
> > -		}
> > +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> > +			level_pending = true;
> >  
> >  		if (!test_bit(lr, elrsr_ptr))
> >  			continue;
> > @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
> >  }
> >  
> >  /**
> > - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> > - *
> > - * Return the logical active state of a mapped interrupt. This doesn't
> > - * necessarily reflects the current HW state.
> > - */
> > -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> > -{
> > -	BUG_ON(!map);
> > -	return map->active;
> > -}
> > -
> > -/**
> > - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> > - *
> > - * Set the logical active state of a mapped interrupt. This doesn't
> > - * immediately affects the HW state.
> > - */
> > -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> > -{
> > -	BUG_ON(!map);
> > -	map->active = active;
> > -}
> > -
> > -/**
> >   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
> >   * @vcpu: The VCPU pointer
> >   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> > @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
> >  			if (i < VGIC_NR_SGIS)
> >  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
> >  							vcpu->vcpu_id, i, 1);
> > -			if (i < VGIC_NR_PRIVATE_IRQS)
> > +			if (i < VGIC_NR_SGIS)
> >  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> >  							vcpu->vcpu_id, i,
> >  							VGIC_CFG_EDGE);
> > +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> > +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> > +							vcpu->vcpu_id, i,
> > +							VGIC_CFG_LEVEL);
> nit: use the same if block for enable & cfg?

sure.

Thanks,
-Christoffer
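
For readers following the level-triggered rework, here is a minimal
stand-alone sketch of the idea behind kvm_timer_update_state() in the
patch quoted above.  It is a model under stated assumptions, not the
kernel implementation: struct timer_model, the CTRL_* constants and all
function names are hypothetical, and the vgic is reduced to a counter of
line changes.  It shows that the line is only signalled when the
computed output differs from the last signalled level, and that a guest
which enables and then disables an expired timer ends up with the line
low.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model constants (hypothetical names) mirroring the CNTV_CTL bits. */
#define CTRL_ENABLE	(1u << 0)
#define CTRL_IT_MASK	(1u << 1)

/* Hypothetical per-VCPU timer model; not the kernel's arch_timer_cpu. */
struct timer_model {
	uint32_t ctl;		/* control register value */
	uint64_t cval;		/* compare value */
	bool level;		/* last level signalled to the vgic */
	int vgic_updates;	/* how often the line actually changed */
};

/* The output is high when enabled, unmasked and the deadline passed. */
static bool timer_should_fire(const struct timer_model *t, uint64_t now)
{
	if (!(t->ctl & CTRL_ENABLE) || (t->ctl & CTRL_IT_MASK))
		return false;
	return t->cval <= now;
}

/* Raise or lower the line only when the output state changed. */
static void timer_update_state(struct timer_model *t, uint64_t now)
{
	bool new_level = timer_should_fire(t, now);

	if (new_level != t->level) {
		t->level = new_level;
		t->vgic_updates++;
	}
}

/*
 * Guest sequence from the commit message, run with guest interrupts
 * disabled: enable an already expired timer, then disable it again.
 * Returns the final line level seen by the vgic.
 */
static bool guest_enable_then_disable(struct timer_model *t, uint64_t now)
{
	t->cval = now;			/* deadline already reached */
	t->ctl = CTRL_ENABLE;
	timer_update_state(t, now);	/* line goes high */

	t->ctl = 0;
	timer_update_state(t, now);	/* line goes low again */

	return t->level;
}
```

With a zero-initialized struct timer_model, guest_enable_then_disable()
returns false after exactly two line changes, so nothing remains pending
when the guest re-enables interrupts.  A further timer_update_state()
call with unchanged registers does not touch the vgic counter at all.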

> >  		}
> >  
> >  		vgic_enable(vcpu);
> > 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-14 15:51     ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-14 15:51 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

Hi Christoffer,

just one small nit I stumbled upon:

On 04/09/15 20:40, Christoffer Dall wrote:
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;

This should read "return false;" now.

Cheers,
Andre.

>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> +	return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-14 11:42               ` Christoffer Dall
@ 2015-09-15 15:16                 ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-15 15:16 UTC (permalink / raw)
  To: Christoffer Dall, Eric Auger; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,

On 14/09/15 12:42, Christoffer Dall wrote:
....
>>>> Where is this done? I see that the physical dist state is altered on the
>>>> actual IRQ forwarding, but not on later exits/entries? Do you mean
>>>> kvm_vgic_flush_hwstate() with "flush"?
>>>
>>> this is a bug and should be fixed in the 'fixes' patches I sent last
>>> week.  We should set active state on every entry to the guest for IRQs
>>> with the HW bit set in either pending or active state.
>>
>> OK, sorry, I missed that one patch, I was looking at what should become
>> -rc1 soon (because that's what I want to rebase my ITS emulation patches
>> on). That patch wasn't in queue at the time I started looking at it.
>>
>> So I updated to the latest queue containing those two fixes and also
>> applied your v2 series. Indeed this series addresses some of the things
>> I was wondering about the last time, but the main thing still persists:
>> - Every time the physical dist state is active we have the virtual state
>> still at pending or active.
> 
> For the arch timer, yes.
> 
> For a passthrough device, there should be a situation where the physical
> dist state is active but we didn't see the virtual state updated at the
> vgic yet (after physical IRQ fires and before the VFIO ISR calls
> kvm_set_irq).

But then we wouldn't get into vgic_sync_hwirq(), because we wouldn't
inject a mapped IRQ before kvm_set_irq() is called, would we?

>> - If the physical dist state is non-active, the virtual state is
>> inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
>> (LR empty).
>> (I was tracing every HW mapped LR in vgic_sync_hwirq() for this)
>>
>> So that contradicts:
>>
>> +  - On guest EOI, the *physical distributor* active bit gets cleared,
>> +    but the LR.Active is left untouched (set).
>>
>> This is the main point I was actually wondering about: I cannot confirm
>> this statement. In my tests the LR state and the physical dist state
>> always correspond, as expected from reading the spec.
>>
>> I reckon that these observations are mostly independent from the actual
>> KVM code, as I try to observe hardware state (physical distributor and
>> LRs) before KVM tinkers with them.
> 
> ok, I got this paragraph from Marc, so we really need to ask him?  Which
> hardware are you seeing this behavior on?  Perhaps implementations vary
> on this point?

I checked this on Midway and Juno. Both have a GIC-400, but I don't have
access to any other GIC implementations.
I added the two BUG_ONs shown below to prove that assumption.

Eric, I've been told you observed the behaviour with the GIC not syncing
LR and phys state for a mapped HWIRQ which was not the timer.
Can you reproduce this? Does it complain with the patch below?

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 5942ce9..7fac16e 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1459,9 +1459,12 @@ static bool vgic_sync_hwirq(struct kvm_vcpu
 					     IRQCHIP_STATE_ACTIVE,
 					     false);
 		WARN_ON(ret);
+		BUG_ON(!(vlr.state & 3));
 		return false;
 	}

+	BUG_ON(vlr.state & 3);
+
 	return process_queued_irq(vcpu, lr, vlr);
 }
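
Read together, the two BUG_ONs assert a single invariant: for a HW-mapped
LR, the pending/active bits in the LR are set exactly when the physical
distributor reports the interrupt as active. Below is a minimal userspace
sketch of that check. The bit values follow what is quoted in this mail
("state & 3" for pending/active, 8 for the HW bit); the helper name
lr_matches_phys() is hypothetical and not part of vgic.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as quoted in the mail: "state & 3" and "LR.state==8: HW bit". */
#define LR_STATE_PENDING (1u << 0)
#define LR_STATE_ACTIVE  (1u << 1)
#define LR_STATE_MASK    (LR_STATE_PENDING | LR_STATE_ACTIVE)
#define LR_HW            (1u << 3)

/*
 * Hypothetical helper, not part of vgic.c: returns true when a HW-mapped
 * LR and the physical distributor agree, i.e. neither BUG_ON would fire.
 */
static bool lr_matches_phys(unsigned int lr_state, bool phys_active)
{
	bool virt_live = (lr_state & LR_STATE_MASK) != 0;

	/* Only HW-mapped LRs get past the early return in vgic_sync_hwirq(). */
	assert(lr_state & LR_HW);

	return virt_live == phys_active;
}
```

A false return corresponds to one of the two BUG_ONs triggering: either
the physical state is active with an empty LR, or the LR is still
pending/active while the physical state is already inactive.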

> 
> I have no objections removing this point from the doc though, I'm just
> relaying information on this one.

I see, I talked with Marc and I am about to gather more data with the
above patch to prove that this never happens.

>>
>> ...
>>
>>>
>>>> Is this an observation, an implementation bug or is this mentioned in
>>>> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
>>>> awkward to me.
>>>
>>> What do you mean?  How are we spoon-feeding the VGIC?
>>
>> By looking at the physical dist state and all LRs and clearing the LR we
>> do what the GIC is actually supposed to do for us - and what it actually
>> does according to my observations.
>>
>> The point is that patch 1 in my ITS emulation series is reworking the LR
>> handling and this patch was based on assumptions that seem to be no
>> longer true (i.e. we don't care about inactive LRs except for our LR
>> mapping code). So I want to be sure that I fully get what is going on
>> here and I struggle at this at the moment due to the above statement.
>>
>> What are the plans regarding your "v2: Rework architected timer..."
>> series? Will this be queued for 4.4? I want to do the
>> rebasing^Wrewriting of my series only once if possible ;-)
>>
> I think we should settle on this series ASAP and base your ITS stuff on
> top of it.  What do you think?

Yeah, that's what I was thinking too. So I will be working against
4.3-rc1 with your timer-rework-v2 branch plus the other fixes from the
kvm-arm queue merged.

Cheers,
Andre.

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation
  2015-09-15 15:16                 ` Andre Przywara
@ 2015-09-15 19:09                   ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-15 19:09 UTC (permalink / raw)
  To: Andre Przywara; +Cc: Eric Auger, Marc Zyngier, kvmarm, linux-arm-kernel, kvm

On Tue, Sep 15, 2015 at 04:16:07PM +0100, Andre Przywara wrote:
> Hi Christoffer,
> 
> On 14/09/15 12:42, Christoffer Dall wrote:
> ....
> >>>> Where is this done? I see that the physical dist state is altered on the
> >>>> actual IRQ forwarding, but not on later exits/entries? Do you mean
> >>>> kvm_vgic_flush_hwstate() with "flush"?
> >>>
> >>> this is a bug and should be fixed in the 'fixes' patches I sent last
> >>> week.  We should set active state on every entry to the guest for IRQs
> >>> with the HW bit set in either pending or active state.
> >>
> >> OK, sorry, I missed that one patch, I was looking at what should become
> >> -rc1 soon (because that's what I want to rebase my ITS emulation patches
> >> on). That patch wasn't in queue at the time I started looking at it.
> >>
> >> So I updated to the latest queue containing those two fixes and also
> >> applied your v2 series. Indeed this series addresses some of the things
> >> I was wondering about the last time, but the main thing still persists:
> >> - Every time the physical dist state is active we have the virtual state
> >> still at pending or active.
> > 
> > For the arch timer, yes.
> > 
> > For a passthrough device, there should be a situation where the physical
> > dist state is active but we didn't see the virtual state updated at the
> > vgic yet (after physical IRQ fires and before the VFIO ISR calls
> > kvm_set_irq).
> 
> But then we wouldn't get into vgic_sync_hwirq(), because we wouldn't
> inject a mapped IRQ before kvm_set_irq() is called, would we?

Ah, you meant, if we are in vgic_sync_hwirq() and the dist state is
active, then we have the virtual state still at pending or active?

That's a slightly different question from what you posed above.

I haven't thought extremely carefully about it, but could you not have
(1) guest deactivates (2) physical interrupt is handled on different CPU
on host for passthrough device (3) VFIO ISR leaves the IRQ active (4)
guest exits and you now hit vgic_sync_hwirq() and the virtual interrupt
is now inactive but the physical interrupt is active?
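
The sequence above can be replayed in a toy userspace model to show the
end state it would produce. This is only a sketch of the hypothetical
interleaving: the struct and function names are made up for illustration,
and whether real hardware and the VFIO path can actually interleave like
this is the open question here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one forwarded interrupt; both fields are illustrative. */
struct fwd_irq {
	bool lr_live;     /* LR is pending or active (state & 3) */
	bool phys_active; /* active bit on the physical distributor */
};

/* (1) Guest deactivates: with the HW bit set, the physical side clears too. */
static void guest_deactivate(struct fwd_irq *irq)
{
	irq->lr_live = false;
	irq->phys_active = false;
}

/* (2)+(3) Device fires again; another host CPU takes it and leaves it active. */
static void host_handles_on_other_cpu(struct fwd_irq *irq)
{
	irq->phys_active = true;
}

/* (4) What the sync path would observe on guest exit. */
static bool sync_sees_mismatch(const struct fwd_irq *irq)
{
	return irq->phys_active && !irq->lr_live;
}

/* Replays steps (1)-(4) and reports whether the final state is mismatched. */
static bool replay_scenario(void)
{
	struct fwd_irq irq = { .lr_live = true, .phys_active = true };

	guest_deactivate(&irq);
	assert(!sync_sees_mismatch(&irq)); /* consistent right after the EOI */
	host_handles_on_other_cpu(&irq);
	return sync_sees_mismatch(&irq);
}
```

In this model the mismatch only appears because a new physical interrupt
arrives between the guest's deactivation and the guest exit.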

> 
> >> - If the physical dist state is non-active, the virtual state is
> >> inactive (LR.state==8: HW bit) as well. The associated ELRSR bit is 1
> >> (LR empty).
> >> (I was tracing every HW mapped LR in vgic_sync_hwirq() for this)
> >>
> >> So that contradicts:
> >>
> >> +  - On guest EOI, the *physical distributor* active bit gets cleared,
> >> +    but the LR.Active is left untouched (set).
> >>
> >> This is the main point I was actually wondering about: I cannot confirm
> >> this statement. In my tests the LR state and the physical dist state
> >> always correspond, as expected from reading the spec.
> >>
> >> I reckon that these observations are mostly independent from the actual
> >> KVM code, as I try to observe hardware state (physical distributor and
> >> LRs) before KVM tinkers with them.
> > 
> > ok, I got this paragraph from Marc, so we really need to ask him?  Which
> > hardware are you seeing this behavior on?  Perhaps implementations vary
> > on this point?
> 
> I checked this on Midway and Juno. Both have a GIC-400, but I don't have
> access to any other GIC implementations.
> I added the two BUG_ONs shown below to prove that assumption.
> 
> Eric, I've been told you observed the behaviour with the GIC not syncing
> LR and phys state for a mapped HWIRQ which was not the timer.
> Can you reproduce this? Does it complain with the patch below?
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 5942ce9..7fac16e 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1459,9 +1459,12 @@ static bool vgic_sync_hwirq(struct kvm_vcpu
>  					     IRQCHIP_STATE_ACTIVE,
>  					     false);
>  		WARN_ON(ret);
> +		BUG_ON(!(vlr.state & 3));
>  		return false;
>  	}
> 
> +	BUG_ON(vlr.state & 3);
> +
>  	return process_queued_irq(vcpu, lr, vlr);
>  }
> 
> > 
> > I have no objections removing this point from the doc though, I'm just
> > relaying information on this one.
> 
> I see, I talked with Marc and I am about to gather more data with the
> above patch to prove that this never happens.
> 
> >>
> >> ...
> >>
> >>>
> >>>> Is this an observation, an implementation bug or is this mentioned in
> >>>> the spec? Needing to spoon-feed the VGIC by doing its job sounds a bit
> >>>> awkward to me.
> >>>
> >>> What do you mean?  How are we spoon-feeding the VGIC?
> >>
> >> By looking at the physical dist state and all LRs and clearing the LR we
> >> do what the GIC is actually supposed to do for us - and what it actually
> >> does according to my observations.
> >>
> >> The point is that patch 1 in my ITS emulation series is reworking the LR
> >> handling and this patch was based on assumptions that seem to be no
> >> longer true (i.e. we don't care about inactive LRs except for our LR
> >> mapping code). So I want to be sure that I fully get what is going on
> >> here and I struggle at this at the moment due to the above statement.
> >>
> >> What are the plans regarding your "v2: Rework architected timer..."
> >> series? Will this be queued for 4.4? I want to do the
> >> rebasing^Wrewriting of my series only once if possible ;-)
> >>
> > I think we should settle on this series ASAP and base your ITS stuff on
> > top of it.  What do you think?
> 
> Yeah, that's what I was thinking too. So I will be working against
> 4.3-rc1 with your timer-rework-v2 branch plus the other fixes from the
> kvm-arm queue merged.
> 
Sounds good!  Thanks.

-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-04 19:40   ` Christoffer Dall
@ 2015-09-23 17:44     ` Andre Przywara
  -1 siblings, 0 replies; 64+ messages in thread
From: Andre Przywara @ 2015-09-23 17:44 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel; +Cc: Marc Zyngier, kvm

Hi Christoffer,

> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> +	return process_level_irq(vcpu, lr, vlr);

Aren't you missing the dist->lock here? The other caller of
process_level_irq() takes it, and Eric recently removed the
coarse-grained lock around the whole __kvm_vgic_sync_hwstate() function.
So we don't hold the lock here, yet process_level_irq() changes quite a
bit of shared VGIC state.
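
As a rough illustration of the locking pattern being asked for, here is a
userspace sketch in which the helper that touches shared distributor
state insists that its caller holds the lock. Everything in it is a
stand-in: struct dist_model, the boolean "lock" and the *_model functions
are hypothetical and only mimic the shape of the kernel code. In the
kernel itself this would presumably be spin_lock()/spin_unlock() on
dist->lock around the call, with lockdep_assert_held() playing the role
of the assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the shared distributor state. */
struct dist_model {
	bool locked;              /* stand-in for dist->lock */
	unsigned long irq_queued; /* one bit per IRQ */
	unsigned long irq_level;  /* current level of each input line */
};

static void dist_lock(struct dist_model *d)
{
	assert(!d->locked);
	d->locked = true;
}

static void dist_unlock(struct dist_model *d)
{
	assert(d->locked);
	d->locked = false;
}

/* Touches shared state, so the caller must hold the lock. */
static bool process_level_irq_model(struct dist_model *d, int irq)
{
	assert(d->locked);

	d->irq_queued &= ~(1UL << irq);      /* allow the line to be sampled again */
	return (d->irq_level >> irq) & 1UL;  /* still asserted: needs requeueing */
}

/* The sync path takes the lock only around the shared-state update. */
static bool sync_hwirq_model(struct dist_model *d, int irq)
{
	bool level_pending;

	dist_lock(d);
	level_pending = process_level_irq_model(d, irq);
	dist_unlock(d);

	return level_pending;
}
```

Taking the lock only around the helper, rather than around the whole sync
function, keeps the critical section short now that the coarse-grained
lock is gone.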

Cheers.
Andre.

>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> -			/*
> -			 * So this is a HW interrupt that the guest
> -			 * EOI-ed. Clean the LR state and allow the
> -			 * interrupt to be sampled again.
> -			 */
> -			vlr.state = 0;
> -			vlr.hwirq = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> -			set_bit(lr, elrsr_ptr);
> -		}
> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> +			level_pending = true;
>  
>  		if (!test_bit(lr, elrsr_ptr))
>  			continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> -	BUG_ON(!map);
> -	return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> -	BUG_ON(!map);
> -	map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>  			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
> -			if (i < VGIC_NR_PRIVATE_IRQS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>  							vcpu->vcpu_id, i,
>  							VGIC_CFG_EDGE);
> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> +							vcpu->vcpu_id, i,
> +							VGIC_CFG_LEVEL);
>  		}
>  
>  		vgic_enable(vcpu);
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-23 17:44     ` Andre Przywara
@ 2015-09-29 14:30       ` Christoffer Dall
  -1 siblings, 0 replies; 64+ messages in thread
From: Christoffer Dall @ 2015-09-29 14:30 UTC (permalink / raw)
  To: Andre Przywara; +Cc: kvmarm, linux-arm-kernel, Marc Zyngier, kvm

On Wed, Sep 23, 2015 at 06:44:21PM +0100, Andre Przywara wrote:
> Hi Christoffer,
> 
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 9ed8d53..f4ea950 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >  /*
> >   * Save the physical active state, and reset it to inactive.
> >   *
> > - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> > + * Return true if there's a pending level triggered interrupt line to queue.
> >   */
> > -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> > +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> >  {
> >  	struct irq_phys_map *map;
> > +	bool phys_active;
> >  	int ret;
> >  
> >  	if (!(vlr.state & LR_HW))
> >  		return 0;
> >  
> >  	map = vgic_irq_map_search(vcpu, vlr.irq);
> > -	BUG_ON(!map || !map->active);
> > +	BUG_ON(!map);
> >  
> >  	ret = irq_get_irqchip_state(map->irq,
> >  				    IRQCHIP_STATE_ACTIVE,
> > -				    &map->active);
> > +				    &phys_active);
> >  
> >  	WARN_ON(ret);
> >  
> > -	if (map->active) {
> > +	if (phys_active) {
> > +		/*
> > +		 * Interrupt still marked as active on the physical
> > +		 * distributor, so guest did not EOI it yet.  Reset to
> > +		 * non-active so that other VMs can see interrupts from this
> > +		 * device.
> > +		 */
> >  		ret = irq_set_irqchip_state(map->irq,
> >  					    IRQCHIP_STATE_ACTIVE,
> >  					    false);
> >  		WARN_ON(ret);
> > -		return 0;
> > +		return false;
> >  	}
> >  
> > -	return 1;
> > +	/* Mapped edge-triggered interrupts not yet supported. */
> > +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> > +	return process_level_irq(vcpu, lr, vlr);
> 
> Don't you miss the dist->lock here? The other call to
> process_level_irq() certainly does it, and Eric recently removed the
> coarse grained lock around the whole __kvm_vgic_sync_hwstate() function.
> So we don't hold the lock here, but we change quite some common VGIC
> state in there.
> 

Indeed, I think we should take the dist->lock here.

I'll fix that for the next revision.
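
For illustration only, the locking pattern discussed above could look roughly like the sketch below. This is a userspace model, not the kernel code and not the fix as it will be posted: every `model_*` name is a hypothetical stand-in, a C11 `atomic_flag` spin loop stands in for dist->lock, and the body of `model_process_level_irq()` is a placeholder rather than the real process_level_irq() logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared distributor state (not struct vgic_dist). */
struct model_dist {
	atomic_flag lock;	/* stands in for dist->lock */
	bool lock_held;		/* debug aid, loosely like lockdep_assert_held() */
	bool level_pending;	/* shared state: is the level line still asserted? */
	bool queued;		/* shared state: is the IRQ marked as queued? */
};

static void model_dist_init(struct model_dist *dist, bool level_pending)
{
	atomic_flag_clear(&dist->lock);
	dist->lock_held = false;
	dist->level_pending = level_pending;
	dist->queued = true;
}

static void model_lock(struct model_dist *dist)
{
	while (atomic_flag_test_and_set_explicit(&dist->lock, memory_order_acquire))
		;	/* spin until the flag is released */
	dist->lock_held = true;
}

static void model_unlock(struct model_dist *dist)
{
	dist->lock_held = false;
	atomic_flag_clear_explicit(&dist->lock, memory_order_release);
}

/* Placeholder body: it modifies shared state, so the caller must hold the lock. */
static bool model_process_level_irq(struct model_dist *dist)
{
	assert(dist->lock_held);
	dist->queued = false;
	return dist->level_pending;
}

/* Models the tail of the sync path, with the lock taken around the shared-state update. */
static bool model_sync_hwirq(struct model_dist *dist, bool phys_active)
{
	bool level_pending;

	if (phys_active)
		return false;	/* still active: guest has not EOIed, nothing to queue */

	model_lock(dist);
	level_pending = model_process_level_irq(dist);
	model_unlock(dist);

	return level_pending;
}
```

The assertion in `model_process_level_irq()` only documents the calling convention; in the kernel the equivalent would be holding the distributor spinlock at the call site, as the other caller already does.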

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2015-09-29 14:30 UTC | newest]

Thread overview: 64+ messages
2015-09-04 19:40 [PATCH v2 0/8] Rework architected timer and forwarded IRQs handling Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 1/8] KVM: Add kvm_arch_vcpu_{un}blocking callbacks Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 2/8] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block Christoffer Dall
2015-09-07 15:01   ` Eric Auger
2015-09-13 15:56     ` Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 3/8] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit Christoffer Dall
2015-09-07 15:32   ` Eric Auger
2015-09-14 11:31     ` Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 4/8] arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 5/8] arm/arm64: KVM: Use appropriate define in VGIC reset code Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 6/8] arm/arm64: KVM: Add forwarded physical interrupts documentation Christoffer Dall
2015-09-07 11:25   ` Andre Przywara
2015-09-08  8:43     ` Eric Auger
2015-09-08 16:57       ` Andre Przywara
2015-09-09  8:49         ` Christoffer Dall
2015-09-09  8:57           ` Eric Auger
2015-09-11 11:21           ` Andre Przywara
2015-09-14 11:42             ` Christoffer Dall
2015-09-15 15:16               ` Andre Przywara
2015-09-15 19:09                 ` Christoffer Dall
2015-09-08 14:18     ` Christoffer Dall
2015-09-07 16:45   ` Eric Auger
2015-09-07 17:50     ` Marc Zyngier
2015-09-08  7:44       ` Eric Auger
2015-09-14 11:46     ` Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 7/8] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics Christoffer Dall
2015-09-14  9:29   ` Eric Auger
2015-09-14 11:48     ` Christoffer Dall
2015-09-14 15:51   ` Andre Przywara
2015-09-23 17:44   ` Andre Przywara
2015-09-29 14:30     ` Christoffer Dall
2015-09-04 19:40 ` [PATCH v2 8/8] arm/arm64: KVM: Support edge-triggered forwarded interrupts Christoffer Dall
