* [PATCH 0/9] Rework architected timer and fix UEFI reset
@ 2015-08-30 13:54 ` Christoffer Dall
  0 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

The architected timer integration with the vgic had some shortcomings in
that certain guests (one being UEFI) weren't fully supported.

In fixing this I also found that we are scheduling the hrtimer for the
virtual timer way too often, with a potential performance overhead.

This series addresses these problems by providing level-triggered
semantics for the arch timer and vgic integration, and seeks to
clarify the behavior when setting/clearing the active state on the
physical distributor.

Series based on kvmarm/next and also available at:
https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git timer-rework

Christoffer Dall (9):
  KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  arm/arm64: Implement GICD_ICFGR as RO for PPIs
  arm/arm64: KVM: Use appropriate define in VGIC reset code
  arm/arm64: KVM: Add mapped interrupts documentation
  arm/arm64: KVM: vgic: Move active state handling to flush_hwstate
  arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0

 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt |  59 ++++++
 arch/arm/kvm/arm.c                                 |  21 ++-
 arch/mips/include/asm/kvm_host.h                   |   2 +
 arch/powerpc/include/asm/kvm_host.h                |   2 +
 arch/s390/include/asm/kvm_host.h                   |   2 +
 arch/x86/include/asm/kvm_host.h                    |   3 +
 include/kvm/arm_arch_timer.h                       |   4 +-
 include/kvm/arm_vgic.h                             |   3 -
 include/linux/kvm_host.h                           |   2 +
 virt/kvm/arm/arch_timer.c                          | 160 +++++++++++-----
 virt/kvm/arm/vgic.c                                | 201 +++++++++++----------
 virt/kvm/kvm_main.c                                |   3 +
 12 files changed, 308 insertions(+), 154 deletions(-)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply	[flat|nested] 74+ messages in thread


* [PATCH 1/9] KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

Sometimes it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore, provide generic architecture callback functions, in line
with what we do elsewhere for KVM generic-arch interactions.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h     | 3 +++
 arch/arm64/include/asm/kvm_host.h   | 3 +++
 arch/mips/include/asm/kvm_host.h    | 2 ++
 arch/powerpc/include/asm/kvm_host.h | 2 ++
 arch/s390/include/asm/kvm_host.h    | 2 ++
 arch/x86/include/asm/kvm_host.h     | 3 +++
 include/linux/kvm_host.h            | 2 ++
 virt/kvm/kvm_main.c                 | 3 +++
 8 files changed, 20 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index dcba0fa..86fcf6e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,4 +236,7 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 415938d..dd143f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,4 +257,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index e8c8d9d..58f0f4d 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -845,5 +845,7 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __MIPS_KVM_HOST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d91f65b..179f9a7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -702,5 +702,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_exit(void) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif /* __POWERPC_KVM_HOST_H__ */
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3024acb..04a97df 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -640,5 +640,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
 static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
 static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
 		struct kvm_memory_slot *slot) {}
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
 
 #endif
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2a7f5d7..26c4086 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1202,4 +1202,7 @@ int __x86_set_memory_region(struct kvm *kvm,
 int x86_set_memory_region(struct kvm *kvm,
 			  const struct kvm_userspace_memory_region *mem);
 
+static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd7..87d7be6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -619,6 +619,8 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
 void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 void kvm_vcpu_block(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..04b59dd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1946,6 +1946,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
+	kvm_arch_vcpu_blocking(vcpu);
+
 	for (;;) {
 		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
@@ -1959,6 +1961,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	finish_wait(&vcpu->wq, &wait);
 	cur = ktime_get();
 
+	kvm_arch_vcpu_unblocking(vcpu);
 out:
 	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
 }
-- 
2.1.2.330.g565301e.dirty




* [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vcpu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that, in theory, userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer; this case was not supported
before this patch, however, so we leave it to future work.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   |  3 --
 arch/arm/kvm/arm.c                | 10 +++++
 arch/arm64/include/asm/kvm_host.h |  3 --
 include/kvm/arm_arch_timer.h      |  2 +
 virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
 5 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 86fcf6e..dcba0fa 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..bdf8871 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 	return kvm_timer_should_fire(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_schedule(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_unschedule(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	/* Force users to call KVM_ARM_VCPU_INIT */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dd143f5..415938d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e1e4d7c..ef14cc1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
+void kvm_timer_schedule(struct kvm_vcpu *vcpu);
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 76e38d2..018f3d6 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
+static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
+		!kvm_vgic_get_phys_irq_active(timer->map);
+}
+
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	cycle_t cval, now;
 
-	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
-	    kvm_vgic_get_phys_irq_active(timer->map))
+	if (!kvm_timer_irq_enabled(vcpu))
 		return false;
 
 	cval = timer->cntv_cval;
@@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-/**
- * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
- * @vcpu: The vcpu pointer
- *
- * Disarm any pending soft timers, since the world-switch code will write the
- * virtual timer state back to the physical CPU.
+/*
+ * Schedule the background timer before calling kvm_vcpu_block, so that this
+ * thread is removed from its waitqueue and made runnable when there's a timer
+ * interrupt to handle.
  */
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	u64 ns;
+	cycle_t cval, now;
+
+	/*
+	 * No need to schedule a background timer if the guest timer has
+	 * already expired, because kvm_vcpu_block will return before putting
+	 * the thread to sleep.
+	 */
+	if (kvm_timer_should_fire(vcpu))
+		return;
 
 	/*
-	 * We're about to run this vcpu again, so there is no need to
-	 * keep the background timer running, as we're about to
-	 * populate the CPU timer again.
+	 * If the timer is either not capable of raising interrupts (disabled
+	 * or masked) or if we already have a background timer, then there's
+	 * no more work for us to do.
 	 */
+	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
+		return;
+
+	/*  The timer has not yet expired, schedule a background timer */
+	cval = timer->cntv_cval;
+	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
+
+	ns = cyclecounter_cyc2ns(timecounter->cc,
+				 cval - now,
+				 timecounter->mask,
+				 &timecounter->frac);
+	timer_arm(timer, ns);
+}
+
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	timer_disarm(timer);
+}
 
+/**
+ * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the virtual timer has expired while we were running in the host,
+ * and inject an interrupt if that was the case.
+ */
+void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+{
 	/*
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
@@ -157,32 +199,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
  * kvm_timer_sync_hwstate - sync timer state from cpu
  * @vcpu: The vcpu pointer
  *
- * Check if the virtual timer was armed and either schedule a corresponding
- * soft timer or inject directly if already expired.
 + * Check if the virtual timer has expired while we were running in the guest,
+ * and inject an interrupt if that was the case.
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	cycle_t cval, now;
-	u64 ns;
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu)) {
-		/*
-		 * Timer has already expired while we were not
-		 * looking. Inject the interrupt and carry on.
-		 */
+	if (kvm_timer_should_fire(vcpu))
 		kvm_timer_inject_irq(vcpu);
-		return;
-	}
-
-	cval = timer->cntv_cval;
-	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
-
-	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
-				 &timecounter->frac);
-	timer_arm(timer, ns);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-- 
2.1.2.330.g565301e.dirty



* [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
@ 2015-08-30 13:54   ` Christoffer Dall
  0 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: linux-arm-kernel

We currently schedule a soft timer every time we exit the guest if the
timer did not expire while running the guest.  This is really not
necessary, because the only work we do in the timer work function is to
kick the vcpu.

Kicking the vcpu does two things:
(1) If the vpcu thread is on a waitqueue, make it runnable and remove it
from the waitqueue.
(2) If the vcpu is running on a different physical CPU from the one
doing the kick, it sends a reschedule IPI.

The second case cannot happen, because the soft timer is only ever
scheduled when the vcpu is not running.  The first case is only relevant
when the vcpu thread is on a waitqueue, which is only the case when the
vcpu thread has called kvm_vcpu_block().

Therefore, we only need to make sure a timer is scheduled for
kvm_vcpu_block(), which we do by encapsulating all calls to
kvm_vcpu_block() with kvm_timer_{un}schedule calls.

Additionally, we only schedule a soft timer if the timer is enabled and
unmasked, since it is useless otherwise.

Note that theoretically userspace can use the SET_ONE_REG interface to
change registers that should cause the timer to fire, even if the vcpu
is blocked without a scheduled timer, but this case was not supported
before this patch and we leave it for future work for now.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/include/asm/kvm_host.h   |  3 --
 arch/arm/kvm/arm.c                | 10 +++++
 arch/arm64/include/asm/kvm_host.h |  3 --
 include/kvm/arm_arch_timer.h      |  2 +
 virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
 5 files changed, 70 insertions(+), 37 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 86fcf6e..dcba0fa 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index ce404a5..bdf8871 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 	return kvm_timer_should_fire(vcpu);
 }
 
+void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_schedule(vcpu);
+}
+
+void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
+{
+	kvm_timer_unschedule(vcpu);
+}
+
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	/* Force users to call KVM_ARM_VCPU_INIT */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dd143f5..415938d 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
 
-static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
-static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e1e4d7c..ef14cc1 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
 int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
+void kvm_timer_schedule(struct kvm_vcpu *vcpu);
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
 
 #endif
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 76e38d2..018f3d6 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
 	return HRTIMER_NORESTART;
 }
 
+static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
+		!kvm_vgic_get_phys_irq_active(timer->map);
+}
+
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	cycle_t cval, now;
 
-	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
-	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
-	    kvm_vgic_get_phys_irq_active(timer->map))
+	if (!kvm_timer_irq_enabled(vcpu))
 		return false;
 
 	cval = timer->cntv_cval;
@@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-/**
- * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
- * @vcpu: The vcpu pointer
- *
- * Disarm any pending soft timers, since the world-switch code will write the
- * virtual timer state back to the physical CPU.
+/*
+ * Schedule the background timer before calling kvm_vcpu_block, so that this
+ * thread is removed from its waitqueue and made runnable when there's a timer
+ * interrupt to handle.
  */
-void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+void kvm_timer_schedule(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	u64 ns;
+	cycle_t cval, now;
+
+	/*
+	 * No need to schedule a background timer if the guest timer has
+	 * already expired, because kvm_vcpu_block will return before putting
+	 * the thread to sleep.
+	 */
+	if (kvm_timer_should_fire(vcpu))
+		return;
 
 	/*
-	 * We're about to run this vcpu again, so there is no need to
-	 * keep the background timer running, as we're about to
-	 * populate the CPU timer again.
+	 * If the timer is either not capable of raising interrupts (disabled
+	 * or masked) or if we already have a background timer, then there's
+	 * no more work for us to do.
 	 */
+	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
+		return;
+
+	/*  The timer has not yet expired, schedule a background timer */
+	cval = timer->cntv_cval;
+	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
+
+	ns = cyclecounter_cyc2ns(timecounter->cc,
+				 cval - now,
+				 timecounter->mask,
+				 &timecounter->frac);
+	timer_arm(timer, ns);
+}
+
+void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 	timer_disarm(timer);
+}
 
+/**
+ * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the virtual timer has expired while we were running in the host,
+ * and inject an interrupt if that was the case.
+ */
+void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
+{
 	/*
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
@@ -157,32 +199,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
  * kvm_timer_sync_hwstate - sync timer state from cpu
  * @vcpu: The vcpu pointer
  *
- * Check if the virtual timer was armed and either schedule a corresponding
- * soft timer or inject directly if already expired.
+ * Check if the virtual timer has expired while we were running in the guest,
+ * and inject an interrupt if that was the case.
  */
 void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-	cycle_t cval, now;
-	u64 ns;
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu)) {
-		/*
-		 * Timer has already expired while we were not
-		 * looking. Inject the interrupt and carry on.
-		 */
+	if (kvm_timer_should_fire(vcpu))
 		kvm_timer_inject_irq(vcpu);
-		return;
-	}
-
-	cval = timer->cntv_cval;
-	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
-
-	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
-				 &timecounter->frac);
-	timer_arm(timer, ns);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
-- 
2.1.2.330.g565301e.dirty

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 3/9] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

Currently vgic_process_maintenance() handles a completed level-triggered
interrupt directly, but we are soon going to reuse this logic for
level-triggered mapped interrupts with the HW bit set, so move this
logic into a separate static function.

Probably the scariest part of this commit is convincing yourself that
the new flow is as safe as the old one.  The changes, and why they are
harmless, are listed below:

  Move vgic_irq_clear_queued after kvm_notify_acked_irq:
    Harmless because the effect of clearing the queued flag wrt.
    kvm_set_irq is only that vgic_update_irq_pending does not set the
    pending bit on the emulated CPU interface or in the pending_on_cpu
    bitmask, but we set this in __kvm_vgic_sync_hwstate later on if the
    level is still high.

  Move vgic_set_lr before kvm_notify_acked_irq:
    Also harmless, because the LR accesses are cpu-local operations and
    kvm_notify_acked_irq only affects the distributor state.

  Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
    Also harmless because it's just a bit which is cleared and altering
    the line state does not affect this bit.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 88 ++++++++++++++++++++++++++++++-----------------------
 1 file changed, 50 insertions(+), 38 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 9eb489a..c5750be 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1312,12 +1312,56 @@ epilog:
 	}
 }
 
+static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
+{
+	int level_pending = 0;
+
+	vlr.state = 0;
+	vlr.hwirq = 0;
+	vgic_set_lr(vcpu, lr, vlr);
+
+	/*
+	 * If the IRQ was EOIed (called from vgic_process_maintenance) or it
+	 * went from active to non-active (called from vgic_sync_hwirq) it was
+	 * also ACKed, and we therefore assume we can clear the soft pending
+	 * state (should it have been set) for this interrupt.
+	 *
+	 * Note: if the IRQ soft pending state was set after the IRQ was
+	 * acked, it actually shouldn't be cleared, but we have no way of
+	 * knowing that unless we start trapping ACKs when the soft-pending
+	 * state is set.
+	 */
+	vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
+
+	/*
+	 * Tell the gic to start sampling the line of this interrupt again.
+	 */
+	vgic_irq_clear_queued(vcpu, vlr.irq);
+
+	/* Any additional pending interrupt? */
+	if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
+		vgic_cpu_irq_set(vcpu, vlr.irq);
+		level_pending = 1;
+	} else {
+		vgic_dist_irq_clear_pending(vcpu, vlr.irq);
+		vgic_cpu_irq_clear(vcpu, vlr.irq);
+	}
+
+	/*
+	 * Despite being EOIed, the LR may not have
+	 * been marked as empty.
+	 */
+	vgic_sync_lr_elrsr(vcpu, lr, vlr);
+
+	return level_pending;
+}
+
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 {
 	u32 status = vgic_get_interrupt_status(vcpu);
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
-	bool level_pending = false;
 	struct kvm *kvm = vcpu->kvm;
+	int level_pending = 0;
 
 	kvm_debug("STATUS = %08x\n", status);
 
@@ -1332,54 +1376,22 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 
 		for_each_set_bit(lr, eisr_ptr, vgic->nr_lr) {
 			struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
-			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 
-			spin_lock(&dist->lock);
-			vgic_irq_clear_queued(vcpu, vlr.irq);
+			WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
 			WARN_ON(vlr.state & LR_STATE_MASK);
-			vlr.state = 0;
-			vgic_set_lr(vcpu, lr, vlr);
 
-			/*
-			 * If the IRQ was EOIed it was also ACKed and we we
-			 * therefore assume we can clear the soft pending
-			 * state (should it had been set) for this interrupt.
-			 *
-			 * Note: if the IRQ soft pending state was set after
-			 * the IRQ was acked, it actually shouldn't be
-			 * cleared, but we have no way of knowing that unless
-			 * we start trapping ACKs when the soft-pending state
-			 * is set.
-			 */
-			vgic_dist_irq_clear_soft_pend(vcpu, vlr.irq);
 
 			/*
 			 * kvm_notify_acked_irq calls kvm_set_irq()
-			 * to reset the IRQ level. Need to release the
-			 * lock for kvm_set_irq to grab it.
+			 * to reset the IRQ level, which grabs the dist->lock
+			 * so we call this before taking the dist->lock.
 			 */
-			spin_unlock(&dist->lock);
-
 			kvm_notify_acked_irq(kvm, 0,
 					     vlr.irq - VGIC_NR_PRIVATE_IRQS);
-			spin_lock(&dist->lock);
-
-			/* Any additional pending interrupt? */
-			if (vgic_dist_irq_get_level(vcpu, vlr.irq)) {
-				vgic_cpu_irq_set(vcpu, vlr.irq);
-				level_pending = true;
-			} else {
-				vgic_dist_irq_clear_pending(vcpu, vlr.irq);
-				vgic_cpu_irq_clear(vcpu, vlr.irq);
-			}
 
+			spin_lock(&dist->lock);
+			level_pending |= process_level_irq(vcpu, lr, vlr);
 			spin_unlock(&dist->lock);
-
-			/*
-			 * Despite being EOIed, the LR may not have
-			 * been marked as empty.
-			 */
-			vgic_sync_lr_elrsr(vcpu, lr, vlr);
 		}
 	}
 
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 4/9] arm/arm64: Implement GICD_ICFGR as RO for PPIs
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
We currently simulate this behavior by writing a hardcoded value to the
register for the SGIs and PPIs on every write of these bits to the
register (ignoring what the guest actually wrote), and by writing the
same value as the reset value to the register.

This is a bit counter-intuitive, as the register is RO for these bits,
and we can just implement it that way, allowing us to control the value
of the bits purely in the reset code.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index c5750be..0ba92d3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -655,7 +655,7 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 			ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
 	if (mmio->is_write) {
 		if (offset < 8) {
-			*reg = ~0U; /* Force PPIs/SGIs to 1 */
+			/* Ignore writes to read-only SGI and PPI bits */
 			return false;
 		}
 
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 5/9] arm/arm64: KVM: Use appropriate define in VGIC reset code
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

We currently initialize the SGIs to be enabled in the VGIC code, but we
use the VGIC_NR_PPIS define for this purpose, instead of the more
natural VGIC_NR_SGIS.  Change this slightly confusing use of the
defines.

Note: This should have no functional change, as both names are defined
to the number 16.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 0ba92d3..8299c24 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2099,7 +2099,7 @@ int vgic_init(struct kvm *kvm)
 		}
 
 		for (i = 0; i < dist->nr_irqs; i++) {
-			if (i < VGIC_NR_PPIS)
+			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_enabled,
 							vcpu->vcpu_id, i, 1);
 			if (i < VGIC_NR_PRIVATE_IRQS)
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

Mapped interrupts on arm/arm64 are a tricky concept, and the way we
deal with them is not readily apparent from reading the various specs.

Therefore, add a proper documentation file explaining the flow and
rationale of the behavior of the vgic.

Some of this text was contributed by Marc Zyngier.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
 1 file changed, 59 insertions(+)
 create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt

diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
new file mode 100644
index 0000000..49e1357
--- /dev/null
+++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
@@ -0,0 +1,59 @@
+KVM/ARM VGIC Mapped Interrupts
+==============================
+
+Setting the Physical Active State for Edge vs. Level Triggered IRQs
+-------------------------------------------------------------------
+
+Mapped non-shared interrupts injected to a guest should always mark the
+interrupt as active on the physical distributor.
+
+The reasoning for level-triggered interrupts:
+For level-triggered interrupts, we have to mark the interrupt as active
+on the physical distributor, because otherwise, as the line remains
+asserted, the guest will never execute because the host will keep taking
+interrupts.  As soon as the guest deactivates the interrupt, the
+physical line is sampled by the hardware again and the host takes a new
+interrupt if the physical line is still asserted.
+
+The reasoning for edge-triggered interrupts:
+For edge-triggered interrupts, if we set the HW bit in the LR we also
+have to mark the interrupt as active on the physical distributor.  If we
+don't set the physical active bit and the interrupt hits again before
+the guest has deactivated the interrupt, the interrupt goes to the host,
+which cannot set the state to ACTIVE+PENDING in the LR, because that is
+not supported when setting the HW bit in the LR.
+
+An alternative could be to not use the HW bit at all, and inject
+edge-triggered interrupts from a physical assigned device as pure
+virtual interrupts, but that would potentially slow down handling of the
+interrupt in the guest, because a physical interrupt occurring in the
+middle of the guest ISR would preempt the guest for the host to handle
+the interrupt.
+
+
+Life Cycle for Forwarded Physical Interrupts
+--------------------------------------------
+
+By forwarded physical interrupts we mean interrupts presented to a guest
+that represent a real HW event originally signaled to the host as a
+physical interrupt and then injected as a virtual interrupt with the HW
+bit set in the LR.
+
+The state of such an interrupt is managed in the following way:
+
+  - LR.Pending must be set when the interrupt is first injected, because this
+    is the only way the GICV interface is going to present it to the guest.
+  - LR.Pending will stay set as long as the guest has not acked the interrupt.
+  - LR.Pending transitions to LR.Active on read of IAR, as expected.
+  - On EOI, the *physical distributor* active bit gets cleared, but the
+    LR.Active is left untouched - it looks like the GIC can only clear a
+    single bit (either the virtual active, or the physical one).
+  - This means we cannot trust LR.Active to find out about the state of the
+    interrupt, and we definitely need to look at the distributor version.
+
+Consequently, when we context switch the state of a VCPU with forwarded
+physical interrupts, we must context switch set pending *or* active bits in the
+LR for that VCPU until the guest has deactivated the physical interrupt, and
+then clear the corresponding bits in the LR.  If we ever set an LR to pending or
+mapped when switching in a VCPU for a forwarded physical interrupt, we must also
+set the active state on the *physical distributor*.
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 7/9] arm/arm64: KVM: vgic: Move active state handling to flush_hwstate
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

We currently set the physical active state only when we *inject* a new
pending virtual interrupt, but this is actually not correct, because we
could have been preempted and run something else on the system that
resets the active state to clear.  This causes us to run the VM with the
timer set to fire, but without setting the physical active state.

The solution is to always check the LR configurations, and if we have a
mapped interrupt in the LR in either the pending or active state
(virtual), then set the physical active state.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 42 ++++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 8299c24..9ed8d53 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1144,26 +1144,11 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 		struct irq_phys_map *map;
 		map = vgic_irq_map_search(vcpu, irq);
 
-		/*
-		 * If we have a mapping, and the virtual interrupt is
-		 * being injected, then we must set the state to
-		 * active in the physical world. Otherwise the
-		 * physical interrupt will fire and the guest will
-		 * exit before processing the virtual interrupt.
-		 */
 		if (map) {
-			int ret;
-
-			BUG_ON(!map->active);
 			vlr.hwirq = map->phys_irq;
 			vlr.state |= LR_HW;
 			vlr.state &= ~LR_EOI_INT;
 
-			ret = irq_set_irqchip_state(map->irq,
-						    IRQCHIP_STATE_ACTIVE,
-						    true);
-			WARN_ON(ret);
-
 			/*
 			 * Make sure we're not going to sample this
 			 * again, as a HW-backed interrupt cannot be
@@ -1255,7 +1240,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	unsigned long *pa_percpu, *pa_shared;
-	int i, vcpu_id;
+	int i, vcpu_id, lr, ret;
 	int overflow = 0;
 	int nr_shared = vgic_nr_shared_irqs(dist);
 
@@ -1310,6 +1295,31 @@ epilog:
 		 */
 		clear_bit(vcpu_id, dist->irq_pending_on_cpu);
 	}
+
+	for (lr = 0; lr < vgic->nr_lr; lr++) {
+		struct vgic_lr vlr;
+
+		if (!test_bit(lr, vgic_cpu->lr_used))
+			continue;
+
+		vlr = vgic_get_lr(vcpu, lr);
+
+		/*
+		 * If we have a mapping, and the virtual interrupt is
+		 * presented to the guest (as pending or active), then we must
+		 * set the state to active in the physical world. See
+		 * Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt.
+		 */
+		if (vlr.state & LR_HW) {
+			struct irq_phys_map *map;
+			map = vgic_irq_map_search(vcpu, vlr.irq);
+
+			ret = irq_set_irqchip_state(map->irq,
+						    IRQCHIP_STATE_ACTIVE,
+						    true);
+			WARN_ON(ret);
+		}
+	}
 }
 
 static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm; +Cc: Christoffer Dall

The arch timer currently uses edge-triggered semantics in the sense that
the line is never sampled by the vgic and lowering the line from the
timer to the vgic doesn't have any effect on the pending state of
virtual interrupts in the vgic.  This means that we do not support a
guest with the otherwise valid behavior of (1) disable interrupts (2)
enable the timer (3) disable the timer (4) enable interrupts.  Such a
guest would validly not expect to see any interrupts on real hardware,
but will see interrupts on KVM.

This patch fixes this shortcoming through the following series of
changes.

First, we change the flow of the timer/vgic sync/flush operations.  Now
the timer is always flushed/synced before the vgic, because the vgic
samples the state of the timer output.  This has the implication that we
move the timer operations into non-preemptible sections, but that is
fine after the previous commit getting rid of hrtimer schedules on every
entry/exit.

Second, we change the internal behavior of the timer, letting the timer
keep track of its previous output state, and only lower/raise the line
to the vgic when the state changes.  Note that in theory this could have
been accomplished more simply by signalling the vgic every time the
state *potentially* changed, but we don't want to be hitting the vgic
more often than necessary.

Third, we get rid of the use of the map->active field in the vgic and
instead simply set the interrupt as active on the physical distributor
whenever we signal a mapped interrupt to the guest, and we reset the
active state when we sync back the HW state from the vgic.

Fourth, and finally, we now initialize the timer PPIs (and all the other
unused PPIs for now), to be level-triggered, and modify the sync code to
sample the line state on HW sync and re-inject a new interrupt if it is
still pending at that time.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 arch/arm/kvm/arm.c           | 11 +++++--
 include/kvm/arm_arch_timer.h |  2 +-
 include/kvm/arm_vgic.h       |  3 --
 virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
 virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
 5 files changed, 81 insertions(+), 70 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bdf8871..102a4aa 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
 			local_irq_enable();
+			kvm_timer_sync_hwstate(vcpu);
 			kvm_vgic_sync_hwstate(vcpu);
 			preempt_enable();
-			kvm_timer_sync_hwstate(vcpu);
 			continue;
 		}
 
@@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_guest_exit();
 		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
+		/*
+		 * We must sync the timer state before the vgic state so that
+		 * the vgic can properly sample the updated state of the
+		 * interrupt line.
+		 */
+		kvm_timer_sync_hwstate(vcpu);
+
 		kvm_vgic_sync_hwstate(vcpu);
 
 		preempt_enable();
 
-		kvm_timer_sync_hwstate(vcpu);
-
 		ret = handle_exit(vcpu, run, ret);
 	}
 
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index ef14cc1..1800227 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -51,7 +51,7 @@ struct arch_timer_cpu {
 	bool				armed;
 
 	/* Timer IRQ */
-	const struct kvm_irq_level	*irq;
+	struct kvm_irq_level		irq;
 
 	/* VGIC mapping */
 	struct irq_phys_map		*map;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d901f1a..99011a0 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -163,7 +163,6 @@ struct irq_phys_map {
 	u32			virt_irq;
 	u32			phys_irq;
 	u32			irq;
-	bool			active;
 };
 
 struct irq_phys_map_entry {
@@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
 					   int virt_irq, int irq);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
-bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
-void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 018f3d6..747302f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
 	}
 }
 
-static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
-{
-	int ret;
-	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
-
-	kvm_vgic_set_phys_irq_active(timer->map, true);
-	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
-					 timer->map,
-					 timer->irq->level);
-	WARN_ON(ret);
-}
-
 static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
 {
 	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
@@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
-		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
-		!kvm_vgic_get_phys_irq_active(timer->map);
+		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
 }
 
 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
@@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	BUG_ON(!vgic_initialized(vcpu->kvm));
+
+	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
+					 timer->map,
+					 timer->irq.level);
+	WARN_ON(ret);
+}
+
+/*
+ * Check if there was a change in the timer state (should we raise or lower
+ * the line level to the GIC).
+ */
+static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
+{
+	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+
+	/*
+	 * If userspace modified the timer registers via SET_ONE_REG before
+	 * the vgic was initialized, we mustn't set the timer->irq.level value
+	 * because the guest would never see the interrupt.  Instead wait
+	 * until we call this function from kvm_timer_flush_hwstate.
+	 */
+	if (!vgic_initialized(vcpu->kvm))
+	    return;
+
+	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
+		timer->irq.level = 1;
+		kvm_timer_update_irq(vcpu);
+	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
+		timer->irq.level = 0;
+		kvm_timer_update_irq(vcpu);
+	}
+}
+
 /*
  * Schedule the background timer before calling kvm_vcpu_block, so that this
  * thread is removed from its waitqueue and made runnable when there's a timer
@@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 	 * If the timer expired while we were not scheduled, now is the time
 	 * to inject it.
 	 */
-	if (kvm_timer_should_fire(vcpu))
-		kvm_timer_inject_irq(vcpu);
+	kvm_timer_update_state(vcpu);
 }
 
 /**
@@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
 
 	BUG_ON(timer_is_armed(timer));
 
-	if (kvm_timer_should_fire(vcpu))
-		kvm_timer_inject_irq(vcpu);
+	/*
+	 * The guest could have modified the timer registers or the timer
+	 * could have expired, update the timer state.
+	 */
+	kvm_timer_update_state(vcpu);
 }
 
 int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
@@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	 * kvm_vcpu_set_target(). To handle this, we determine
 	 * vcpu timer irq number when the vcpu is reset.
 	 */
-	timer->irq = irq;
+	timer->irq.irq = irq->irq;
 
 	/*
 	 * Tell the VGIC that the virtual interrupt is tied to a
@@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 	default:
 		return -1;
 	}
+
+	kvm_timer_update_state(vcpu);
 	return 0;
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 9ed8d53..f4ea950 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
 /*
  * Save the physical active state, and reset it to inactive.
  *
- * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
+ * Return true if there's a pending level triggered interrupt line to queue.
  */
-static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
+static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
 {
 	struct irq_phys_map *map;
+	bool phys_active;
 	int ret;
 
 	if (!(vlr.state & LR_HW))
 		return 0;
 
 	map = vgic_irq_map_search(vcpu, vlr.irq);
-	BUG_ON(!map || !map->active);
+	BUG_ON(!map);
 
 	ret = irq_get_irqchip_state(map->irq,
 				    IRQCHIP_STATE_ACTIVE,
-				    &map->active);
+				    &phys_active);
 
 	WARN_ON(ret);
 
-	if (map->active) {
+	if (phys_active) {
+		/*
+		 * Interrupt still marked as active on the physical
+		 * distributor, so guest did not EOI it yet.  Reset to
+		 * non-active so that other VMs can see interrupts from this
+		 * device.
+		 */
 		ret = irq_set_irqchip_state(map->irq,
 					    IRQCHIP_STATE_ACTIVE,
 					    false);
 		WARN_ON(ret);
-		return 0;
+		return false;
 	}
 
-	return 1;
+	/* Mapped edge-triggered interrupts not yet supported. */
+	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
+	return process_level_irq(vcpu, lr, vlr);
 }
 
 /* Sync back the VGIC state after a guest run */
@@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 			continue;
 
 		vlr = vgic_get_lr(vcpu, lr);
-		if (vgic_sync_hwirq(vcpu, vlr)) {
-			/*
-			 * So this is a HW interrupt that the guest
-			 * EOI-ed. Clean the LR state and allow the
-			 * interrupt to be sampled again.
-			 */
-			vlr.state = 0;
-			vlr.hwirq = 0;
-			vgic_set_lr(vcpu, lr, vlr);
-			vgic_irq_clear_queued(vcpu, vlr.irq);
-			set_bit(lr, elrsr_ptr);
-		}
+		if (vgic_sync_hwirq(vcpu, lr, vlr))
+			level_pending = true;
 
 		if (!test_bit(lr, elrsr_ptr))
 			continue;
@@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
 }
 
 /**
- * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
- *
- * Return the logical active state of a mapped interrupt. This doesn't
- * necessarily reflects the current HW state.
- */
-bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
-{
-	BUG_ON(!map);
-	return map->active;
-}
-
-/**
- * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
- *
- * Set the logical active state of a mapped interrupt. This doesn't
- * immediately affects the HW state.
- */
-void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
-{
-	BUG_ON(!map);
-	map->active = active;
-}
-
-/**
  * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
  * @vcpu: The VCPU pointer
  * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
@@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
 			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_enabled,
 							vcpu->vcpu_id, i, 1);
-			if (i < VGIC_NR_PRIVATE_IRQS)
+			if (i < VGIC_NR_SGIS)
 				vgic_bitmap_set_irq_val(&dist->irq_cfg,
 							vcpu->vcpu_id, i,
 							VGIC_CFG_EDGE);
+			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
+				vgic_bitmap_set_irq_val(&dist->irq_cfg,
+							vcpu->vcpu_id, i,
+							VGIC_CFG_LEVEL);
 		}
 
 		vgic_enable(vcpu);
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread


* [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-08-30 13:54   ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-30 13:54 UTC (permalink / raw)
  To: kvmarm, linux-arm-kernel, kvm
  Cc: Christoffer Dall, Laszlo Ersek, Ard Biesheuvel, Drew Jones,
	Wei Huang, Peter Maydell

Provide a better quality of implementation and be architecture compliant
on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
reset of the timer, and call kvm_timer_update_state(vcpu) at the same
time, ensuring the timer output is not asserted after, for example, a
PSCI system reset.

This change alone fixes the UEFI reset issue reported by Laszlo back in
February.

Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Drew Jones <drjones@redhat.com>
Cc: Wei Huang <wei@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 747302f..8a0fdfc 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -255,6 +255,15 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 	timer->irq.irq = irq->irq;
 
 	/*
+	 * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
+	 * and to 0 for ARMv7.  We provide an implementation that always
+	 * resets the timer to be disabled and unmasked and is compliant with
+	 * the ARMv7 architecture.
+	 */
+	timer->cntv_ctl = 0;
+	kvm_timer_update_state(vcpu);
+
+	/*
 	 * Tell the VGIC that the virtual interrupt is tied to a
 	 * physical interrupt. We do that once per VCPU.
 	 */
-- 
2.1.2.330.g565301e.dirty


^ permalink raw reply related	[flat|nested] 74+ messages in thread


* Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-08-31  8:46     ` Ard Biesheuvel
  -1 siblings, 0 replies; 74+ messages in thread
From: Ard Biesheuvel @ 2015-08-31  8:46 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM devel mailing list, Laszlo Ersek,
	Drew Jones, Wei Huang, Peter Maydell

On 30 August 2015 at 15:54, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> Provide a better quality of implementation and be architecture compliant
> on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
> reset of the timer, and call kvm_timer_update_state(vcpu) at the same
> time, ensuring the timer output is not asserted after, for example, a
> PSCI system reset.
>
> This change alone fixes the UEFI reset issue reported by Laszlo back in
> February.
>

Do you have a link to that report? I can't quite remember the details ...

> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Drew Jones <drjones@redhat.com>
> Cc: Wei Huang <wei@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 747302f..8a0fdfc 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -255,6 +255,15 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>         timer->irq.irq = irq->irq;
>
>         /*
> +        * The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> +        * and to 0 for ARMv7.  We provide an implementation that always
> +        * resets the timer to be disabled and unmasked and is compliant with
> +        * the ARMv7 architecture.
> +        */
> +       timer->cntv_ctl = 0;
> +       kvm_timer_update_state(vcpu);
> +
> +       /*
>          * Tell the VGIC that the virtual interrupt is tied to a
>          * physical interrupt. We do that once per VCPU.
>          */
> --
> 2.1.2.330.g565301e.dirty
>

^ permalink raw reply	[flat|nested] 74+ messages in thread


* Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
  2015-08-31  8:46     ` Ard Biesheuvel
@ 2015-08-31  8:57       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-08-31  8:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: kvmarm, linux-arm-kernel, KVM devel mailing list, Laszlo Ersek,
	Drew Jones, Wei Huang, Peter Maydell

On Mon, Aug 31, 2015 at 10:46:59AM +0200, Ard Biesheuvel wrote:
> On 30 August 2015 at 15:54, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > Provide a better quality of implementation and be architecture compliant
> > on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
> > reset of the timer, and call kvm_timer_update_state(vcpu) at the same
> > time, ensuring the timer output is not asserted after, for example, a
> > PSCI system reset.
> >
> > This change alone fixes the UEFI reset issue reported by Laszlo back in
> > February.
> >
> 
> Do you have a link to that report? I can't quite remember the details ...
> 
There was no public mailing list cc'ed on the report, and the bugzilla
bug was created in a non-public Red Hat instance.

But you were cc'ed on the original mail from Laszlo.  Look for an e-mail
from February 5th with the subject "virtual timer interrupt stuck across
guest firmware reboot on KVM".

-Christoffer



* Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
  2015-08-31  8:57       ` Christoffer Dall
@ 2015-08-31  9:02         ` Ard Biesheuvel
  -1 siblings, 0 replies; 74+ messages in thread
From: Ard Biesheuvel @ 2015-08-31  9:02 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvmarm, linux-arm-kernel, KVM devel mailing list, Laszlo Ersek,
	Drew Jones, Wei Huang, Peter Maydell

On 31 August 2015 at 10:57, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Mon, Aug 31, 2015 at 10:46:59AM +0200, Ard Biesheuvel wrote:
>> On 30 August 2015 at 15:54, Christoffer Dall
>> <christoffer.dall@linaro.org> wrote:
>> > Provide a better quality of implementation and be architecture compliant
>> > on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
>> > reset of the timer, and call kvm_timer_update_state(vcpu) at the same
>> > time, ensuring the timer output is not asserted after, for example, a
>> > PSCI system reset.
>> >
>> > This change alone fixes the UEFI reset issue reported by Laszlo back in
>> > February.
>> >
>>
>> Do you have a link to that report? I can't quite remember the details ...
>>
> There was no public mailing list cc'ed on the report, and the bugzilla
> bug was created in a non-public Red Hat instance.
>
> But you were cc'ed on the original mail from Laszlo.  Look for an e-mail
> from February 5th with the subject "virtual timer interrupt stuck across
> guest firmware reboot on KVM".
>

Found it, thanks.



* Re: [PATCH 1/9] KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 14:21     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 14:21 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> Some times it is useful for architecture implementations of KVM to know
> when the VCPU thread is about to block or when it comes back from
> blocking (arm/arm64 needs to know this to properly implement timers, for
> example).
> 
> Therefore provide a generic architecture callback function in line with
> what we do elsewhere for KVM generic-arch interactions.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

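The callback wiring described in the commit message above can be sketched in a self-contained form. This is a simplified illustration, not the real generic KVM code: the stand-in `struct kvm_vcpu` and the body of `kvm_vcpu_block()` are assumptions for demonstration, while the hook and helper names come from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's vcpu structure (assumption). */
struct kvm_vcpu {
	bool timer_armed;
};

/* Arch helpers from patch 2/9, reduced to a flag for illustration. */
static void kvm_timer_schedule(struct kvm_vcpu *vcpu)
{
	vcpu->timer_armed = true;	/* arm the background soft timer */
}

static void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
{
	vcpu->timer_armed = false;	/* disarm it again */
}

/* The new generic-arch callbacks, as implemented for arm/arm64. */
static void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
	kvm_timer_schedule(vcpu);
}

static void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
{
	kvm_timer_unschedule(vcpu);
}

/*
 * Generic code brackets the actual blocking with the callbacks, so the
 * soft timer is only ever armed while the vcpu thread may be asleep.
 */
static void kvm_vcpu_block(struct kvm_vcpu *vcpu)
{
	kvm_arch_vcpu_blocking(vcpu);
	/* ... wait on the vcpu's waitqueue until kicked ... */
	kvm_arch_vcpu_unblocking(vcpu);
}
```

The invariant this buys is that after `kvm_vcpu_block()` returns, no background timer is left armed.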


* Re: [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 14:43     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 14:43 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> We currently schedule a soft timer every time we exit the guest if the
> timer did not expire while running the guest.  This is really not
> necessary, because the only work we do in the timer work function is to
> kick the vcpu.
> 
> Kicking the vcpu does two things:
> (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
> from the waitqueue.
> (2) If the vcpu is running on a different physical CPU from the one
> doing the kick, it sends a reschedule IPI.
> 
> The second case cannot happen, because the soft timer is only ever
> scheduled when the vcpu is not running.  The first case is only relevant
> when the vcpu thread is on a waitqueue, which is only the case when the
> vcpu thread has called kvm_vcpu_block().
> 
> Therefore, we only need to make sure a timer is scheduled for
> kvm_vcpu_block(), which we do by encapsulating all calls to
> kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> 
> Additionally, we only schedule a soft timer if the timer is enabled and
> unmasked, since it is useless otherwise.
> 
> Note that theoretically userspace can use the SET_ONE_REG interface to
> change registers that should cause the timer to fire, even if the vcpu
> is blocked without a scheduled timer, but this case was not supported
> before this patch and we leave it for future work for now.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_host.h   |  3 --
>  arch/arm/kvm/arm.c                | 10 +++++
>  arch/arm64/include/asm/kvm_host.h |  3 --
>  include/kvm/arm_arch_timer.h      |  2 +
>  virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
>  5 files changed, 70 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 86fcf6e..dcba0fa 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
>  
> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index ce404a5..bdf8871 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  	return kvm_timer_should_fire(vcpu);
>  }
>  
> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> +{
> +	kvm_timer_schedule(vcpu);
> +}
> +
> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> +{
> +	kvm_timer_unschedule(vcpu);
> +}
> +
>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  {
>  	/* Force users to call KVM_ARM_VCPU_INIT */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index dd143f5..415938d 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>  
> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> -
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index e1e4d7c..ef14cc1 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>  
>  #endif
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 76e38d2..018f3d6 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
>  	return HRTIMER_NORESTART;
>  }
>  
> +static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> +		!kvm_vgic_get_phys_irq_active(timer->map);
> +}

Nit: To me, this is not a predicate for "IRQ enabled", but "IRQ can
fire" instead, which seems to complement the kvm_timer_should_fire just
below.

> +
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	cycle_t cval, now;
>  
> -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> -	    kvm_vgic_get_phys_irq_active(timer->map))
> +	if (!kvm_timer_irq_enabled(vcpu))
>  		return false;
>  
>  	cval = timer->cntv_cval;
> @@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> -/**
> - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> - * @vcpu: The vcpu pointer
> - *
> - * Disarm any pending soft timers, since the world-switch code will write the
> - * virtual timer state back to the physical CPU.
> +/*
> + * Schedule the background timer before calling kvm_vcpu_block, so that this
> + * thread is removed from its waitqueue and made runnable when there's a timer
> + * interrupt to handle.
>   */
> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	u64 ns;
> +	cycle_t cval, now;
> +
> +	/*
> +	 * No need to schedule a background timer if the guest timer has
> +	 * already expired, because kvm_vcpu_block will return before putting
> +	 * the thread to sleep.
> +	 */
> +	if (kvm_timer_should_fire(vcpu))
> +		return;
>  
>  	/*
> -	 * We're about to run this vcpu again, so there is no need to
> -	 * keep the background timer running, as we're about to
> -	 * populate the CPU timer again.
> +	 * If the timer is either not capable of raising interrupts (disabled
> +	 * or masked) or if we already have a background timer, then there's
> +	 * no more work for us to do.
>  	 */
> +	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
> +		return;

Do we need to retest kvm_timer_irq_enabled here? It is already implied
by kvm_timer_should_fire...

> +
> +	/*  The timer has not yet expired, schedule a background timer */
> +	cval = timer->cntv_cval;
> +	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> +
> +	ns = cyclecounter_cyc2ns(timecounter->cc,
> +				 cval - now,
> +				 timecounter->mask,
> +				 &timecounter->frac);
> +	timer_arm(timer, ns);
> +}
> +
> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  	timer_disarm(timer);
> +}
>  
> +/**
> + * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> + * @vcpu: The vcpu pointer
> + *
> + * Check if the virtual timer has expired while we were running in the host,
> + * and inject an interrupt if that was the case.
> + */
> +void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> +{
>  	/*
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
> @@ -157,32 +199,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>   * kvm_timer_sync_hwstate - sync timer state from cpu
>   * @vcpu: The vcpu pointer
>   *
> - * Check if the virtual timer was armed and either schedule a corresponding
> - * soft timer or inject directly if already expired.
> + * Check if the virtual timer has expired while we werer running in the guest,

s/werer/were/

> + * and inject an interrupt if that was the case.
>   */
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -	cycle_t cval, now;
> -	u64 ns;
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu)) {
> -		/*
> -		 * Timer has already expired while we were not
> -		 * looking. Inject the interrupt and carry on.
> -		 */
> +	if (kvm_timer_should_fire(vcpu))
>  		kvm_timer_inject_irq(vcpu);
> -		return;
> -	}
> -
> -	cval = timer->cntv_cval;
> -	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> -
> -	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
> -				 &timecounter->frac);
> -	timer_arm(timer, ns);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> 

Apart from the few comments above, looks pretty good.

	M.
-- 
Jazz is not dead. It just smells funny...

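The predicate split discussed in the review above — `kvm_timer_irq_enabled()` feeding `kvm_timer_should_fire()` — can be sketched standalone. The `struct timer_state` below is a simplified assumption (the real code reads the vcpu's `arch.timer_cpu` and queries the vgic for the physical interrupt's active state); the control-register bits match the kernel's `ARCH_TIMER_CTRL_*` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARCH_TIMER_CTRL_ENABLE	(1U << 0)
#define ARCH_TIMER_CTRL_IT_MASK	(1U << 1)

/* Simplified per-vcpu timer state (assumption for illustration). */
struct timer_state {
	uint32_t cntv_ctl;	/* virtual timer control register */
	uint64_t cntv_cval;	/* virtual timer compare value */
	bool phys_irq_active;	/* active state on the physical distributor */
};

/* Can this timer raise an interrupt at all? Enabled, unmasked, not active. */
static bool kvm_timer_irq_enabled(const struct timer_state *t)
{
	return !(t->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
		(t->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
	       !t->phys_irq_active;
}

/* Should it fire right now, given the current (offset-adjusted) counter? */
static bool kvm_timer_should_fire(const struct timer_state *t, uint64_t now)
{
	if (!kvm_timer_irq_enabled(t))
		return false;
	return t->cntv_cval <= now;
}
```

As Marc notes, the first predicate is really "the IRQ can fire"; the second adds the "and the deadline has passed" condition on top of it.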


* Re: [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-03 14:43     ` Marc Zyngier
@ 2015-09-03 14:58       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-03 14:58 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 03:43:19PM +0100, Marc Zyngier wrote:
> On 30/08/15 14:54, Christoffer Dall wrote:
> > We currently schedule a soft timer every time we exit the guest if the
> > timer did not expire while running the guest.  This is really not
> > necessary, because the only work we do in the timer work function is to
> > kick the vcpu.
> > 
> > Kicking the vcpu does two things:
> > (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
> > from the waitqueue.
> > (2) If the vcpu is running on a different physical CPU from the one
> > doing the kick, it sends a reschedule IPI.
> > 
> > The second case cannot happen, because the soft timer is only ever
> > scheduled when the vcpu is not running.  The first case is only relevant
> > when the vcpu thread is on a waitqueue, which is only the case when the
> > vcpu thread has called kvm_vcpu_block().
> > 
> > Therefore, we only need to make sure a timer is scheduled for
> > kvm_vcpu_block(), which we do by encapsulating all calls to
> > kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> > 
> > Additionally, we only schedule a soft timer if the timer is enabled and
> > unmasked, since it is useless otherwise.
> > 
> > Note that theoretically userspace can use the SET_ONE_REG interface to
> > change registers that should cause the timer to fire, even if the vcpu
> > is blocked without a scheduled timer, but this case was not supported
> > before this patch and we leave it for future work for now.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  3 --
> >  arch/arm/kvm/arm.c                | 10 +++++
> >  arch/arm64/include/asm/kvm_host.h |  3 --
> >  include/kvm/arm_arch_timer.h      |  2 +
> >  virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
> >  5 files changed, 70 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 86fcf6e..dcba0fa 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
> >  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM_KVM_HOST_H__ */
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index ce404a5..bdf8871 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> >  	return kvm_timer_should_fire(vcpu);
> >  }
> >  
> > +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_schedule(vcpu);
> > +}
> > +
> > +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> > +{
> > +	kvm_timer_unschedule(vcpu);
> > +}
> > +
> >  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >  {
> >  	/* Force users to call KVM_ARM_VCPU_INIT */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index dd143f5..415938d 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >  
> > -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> > -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> > -
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index e1e4d7c..ef14cc1 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >  
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >  
> >  #endif
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 76e38d2..018f3d6 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
> >  	return HRTIMER_NORESTART;
> >  }
> >  
> > +static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> > +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> > +		!kvm_vgic_get_phys_irq_active(timer->map);
> > +}
> 
> Nit: To me, this is not a predicate for "IRQ enabled", but "IRQ can
> fire" instead, which seems to complement the kvm_timer_should_fire just
> below.
> 

so you're suggesting kvm_timer_irq_can_fire (or
kvm_timer_irq_could_fire) or something else?

> > +
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	cycle_t cval, now;
> >  
> > -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> > -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> > -	    kvm_vgic_get_phys_irq_active(timer->map))
> > +	if (!kvm_timer_irq_enabled(vcpu))
> >  		return false;
> >  
> >  	cval = timer->cntv_cval;
> > @@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  	return cval <= now;
> >  }
> >  
> > -/**
> > - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > - * @vcpu: The vcpu pointer
> > - *
> > - * Disarm any pending soft timers, since the world-switch code will write the
> > - * virtual timer state back to the physical CPU.
> > +/*
> > + * Schedule the background timer before calling kvm_vcpu_block, so that this
> > + * thread is removed from its waitqueue and made runnable when there's a timer
> > + * interrupt to handle.
> >   */
> > -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	u64 ns;
> > +	cycle_t cval, now;
> > +
> > +	/*
> > +	 * No need to schedule a background timer if the guest timer has
> > +	 * already expired, because kvm_vcpu_block will return before putting
> > +	 * the thread to sleep.
> > +	 */
> > +	if (kvm_timer_should_fire(vcpu))
> > +		return;
> >  
> >  	/*
> > -	 * We're about to run this vcpu again, so there is no need to
> > -	 * keep the background timer running, as we're about to
> > -	 * populate the CPU timer again.
> > +	 * If the timer is either not capable of raising interrupts (disabled
> > +	 * or masked) or if we already have a background timer, then there's
> > +	 * no more work for us to do.
> >  	 */
> > +	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
> > +		return;
> 
> Do we need to retest kvm_timer_irq_enabled here? It is already implied
> by kvm_timer_should_fire...
> 

Yes, we do.  When we reach this if statement there are two cases:
(1) kvm_timer_irq_enabled == true but cval > now
(2) kvm_timer_irq_enabled == false

We should only schedule a timer in case (1), which happens exactly
when kvm_timer_irq_enabled == true, hence the early return when it is
false.  Does that make sense?
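
The decision above can be condensed into a small stand-alone model.  The
struct and helper names below are hypothetical, purely for illustration;
this is a sketch of the predicate logic, not the kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the per-vcpu timer state (illustrative only). */
struct timer_model {
	bool irq_enabled;   /* result of kvm_timer_irq_enabled() */
	uint64_t cval;      /* compare value */
	uint64_t now;       /* current counter value */
	bool armed;         /* a soft timer is already scheduled */
};

/* Mirrors kvm_timer_should_fire(): can fire AND has already expired. */
static bool model_should_fire(const struct timer_model *t)
{
	return t->irq_enabled && t->cval <= t->now;
}

/*
 * Mirrors the decision in kvm_timer_schedule(): a background timer is
 * armed only in case (1) above, i.e. the interrupt can fire but has
 * not expired yet and no timer is armed already.
 */
static bool model_must_arm(const struct timer_model *t)
{
	if (model_should_fire(t))
		return false;	/* expired: kvm_vcpu_block returns early */
	if (!t->irq_enabled || t->armed)
		return false;	/* case (2), or nothing left to do */
	return true;		/* case (1): enabled, cval > now */
}
```

Running the four possible combinations through model_must_arm() shows
that only the enabled-but-not-yet-expired case arms a timer.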

> > +
> > +	/*  The timer has not yet expired, schedule a background timer */
> > +	cval = timer->cntv_cval;
> > +	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > +
> > +	ns = cyclecounter_cyc2ns(timecounter->cc,
> > +				 cval - now,
> > +				 timecounter->mask,
> > +				 &timecounter->frac);
> > +	timer_arm(timer, ns);
> > +}
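
For reference, the cyclecounter_cyc2ns() call above boils down to the
kernel's usual mult/shift conversion, ns ~= (cycles * mult) >> shift.
The sketch below is a simplified stand-alone version: the fractional
remainder the real helper accumulates is omitted, and the constants in
the test are illustrative (a 100 MHz counter, 10 ns per cycle), not
taken from any real clocksource:

```c
#include <stdint.h>

/*
 * Convert a cycle delta to nanoseconds using a precomputed mult/shift
 * pair, as clocksource code does.  For a 100 MHz counter one would pick
 * mult/shift so that mult >> shift == 10 (ns per cycle).
 */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * (uint64_t)mult) >> shift;
}
```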
> > +
> > +void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  	timer_disarm(timer);
> > +}
> >  
> > +/**
> > + * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> > + * @vcpu: The vcpu pointer
> > + *
> > + * Check if the virtual timer has expired while we were running in the host,
> > + * and inject an interrupt if that was the case.
> > + */
> > +void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> > +{
> >  	/*
> >  	 * If the timer expired while we were not scheduled, now is the time
> >  	 * to inject it.
> > @@ -157,32 +199,17 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >   * kvm_timer_sync_hwstate - sync timer state from cpu
> >   * @vcpu: The vcpu pointer
> >   *
> > - * Check if the virtual timer was armed and either schedule a corresponding
> > - * soft timer or inject directly if already expired.
> > + * Check if the virtual timer has expired while we werer running in the guest,
> 
> s/werer/were/
> 
> > + * and inject an interrupt if that was the case.
> >   */
> >  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -	cycle_t cval, now;
> > -	u64 ns;
> >  
> >  	BUG_ON(timer_is_armed(timer));
> >  
> > -	if (kvm_timer_should_fire(vcpu)) {
> > -		/*
> > -		 * Timer has already expired while we were not
> > -		 * looking. Inject the interrupt and carry on.
> > -		 */
> > +	if (kvm_timer_should_fire(vcpu))
> >  		kvm_timer_inject_irq(vcpu);
> > -		return;
> > -	}
> > -
> > -	cval = timer->cntv_cval;
> > -	now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> > -
> > -	ns = cyclecounter_cyc2ns(timecounter->cc, cval - now, timecounter->mask,
> > -				 &timecounter->frac);
> > -	timer_arm(timer, ns);
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> > 
> 
> Apart from the few comments above, looks pretty good.
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 3/9] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 15:01     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:01 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> Currently vgic_process_maintenance() processes dealing with a completed
> level-triggered interrupt directly, but we are soon going to reuse this
> logic for level-triggered mapped interrupts with the HW bit set, so
> move this logic into a separate static function.
> 
> Probably the most scary part of this commit is convincing yourself that
> the current flow is safe compared to the old one.  In the following I
> try to list the changes and why they are harmless:
> 
>   Move vgic_irq_clear_queued after kvm_notify_acked_irq:
>     Harmless because the effect of clearing the queued flag wrt.
>     kvm_set_irq is only that vgic_update_irq_pending does not set the
>     pending bit on the emulated CPU interface or in the pending_on_cpu
>     bitmask, but we set this in __kvm_vgic_sync_hwstate later on if the
>     level is still high.
> 
>   Move vgic_set_lr before kvm_notify_acked_irq:
>     Also, harmless because the LR are cpu-local operations and
>     kvm_notify_acked only affects the dist
> 
>   Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq:
>     Also harmless because it's just a bit which is cleared and altering
>     the line state does not affect this bit.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

This one has wrecked my brain, but I can't fault it so far.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 4/9] arm/arm64: Implement GICD_ICFGR as RO for PPIs
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 15:03     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:03 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only.
> We currently simulate this behavior by writing a hardcoded value to the
> register for the SGIs and PPIs on every write of these bits to the
> register (ignoring what the guest actually wrote), and by writing the
> same value as the reset value to the register.
> 
> This is a bit counter-intuitive, as the register is RO for these bits,
> and we can just implement it that way, allowing us to control the value
> of the bits purely in the reset code.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 5/9] arm/arm64: KVM: Use appropriate define in VGIC reset code
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 15:04     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:04 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> We currently initialize the SGIs to be enabled in the VGIC code, but we
> use the VGIC_NR_PPIS define for this purpose, instead of the more
> natural VGIC_NR_SGIS.  Change this slightly confusing use of the
> defines.
> 
> Note: This should have no functional change, as both names are defined
> to the number 16.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Ah, that's a confusing one, well spotted.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 15:23     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:23 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> Mapped interrupts on arm/arm64 is a tricky concept and the way we deal
> with them is not apparently easy to understand by reading various specs.
> 
> Therefore, add a proper documentation file explaining the flow and
> rationale of the behavior of the vgic.
> 
> Some of this text was contributed by Marc Zyngier.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
>  1 file changed, 59 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> 
> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> new file mode 100644
> index 0000000..49e1357
> --- /dev/null
> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> @@ -0,0 +1,59 @@
> +KVM/ARM VGIC Mapped Interrupts
> +==============================
> +
> +Setting the Physical Active State for Edge vs. Level Triggered IRQs
> +-------------------------------------------------------------------
> +
> +Mapped non-shared interrupts injected to a guest should always mark the
> +interrupt as active on the physical distributor.
> +
> +The reasoning for level-triggered interrupts:
> +For level-triggered interrupts, we have to mark the interrupt as active
> +on the physical distributor, because otherwise, as the line remains
> +asserted, the guest will never execute because the host will keep taking
> +interrupts.  As soon as the guest deactivates the interrupt, the
> +physical line is sampled by the hardware again and the host takes a new
> +interrupt if the physical line is still asserted.
> +
> +The reasoning for edge-triggered interrupts:
> +For edge-triggered interrupts, if we set the HW bit in the LR we also
> +have to mark the interrupt as active on the physical distributor.  If we
> +don't set the physical active bit and the interrupt hits again before
> +the guest has deactivated the interrupt, the interrupt goes to the host,
> +which cannot set the state to ACTIVE+PENDING in the LR, because that is
> +not supported when setting the HW bit in the LR.
> +
> +An alternative could be to not use HW bit at all, and inject
> +edge-triggered interrupts from a physical assigned device as pure
> +virtual interrupts, but that would potentially slow down handling of the
> +interrupt in the guest, because a physical interrupt occurring in the
> +middle of the guest ISR would preempt the guest for the host to handle
> +the interrupt.

It would be worth mentioning that this is valid for PPIs and SPIs. LPIs
do not have an Active state (they are either Pending or not), so we'll
have to deal with edge interrupts as you just described at some point.
Other architectures do something similar, I'd expect.

> +
> +
> +Life Cycle for Forwarded Physical Interrupts
> +--------------------------------------------
> +
> +By forwarded physical interrupts we mean interrupts presented to a guest
> +representing a real HW event originally signaled to the host as a

s/signaled/signalled/

> +physical interrupt and injecting this as a virtual interrupt with the HW
> +bit set in the LR.
> +
> +The state of such an interrupt is managed in the following way:
> +
> +  - LR.Pending must be set when the interrupt is first injected, because this
> +    is the only way the GICV interface is going to present it to the guest.
> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> +  - LR.Pending transitions to LR.Active on read of IAR, as expected.
> +  - On EOI, the *physical distributor* active bit gets cleared, but the
> +    LR.Active is left untouched - it looks like the GIC can only clear a
> +    single bit (either the virtual active, or the physical one).
> +  - This means we cannot trust LR.Active to find out about the state of the
> +    interrupt, and we definitely need to look at the distributor version.
> +
> +Consequently, when we context switch the state of a VCPU with forwarded
> +physical interrupts, we must context switch set pending *or* active bits in the
> +LR for that VCPU until the guest has deactivated the physical interrupt, and
> +then clear the corresponding bits in the LR.  If we ever set an LR to pending or
> +mapped when switching in a VCPU for a forwarded physical interrupt, we must also
> +set the active state on the *physical distributor*.
> 
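
The life cycle described in the document above can be condensed into a
toy state machine.  All names below are hypothetical and for
illustration only; this models the documented LR/distributor behaviour,
it is not the vgic code:

```c
#include <stdbool.h>

/* Toy model of one forwarded physical interrupt (illustrative only). */
struct fwd_irq_model {
	bool lr_pending;	/* LR.Pending */
	bool lr_active;		/* LR.Active */
	bool phys_active;	/* active bit on the physical distributor */
};

/*
 * Injection: LR.Pending must be set, and per the rule above the active
 * state must also be set on the physical distributor.
 */
static void model_inject(struct fwd_irq_model *irq)
{
	irq->lr_pending = true;
	irq->phys_active = true;
}

/* Guest reads IAR: LR.Pending transitions to LR.Active. */
static void model_ack(struct fwd_irq_model *irq)
{
	irq->lr_pending = false;
	irq->lr_active = true;
}

/*
 * Guest EOI/deactivate: the hardware clears only the physical
 * distributor's active bit; LR.Active is left untouched and therefore
 * goes stale.
 */
static void model_eoi(struct fwd_irq_model *irq)
{
	irq->phys_active = false;
}

/* The reliable "deactivated" test must look at the distributor state. */
static bool model_deactivated(const struct fwd_irq_model *irq)
{
	return !irq->phys_active;
}
```

Walking one interrupt through inject -> ack -> EOI shows why LR.Active
cannot be trusted: after EOI it is still set even though the interrupt
has been deactivated on the distributor.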

I wonder if it may be worth adding a small example with the timer,
because it is not immediately obvious why the interrupt would fire on
and on without putting the generating device in the picture...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 7/9] arm/arm64: KVM: vgic: Move active state handling to flush_hwstate
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 15:33     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:33 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> We currently set the physical active state only when we *inject* a new
> pending virtual interrupt, but this is actually not correct, because we
> could have been preempted and run something else on the system that
> resets the active state to clear.  This causes us to run the VM with the
> timer set to fire, but without setting the physical active state.
> 
> The solution is to always check the LR configurations, and we if have a
> mapped interrupt in th LR in either the pending or active state

s/th/the/

> (virtual), then set the physical active state.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 42 ++++++++++++++++++++++++++----------------
>  1 file changed, 26 insertions(+), 16 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 8299c24..9ed8d53 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1144,26 +1144,11 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
>  		struct irq_phys_map *map;
>  		map = vgic_irq_map_search(vcpu, irq);
>  
> -		/*
> -		 * If we have a mapping, and the virtual interrupt is
> -		 * being injected, then we must set the state to
> -		 * active in the physical world. Otherwise the
> -		 * physical interrupt will fire and the guest will
> -		 * exit before processing the virtual interrupt.
> -		 */
>  		if (map) {
> -			int ret;
> -
> -			BUG_ON(!map->active);
>  			vlr.hwirq = map->phys_irq;
>  			vlr.state |= LR_HW;
>  			vlr.state &= ~LR_EOI_INT;
>  
> -			ret = irq_set_irqchip_state(map->irq,
> -						    IRQCHIP_STATE_ACTIVE,
> -						    true);
> -			WARN_ON(ret);
> -
>  			/*
>  			 * Make sure we're not going to sample this
>  			 * again, as a HW-backed interrupt cannot be
> @@ -1255,7 +1240,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  	unsigned long *pa_percpu, *pa_shared;
> -	int i, vcpu_id;
> +	int i, vcpu_id, lr, ret;
>  	int overflow = 0;
>  	int nr_shared = vgic_nr_shared_irqs(dist);
>  
> @@ -1310,6 +1295,31 @@ epilog:
>  		 */
>  		clear_bit(vcpu_id, dist->irq_pending_on_cpu);
>  	}
> +
> +	for (lr = 0; lr < vgic->nr_lr; lr++) {
> +		struct vgic_lr vlr;
> +
> +		if (!test_bit(lr, vgic_cpu->lr_used))
> +			continue;
> +
> +		vlr = vgic_get_lr(vcpu, lr);
> +
> +		/*
> +		 * If we have a mapping, and the virtual interrupt is
> +		 * presented to the guest (as pending or active), then we must
> +		 * set the state to active in the physical world. See
> +		 * Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt.
> +		 */
> +		if (vlr.state & LR_HW) {
> +			struct irq_phys_map *map;
> +			map = vgic_irq_map_search(vcpu, vlr.irq);
> +
> +			ret = irq_set_irqchip_state(map->irq,
> +						    IRQCHIP_STATE_ACTIVE,
> +						    true);
> +			WARN_ON(ret);
> +		}
> +	}
>  }
>  
>  static int process_level_irq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> 

/me bangs his head on the wall...

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-03 14:58       ` Christoffer Dall
@ 2015-09-03 15:53         ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 15:53 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, kvm

On 03/09/15 15:58, Christoffer Dall wrote:
> On Thu, Sep 03, 2015 at 03:43:19PM +0100, Marc Zyngier wrote:
>> On 30/08/15 14:54, Christoffer Dall wrote:
>>> We currently schedule a soft timer every time we exit the guest if the
>>> timer did not expire while running the guest.  This is really not
>>> necessary, because the only work we do in the timer work function is to
>>> kick the vcpu.
>>>
>>> Kicking the vcpu does two things:
>>> (1) If the vcpu thread is on a waitqueue, make it runnable and remove it
>>> from the waitqueue.
>>> (2) If the vcpu is running on a different physical CPU from the one
>>> doing the kick, it sends a reschedule IPI.
>>>
>>> The second case cannot happen, because the soft timer is only ever
>>> scheduled when the vcpu is not running.  The first case is only relevant
>>> when the vcpu thread is on a waitqueue, which is only the case when the
>>> vcpu thread has called kvm_vcpu_block().
>>>
>>> Therefore, we only need to make sure a timer is scheduled for
>>> kvm_vcpu_block(), which we do by encapsulating all calls to
>>> kvm_vcpu_block() with kvm_timer_{un}schedule calls.
>>>
>>> Additionally, we only schedule a soft timer if the timer is enabled and
>>> unmasked, since it is useless otherwise.
>>>
>>> Note that theoretically userspace can use the SET_ONE_REG interface to
>>> change registers that should cause the timer to fire, even if the vcpu
>>> is blocked without a scheduled timer, but this case was not supported
>>> before this patch and we leave it for future work for now.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  arch/arm/include/asm/kvm_host.h   |  3 --
>>>  arch/arm/kvm/arm.c                | 10 +++++
>>>  arch/arm64/include/asm/kvm_host.h |  3 --
>>>  include/kvm/arm_arch_timer.h      |  2 +
>>>  virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
>>>  5 files changed, 70 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 86fcf6e..dcba0fa 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>>>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
>>>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
>>>  
>>> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>> -
>>>  #endif /* __ARM_KVM_HOST_H__ */
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index ce404a5..bdf8871 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>>>  	return kvm_timer_should_fire(vcpu);
>>>  }
>>>  
>>> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
>>> +{
>>> +	kvm_timer_schedule(vcpu);
>>> +}
>>> +
>>> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
>>> +{
>>> +	kvm_timer_unschedule(vcpu);
>>> +}
>>> +
>>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>>>  {
>>>  	/* Force users to call KVM_ARM_VCPU_INIT */
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index dd143f5..415938d 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>>>  
>>> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
>>> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>>> -
>>>  #endif /* __ARM64_KVM_HOST_H__ */
>>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
>>> index e1e4d7c..ef14cc1 100644
>>> --- a/include/kvm/arm_arch_timer.h
>>> +++ b/include/kvm/arm_arch_timer.h
>>> @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>>>  
>>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
>>> +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
>>> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>>>  
>>>  #endif
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index 76e38d2..018f3d6 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
>>>  	return HRTIMER_NORESTART;
>>>  }
>>>  
>>> +static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +
>>> +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
>>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
>>> +		!kvm_vgic_get_phys_irq_active(timer->map);
>>> +}
>>
>> Nit: To me, this is not a predicate for "IRQ enabled", but "IRQ can
>> fire" instead, which seems to complement the kvm_timer_should_fire just
>> below.
>>
> 
> so you're suggesting kvm_timer_irq_can_fire (or
> kvm_timer_irq_could_fire) or something else?

kvm_timer_can_fire() would have my preference (but I'm known to be bad
at picking names...).

>>> +
>>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>>  {
>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>>  	cycle_t cval, now;
>>>  
>>> -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
>>> -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
>>> -	    kvm_vgic_get_phys_irq_active(timer->map))
>>> +	if (!kvm_timer_irq_enabled(vcpu))
>>>  		return false;
>>>  
>>>  	cval = timer->cntv_cval;
>>> @@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>>  	return cval <= now;
>>>  }
>>>  
>>> -/**
>>> - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
>>> - * @vcpu: The vcpu pointer
>>> - *
>>> - * Disarm any pending soft timers, since the world-switch code will write the
>>> - * virtual timer state back to the physical CPU.
>>> +/*
>>> + * Schedule the background timer before calling kvm_vcpu_block, so that this
>>> + * thread is removed from its waitqueue and made runnable when there's a timer
>>> + * interrupt to handle.
>>>   */
>>> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>>> +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>>>  {
>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +	u64 ns;
>>> +	cycle_t cval, now;
>>> +
>>> +	/*
>>> +	 * No need to schedule a background timer if the guest timer has
>>> +	 * already expired, because kvm_vcpu_block will return before putting
>>> +	 * the thread to sleep.
>>> +	 */
>>> +	if (kvm_timer_should_fire(vcpu))
>>> +		return;
>>>  
>>>  	/*
>>> -	 * We're about to run this vcpu again, so there is no need to
>>> -	 * keep the background timer running, as we're about to
>>> -	 * populate the CPU timer again.
>>> +	 * If the timer is either not capable of raising interrupts (disabled
>>> +	 * or masked) or if we already have a background timer, then there's
>>> +	 * no more work for us to do.
>>>  	 */
>>> +	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
>>> +		return;
>>
>> Do we need to retest kvm_timer_irq_enabled here? It is already implied
>> by kvm_timer_should_fire...
>>
> 
> yes we do, when we reach this if statement there are two cases:
> (1) kvm_timer_irq_enabled == true but cval > now
> (2) kvm_timer_irq_enabled == false
> 
> We should only schedule a timer in case (1), which happens exactly
> when kvm_timer_irq_enabled == true, hence the return on the opposite.
> Does that make sense?

It does now.

What is not completely obvious at the moment is how we can end up with
timer_is_armed() being true here. If a timer is already armed, it means
we've blocked already... What am I missing?

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-09-03 15:23     ` Marc Zyngier
@ 2015-09-03 15:56       ` Eric Auger
  -1 siblings, 0 replies; 74+ messages in thread
From: Eric Auger @ 2015-09-03 15:56 UTC (permalink / raw)
  To: Marc Zyngier, Christoffer Dall, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,
On 09/03/2015 05:23 PM, Marc Zyngier wrote:
> On 30/08/15 14:54, Christoffer Dall wrote:
>> Mapped interrupts on arm/arm64 are a tricky concept and the way we deal
>> with them is apparently not easy to understand by reading the various specs.
>>
>> Therefore, add a proper documentation file explaining the flow and
>> rationale of the behavior of the vgic.
>>
>> Some of this text was contributed by Marc Zyngier.
>>
>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>> ---
>>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
>>  1 file changed, 59 insertions(+)
>>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>>
>> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> new file mode 100644
>> index 0000000..49e1357
>> --- /dev/null
>> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
>> @@ -0,0 +1,59 @@
>> +KVM/ARM VGIC Mapped Interrupts
>> +==============================
>> +
>> +Setting the Physical Active State for Edge vs. Level Triggered IRQs
>> +-------------------------------------------------------------------
>> +
>> +Mapped non-shared interrupts injected to a guest should always mark the
>> +interrupt as active on the physical distributor.
When injecting the virtual IRQ associated with the mapped=forwarded IRQ
(see next comment), the host must not deactivate the physical IRQ, so
that its active state remains?
>> +
>> +The reasoning for level-triggered interrupts:
>> +For level-triggered interrupts, we have to mark the interrupt as active
>> +on the physical distributor,
to leave the interrupt as active? I have the impression you talk about
shared IRQ here where the HW would not have any impact on the physical
distributor state? The physical IRQ can be pending+active too?
 because otherwise, as the line remains
>> +asserted, the guest will never execute because the host will keep taking
>> +interrupts.  As soon as the guest deactivates the interrupt, the
>> +physical line is sampled by the hardware again and the host takes a new
>> +interrupt if the physical line is still asserted.
>> +
>> +The reasoning for edge-triggered interrupts:
>> +For edge-triggered interrupts, if we set the HW bit in the LR we also
>> +have to mark the interrupt as active on the physical distributor.  If we
>> +don't set the physical active bit and the interrupt hits again before
>> +the guest has deactivated the interrupt, the interrupt goes to the host,
>> +which cannot set the state to ACTIVE+PENDING in the LR, because that is
>> +not supported when setting the HW bit in the LR.
>> +
>> +An alternative could be to not use HW bit at all, and inject
>> +edge-triggered interrupts from a physical assigned device as pure
>> +virtual interrupts, but that would potentially slow down handling of the
>> +interrupt in the guest, because a physical interrupt occurring in the
>> +middle of the guest ISR would preempt the guest for the host to handle
>> +the interrupt.
> 
> It would be worth mentioning that this is valid for PPIs and SPIs. LPIs
> do not have an Active state (they are either Pending or not), so we'll
> have to deal with edge interrupts as you just described at some point.
> Other architectures do something similar, I'd expect.
> 
>> +
>> +
>> +Life Cycle for Forwarded Physical Interrupts
>> +--------------------------------------------
>> +
>> +By forwarded physical interrupts we mean interrupts presented to a guest
>> +representing a real HW event originally signaled to the host as a
> 
> s/signaled/signalled/
> 
>> +physical interrupt
is it always true for the timer? sometimes isn't it a SW counter that
expires and upon that event you inject the virtual IRQ with HW bit set?
 and injecting this as a virtual interrupt with the HW
>> +bit set in the LR.
another definition of a forwarded/mapped physical IRQ is a physical IRQ
that is deactivated by the guest and not by the host.

Shouldn't we start this file with the definition of Forwarded Physical
Interrupts? Here you were supposed to describe their Life Cycle. Also
note that we previously talked about mapped IRQ and now we talk about
forwarded IRQ which can be confusing for the reader. Also we may
re-introduce the fact that we distinguish between shared and non shared
beasts to give the full picture?
>> +
>> +The state of such an interrupt is managed in the following way:
>> +
>> +  - LR.Pending must be set when the interrupt is first injected, because this
>> +    is the only way the GICV interface is going to present it to the guest.
>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>> +  - LR.Pending transitions to LR.Active on read of IAR, as expected.
>> +  - On EOI, the *physical distributor* active bit gets cleared, but the
>> +    LR.Active is left untouched - it looks like the GIC can only clear a
>> +    single bit (either the virtual active, or the physical one).
>> +  - This means we cannot trust LR.Active to find out about the state of the
>> +    interrupt, and we definitely need to look at the distributor version.
physical distributor version?

Best Regards

Eric
>> +
>> +Consequently, when we context switch the state of a VCPU with forwarded
>> +physical interrupts, we must context switch set pending *or* active bits in the
>> +LR for that VCPU until the guest has deactivated the physical interrupt, and
>> +then clear the corresponding bits in the LR.  If we ever set an LR to pending or
>> +mapped when switching in a VCPU for a forwarded physical interrupt, we must also
>> +set the active state on the *physical distributor*.
>>
> 
> I wonder if it may be worth adding a small example with the timer,
> because it is not immediately obvious why the interrupt would fire on
> and on without putting the generating device in the picture...
> 
> Thanks,
> 
> 	M.
> 


expires and upon that event you inject the virtual IRQ with HW bit set?
 and injecting this as a virtual interrupt with the HW
>> +bit set in the LR.
another definition of a forwarded/mapped physical IRQ is a physical IRQ
that is deactivated by the guest and not by the host.

Shouldn't we start this file with the definition of a Forwarded Physical
Interrupt? Here you were supposed to describe their Life Cycle. Also
note that we previously talked about mapped IRQ and now we talk about
forwarded IRQ which can be confusing for the reader. Also we may
re-introduce the fact that we distinguish between shared and non shared
beasts to give the full picture?
>> +
>> +The state of such an interrupt is managed in the following way:
>> +
>> +  - LR.Pending must be set when the interrupt is first injected, because this
>> +    is the only way the GICV interface is going to present it to the guest.
>> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
>> +  - LR.Pending transitions to LR.Active on read of IAR, as expected.
>> +  - On EOI, the *physical distributor* active bit gets cleared, but the
>> +    LR.Active is left untouched - it looks like the GIC can only clear a
>> +    single bit (either the virtual active, or the physical one).
>> +  - This means we cannot trust LR.Active to find out about the state of the
>> +    interrupt, and we definitely need to look at the distributor version.
physical distributor version?

Best Regards

Eric
>> +
>> +Consequently, when we context switch the state of a VCPU with forwarded
>> +physical interrupts, we must context switch set pending *or* active bits in the
>> +LR for that VCPU until the guest has deactivated the physical interrupt, and
>> +then clear the corresponding bits in the LR.  If we ever set an LR to pending or
>> +mapped when switching in a VCPU for a forwarded physical interrupt, we must also
>> +set the active state on the *physical distributor*.
>>
> 
> I wonder if it may be worth adding a small example with the timer,
> because it is not immediately obvious why the interrupt would fire on
> and on without putting the generating device in the picture...
> 
> Thanks,
> 
> 	M.
> 
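
[editor's note: the life cycle quoted above can be condensed into a small
state machine. The following is a toy userspace model with invented names
(LR_*, fwd_irq), not the actual vgic code; it only makes the sequence of
transitions explicit.]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the forwarded-IRQ life cycle described above: the LR
 * pending/active state plus the physical distributor's active bit.
 * All names here are invented for illustration. */

enum lr_state { LR_INACTIVE, LR_PENDING, LR_ACTIVE };

struct fwd_irq {
	enum lr_state lr;
	bool phys_active;
};

/* Injection: LR.Pending is set, and the active state is mirrored on
 * the physical distributor so the host does not re-take the IRQ. */
static void inject(struct fwd_irq *irq)
{
	irq->lr = LR_PENDING;
	irq->phys_active = true;
}

/* Guest reads IAR: LR.Pending transitions to LR.Active. */
static void guest_ack(struct fwd_irq *irq)
{
	irq->lr = LR_ACTIVE;
}

/* Guest EOI: the GIC clears only the physical active bit; LR.Active
 * is left untouched, which is why software must look at the physical
 * distributor instead of trusting LR.Active. */
static void guest_eoi(struct fwd_irq *irq)
{
	irq->phys_active = false;
}
```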

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  2015-09-03 15:53         ` Marc Zyngier
@ 2015-09-03 16:09           ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-03 16:09 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 04:53:22PM +0100, Marc Zyngier wrote:
> On 03/09/15 15:58, Christoffer Dall wrote:
> > On Thu, Sep 03, 2015 at 03:43:19PM +0100, Marc Zyngier wrote:
> >> On 30/08/15 14:54, Christoffer Dall wrote:
> >>> We currently schedule a soft timer every time we exit the guest if the
> >>> timer did not expire while running the guest.  This is really not
> >>> necessary, because the only work we do in the timer work function is to
> >>> kick the vcpu.
> >>>
> >>> Kicking the vcpu does two things:
> >>> (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
> >>> from the waitqueue.
> >>> (2) If the vcpu is running on a different physical CPU from the one
> >>> doing the kick, it sends a reschedule IPI.
> >>>
> >>> The second case cannot happen, because the soft timer is only ever
> >>> scheduled when the vcpu is not running.  The first case is only relevant
> >>> when the vcpu thread is on a waitqueue, which is only the case when the
> >>> vcpu thread has called kvm_vcpu_block().
> >>>
> >>> Therefore, we only need to make sure a timer is scheduled for
> >>> kvm_vcpu_block(), which we do by encapsulating all calls to
> >>> kvm_vcpu_block() with kvm_timer_{un}schedule calls.
> >>>
> >>> Additionally, we only schedule a soft timer if the timer is enabled and
> >>> unmasked, since it is useless otherwise.
> >>>
> >>> Note that theoretically userspace can use the SET_ONE_REG interface to
> >>> change registers that should cause the timer to fire, even if the vcpu
> >>> is blocked without a scheduled timer, but this case was not supported
> >>> before this patch and we leave it for future work for now.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  arch/arm/include/asm/kvm_host.h   |  3 --
> >>>  arch/arm/kvm/arm.c                | 10 +++++
> >>>  arch/arm64/include/asm/kvm_host.h |  3 --
> >>>  include/kvm/arm_arch_timer.h      |  2 +
> >>>  virt/kvm/arm/arch_timer.c         | 89 +++++++++++++++++++++++++--------------
> >>>  5 files changed, 70 insertions(+), 37 deletions(-)
> >>>
> >>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >>> index 86fcf6e..dcba0fa 100644
> >>> --- a/arch/arm/include/asm/kvm_host.h
> >>> +++ b/arch/arm/include/asm/kvm_host.h
> >>> @@ -236,7 +236,4 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
> >>>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
> >>>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
> >>>  
> >>> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> >>> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> >>> -
> >>>  #endif /* __ARM_KVM_HOST_H__ */
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index ce404a5..bdf8871 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -271,6 +271,16 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
> >>>  	return kvm_timer_should_fire(vcpu);
> >>>  }
> >>>  
> >>> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	kvm_timer_schedule(vcpu);
> >>> +}
> >>> +
> >>> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	kvm_timer_unschedule(vcpu);
> >>> +}
> >>> +
> >>>  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
> >>>  {
> >>>  	/* Force users to call KVM_ARM_VCPU_INIT */
> >>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >>> index dd143f5..415938d 100644
> >>> --- a/arch/arm64/include/asm/kvm_host.h
> >>> +++ b/arch/arm64/include/asm/kvm_host.h
> >>> @@ -257,7 +257,4 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >>>  
> >>> -static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> >>> -static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> >>> -
> >>>  #endif /* __ARM64_KVM_HOST_H__ */
> >>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> >>> index e1e4d7c..ef14cc1 100644
> >>> --- a/include/kvm/arm_arch_timer.h
> >>> +++ b/include/kvm/arm_arch_timer.h
> >>> @@ -71,5 +71,7 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >>>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >>>  
> >>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> >>> +void kvm_timer_schedule(struct kvm_vcpu *vcpu);
> >>> +void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> >>>  
> >>>  #endif
> >>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >>> index 76e38d2..018f3d6 100644
> >>> --- a/virt/kvm/arm/arch_timer.c
> >>> +++ b/virt/kvm/arm/arch_timer.c
> >>> @@ -111,14 +111,21 @@ static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
> >>>  	return HRTIMER_NORESTART;
> >>>  }
> >>>  
> >>> +static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +
> >>> +	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> >>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> >>> +		!kvm_vgic_get_phys_irq_active(timer->map);
> >>> +}
> >>
> >> Nit: To me, this is not a predicate for "IRQ enabled", but "IRQ can
> >> fire" instead, which seems to complement the kvm_timer_should_fire just
> >> below.
> >>
> > 
> > so you're suggesting kvm_timer_irq_can_fire (or
> > kvm_timer_irq_could_fire) or something else?
> 
> kvm_timer_can_fire() would have my preference (but I'm known to be bad
> at picking names...).
> 
> >>> +
> >>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>>  {
> >>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>>  	cycle_t cval, now;
> >>>  
> >>> -	if ((timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) ||
> >>> -	    !(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) ||
> >>> -	    kvm_vgic_get_phys_irq_active(timer->map))
> >>> +	if (!kvm_timer_irq_enabled(vcpu))
> >>>  		return false;
> >>>  
> >>>  	cval = timer->cntv_cval;
> >>> @@ -127,24 +134,59 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>>  	return cval <= now;
> >>>  }
> >>>  
> >>> -/**
> >>> - * kvm_timer_flush_hwstate - prepare to move the virt timer to the cpu
> >>> - * @vcpu: The vcpu pointer
> >>> - *
> >>> - * Disarm any pending soft timers, since the world-switch code will write the
> >>> - * virtual timer state back to the physical CPU.
> >>> +/*
> >>> + * Schedule the background timer before calling kvm_vcpu_block, so that this
> >>> + * thread is removed from its waitqueue and made runnable when there's a timer
> >>> + * interrupt to handle.
> >>>   */
> >>> -void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >>> +void kvm_timer_schedule(struct kvm_vcpu *vcpu)
> >>>  {
> >>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +	u64 ns;
> >>> +	cycle_t cval, now;
> >>> +
> >>> +	/*
> >>> +	 * No need to schedule a background timer if the guest timer has
> >>> +	 * already expired, because kvm_vcpu_block will return before putting
> >>> +	 * the thread to sleep.
> >>> +	 */
> >>> +	if (kvm_timer_should_fire(vcpu))
> >>> +		return;
> >>>  
> >>>  	/*
> >>> -	 * We're about to run this vcpu again, so there is no need to
> >>> -	 * keep the background timer running, as we're about to
> >>> -	 * populate the CPU timer again.
> >>> +	 * If the timer is either not capable of raising interrupts (disabled
> >>> +	 * or masked) or if we already have a background timer, then there's
> >>> +	 * no more work for us to do.
> >>>  	 */
> >>> +	if (!kvm_timer_irq_enabled(vcpu) || timer_is_armed(timer))
> >>> +		return;
> >>
> >> Do we need to retest kvm_timer_irq_enabled here? It is already implied
> >> by kvm_timer_should_fire...
> >>
> > 
> > yes we do, when we reach this if statement there are two cases:
> > (1) kvm_timer_irq_enabled == true but cval > now
> > (2) kvm_timer_irq_enabled == false
> > 
> > We should only schedule a timer in case (1), which happens exactly
> > when kvm_timer_irq_enabled == true, hence the return on the opposite.
> > Does that make sense?
> 
> It does now.
> 
> What is not completely obvious at the moment is how we can end up with
> timer_is_armed() being true here. If a timer is already armed, it means
> we've blocked already... What am I missing?
> 
Hmm, this is probably a leftover from my development cycles.  This could
be modified to a BUG_ON, except if we start calling this function when
userspace modified the registers, but I don't remember if that's even
possible when the thread is blocked (i.e. modified from another thread).

Thanks,
-Christoffer
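
[editor's note: the two cases discussed above can be made concrete with a
small userspace model. This is a sketch with invented names
(timer_can_fire, need_soft_timer), not the kernel functions themselves;
it only encodes the decision logic being debated.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the scheduling decision: a background soft timer is only
 * needed when the timer IRQ can fire (enabled and unmasked) but has not
 * yet expired (cval > now). */

struct timer_state {
	bool enabled;   /* models ARCH_TIMER_CTRL_ENABLE */
	bool masked;    /* models ARCH_TIMER_CTRL_IT_MASK */
	uint64_t cval;  /* compare value */
};

static bool timer_can_fire(const struct timer_state *t)
{
	return t->enabled && !t->masked;
}

static bool timer_should_fire(const struct timer_state *t, uint64_t now)
{
	return timer_can_fire(t) && t->cval <= now;
}

/* Case (1): can fire but cval > now -> arm a soft timer.
 * Case (2): cannot fire at all -> nothing to do. */
static bool need_soft_timer(const struct timer_state *t, uint64_t now)
{
	if (timer_should_fire(t, now))
		return false;	/* kvm_vcpu_block() would return at once */
	if (!timer_can_fire(t))
		return false;	/* case (2): disabled or masked */
	return true;		/* case (1): sleep until cval is reached */
}
```

Note that the second check is not redundant: a false `timer_should_fire`
leaves both cases open, and only the explicit `timer_can_fire` test
separates them.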

^ permalink raw reply	[flat|nested] 74+ messages in thread


* Re: [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 17:06     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:06 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 30/08/15 14:54, Christoffer Dall wrote:
> The arch timer currently uses edge-triggered semantics in the sense that
> the line is never sampled by the vgic and lowering the line from the
> timer to the vgic doesn't have any effect on the pending state of
> virtual interrupts in the vgic.  This means that we do not support a
> guest with the otherwise valid behavior of (1) disable interrupts (2)
> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> guest would validly not expect to see any interrupts on real hardware,
> but will see interrupts on KVM.
> 
> This patches fixes this shortcoming through the following series of
> changes.
> 
> First, we change the flow of the timer/vgic sync/flush operations.  Now
> the timer is always flushed/synced before the vgic, because the vgic
> samples the state of the timer output.  This has the implication that we
> move the timer operations into non-preemptible sections, but that is
> fine after the previous commit getting rid of hrtimer schedules on every
> entry/exit.
> 
> Second, we change the internal behavior of the timer, letting the timer
> keep track of its previous output state, and only lower/raise the line
> to the vgic when the state changes.  Note that in theory this could have
> been accomplished more simply by signalling the vgic every time the
> state *potentially* changed, but we don't want to be hitting the vgic
> more often than necessary.
> 
> Third, we get rid of the use of the map->active field in the vgic and
> instead simply set the interrupt as active on the physical distributor
> whenever we signal a mapped interrupt to the guest, and we reset the
> active state when we sync back the HW state from the vgic.
> 
> Fourth, and finally, we now initialize the timer PPIs (and all the other
> unused PPIs for now), to be level-triggered, and modify the sync code to
> sample the line state on HW sync and re-inject a new interrupt if it is
> still pending at that time.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/kvm/arm.c           | 11 +++++--
>  include/kvm/arm_arch_timer.h |  2 +-
>  include/kvm/arm_vgic.h       |  3 --
>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
>  5 files changed, 81 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index bdf8871..102a4aa 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			preempt_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_exit();
>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>  
> +		/*
> +		 * We must sync the timer state before the vgic state so that
> +		 * the vgic can properly sample the updated state of the
> +		 * interrupt line.
> +		 */
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		kvm_vgic_sync_hwstate(vcpu);
>  
>  		preempt_enable();
>  
> -		kvm_timer_sync_hwstate(vcpu);
> -
>  		ret = handle_exit(vcpu, run, ret);
>  	}
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index ef14cc1..1800227 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>  	bool				armed;
>  
>  	/* Timer IRQ */
> -	const struct kvm_irq_level	*irq;
> +	struct kvm_irq_level		irq;
>  
>  	/* VGIC mapping */
>  	struct irq_phys_map		*map;
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index d901f1a..99011a0 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -163,7 +163,6 @@ struct irq_phys_map {
>  	u32			virt_irq;
>  	u32			phys_irq;
>  	u32			irq;
> -	bool			active;
>  };
>  
>  struct irq_phys_map_entry {
> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  					   int virt_irq, int irq);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 018f3d6..747302f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>  	}
>  }
>  
> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> -{
> -	int ret;
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -
> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> -					 timer->map,
> -					 timer->irq->level);
> -	WARN_ON(ret);
> -}
> -
>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>  {
>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> -		!kvm_vgic_get_phys_irq_active(timer->map);
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>  }
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> +
> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> +					 timer->map,
> +					 timer->irq.level);
> +	WARN_ON(ret);
> +}
> +
> +/*
> + * Check if there was a change in the timer state (should we raise or lower
> + * the line level to the GIC).
> + */
> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	/*
> +	 * If userspace modified the timer registers via SET_ONE_REG before
> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> +	 * because the guest would never see the interrupt.  Instead wait
> +	 * until we call this function from kvm_timer_flush_hwstate.
> +	 */
> +	if (!vgic_initialized(vcpu->kvm))
> +	    return;
> +
> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> +		timer->irq.level = 1;
> +		kvm_timer_update_irq(vcpu);
> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> +		timer->irq.level = 0;
> +		kvm_timer_update_irq(vcpu);
> +	}
> +}
> +

It took me ages to parse this, so I rewrote it to match my understanding:

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 8a0fdfc..a722f0f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
 {
 	int ret;
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	BUG_ON(!vgic_initialized(vcpu->kvm));
 
+	timer->irq.level = new_state;
 	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
 					 timer->map,
 					 timer->irq.level);
@@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 	if (!vgic_initialized(vcpu->kvm))
 	    return;
 
-	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
-		timer->irq.level = 1;
-		kvm_timer_update_irq(vcpu);
-	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
-		timer->irq.level = 0;
-		kvm_timer_update_irq(vcpu);
-	}
+	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
+		kvm_timer_update_irq(vcpu, !timer->irq.level);
 }
 
 /*

Did I get it right?
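
[editor's note: the observation behind the rewrite is that the vgic only
needs signalling when the computed timer output differs from the cached
line level. A toy model of that edge detection, with mock names rather
than the kernel code:]

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of level-change detection: the (mock) vgic injection is
 * only called when the computed state differs from the cached level. */

static int injections;		/* counts calls into the mock vgic */
static bool cached_level;	/* models timer->irq.level */

static void update_irq(bool new_state)
{
	cached_level = new_state;
	injections++;	/* stands in for kvm_vgic_inject_mapped_irq() */
}

static void update_state(bool should_fire)
{
	/* the two if/else branches of the original collapse into one
	 * inequality test, since a mismatch always flips the level */
	if (should_fire != cached_level)
		update_irq(!cached_level);
}
```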

>  /*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
> @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
>  	 */
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  /**
> @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	/*
> +	 * The guest could have modified the timer registers or the timer
> +	 * could have expired, update the timer state.
> +	 */
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * kvm_vcpu_set_target(). To handle this, we determine
>  	 * vcpu timer irq number when the vcpu is reset.
>  	 */
> -	timer->irq = irq;
> +	timer->irq.irq = irq->irq;
>  
>  	/*
>  	 * Tell the VGIC that the virtual interrupt is tied to a
> @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	default:
>  		return -1;
>  	}
> +
> +	kvm_timer_update_state(vcpu);
>  	return 0;
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));

Hmmm. What are we missing?

> +	return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> -			/*
> -			 * So this is a HW interrupt that the guest
> -			 * EOI-ed. Clean the LR state and allow the
> -			 * interrupt to be sampled again.
> -			 */
> -			vlr.state = 0;
> -			vlr.hwirq = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> -			set_bit(lr, elrsr_ptr);
> -		}
> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> +			level_pending = true;
>  
>  		if (!test_bit(lr, elrsr_ptr))
>  			continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> -	BUG_ON(!map);
> -	return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> -	BUG_ON(!map);
> -	map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>  			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
> -			if (i < VGIC_NR_PRIVATE_IRQS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>  							vcpu->vcpu_id, i,
>  							VGIC_CFG_EDGE);
> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> +							vcpu->vcpu_id, i,
> +							VGIC_CFG_LEVEL);
>  		}
>  
>  		vgic_enable(vcpu);
> 

My only real objection to this patch is that it puts my brain upside down.
Hopefully that won't last.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
@ 2015-09-03 17:06     ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:06 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/08/15 14:54, Christoffer Dall wrote:
> The arch timer currently uses edge-triggered semantics in the sense that
> the line is never sampled by the vgic and lowering the line from the
> timer to the vgic doesn't have any effect on the pending state of
> virtual interrupts in the vgic.  This means that we do not support a
> guest with the otherwise valid behavior of (1) disable interrupts (2)
> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> guest would validly not expect to see any interrupts on real hardware,
> but will see interrupts on KVM.
> 
> This patch fixes this shortcoming through the following series of
> changes.
> 
> First, we change the flow of the timer/vgic sync/flush operations.  Now
> the timer is always flushed/synced before the vgic, because the vgic
> samples the state of the timer output.  This has the implication that we
> move the timer operations into non-preemptible sections, but that is
> fine after the previous commit getting rid of hrtimer schedules on every
> entry/exit.
> 
> Second, we change the internal behavior of the timer, letting the timer
> keep track of its previous output state, and only lower/raise the line
> to the vgic when the state changes.  Note that in theory this could have
> been accomplished more simply by signalling the vgic every time the
> state *potentially* changed, but we don't want to be hitting the vgic
> more often than necessary.
> 
> Third, we get rid of the use of the map->active field in the vgic and
> instead simply set the interrupt as active on the physical distributor
> whenever we signal a mapped interrupt to the guest, and we reset the
> active state when we sync back the HW state from the vgic.
> 
> Fourth, and finally, we now initialize the timer PPIs (and all the other
> unused PPIs for now), to be level-triggered, and modify the sync code to
> sample the line state on HW sync and re-inject a new interrupt if it is
> still pending at that time.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/kvm/arm.c           | 11 +++++--
>  include/kvm/arm_arch_timer.h |  2 +-
>  include/kvm/arm_vgic.h       |  3 --
>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
>  5 files changed, 81 insertions(+), 70 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index bdf8871..102a4aa 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>  			local_irq_enable();
> +			kvm_timer_sync_hwstate(vcpu);
>  			kvm_vgic_sync_hwstate(vcpu);
>  			preempt_enable();
> -			kvm_timer_sync_hwstate(vcpu);
>  			continue;
>  		}
>  
> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		kvm_guest_exit();
>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>  
> +		/*
> +		 * We must sync the timer state before the vgic state so that
> +		 * the vgic can properly sample the updated state of the
> +		 * interrupt line.
> +		 */
> +		kvm_timer_sync_hwstate(vcpu);
> +
>  		kvm_vgic_sync_hwstate(vcpu);
>  
>  		preempt_enable();
>  
> -		kvm_timer_sync_hwstate(vcpu);
> -
>  		ret = handle_exit(vcpu, run, ret);
>  	}
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index ef14cc1..1800227 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>  	bool				armed;
>  
>  	/* Timer IRQ */
> -	const struct kvm_irq_level	*irq;
> +	struct kvm_irq_level		irq;
>  
>  	/* VGIC mapping */
>  	struct irq_phys_map		*map;
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index d901f1a..99011a0 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -163,7 +163,6 @@ struct irq_phys_map {
>  	u32			virt_irq;
>  	u32			phys_irq;
>  	u32			irq;
> -	bool			active;
>  };
>  
>  struct irq_phys_map_entry {
> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>  					   int virt_irq, int irq);
>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>  
>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 018f3d6..747302f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>  	}
>  }
>  
> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> -{
> -	int ret;
> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> -
> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> -					 timer->map,
> -					 timer->irq->level);
> -	WARN_ON(ret);
> -}
> -
>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>  {
>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> -		!kvm_vgic_get_phys_irq_active(timer->map);
> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>  }
>  
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> +{
> +	int ret;
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> +
> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> +					 timer->map,
> +					 timer->irq.level);
> +	WARN_ON(ret);
> +}
> +
> +/*
> + * Check if there was a change in the timer state (should we raise or lower
> + * the line level to the GIC).
> + */
> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> +{
> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +
> +	/*
> +	 * If userspace modified the timer registers via SET_ONE_REG before
> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> +	 * because the guest would never see the interrupt.  Instead wait
> > +	 * until we call this function from kvm_timer_flush_hwstate.
> +	 */
> +	if (!vgic_initialized(vcpu->kvm))
> +	    return;
> +
> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> +		timer->irq.level = 1;
> +		kvm_timer_update_irq(vcpu);
> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> +		timer->irq.level = 0;
> +		kvm_timer_update_irq(vcpu);
> +	}
> +}
> +

It took me ages to parse this, so I rewrote it to match my understanding:

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 8a0fdfc..a722f0f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 	return cval <= now;
 }
 
-static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
+static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
 {
 	int ret;
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
 	BUG_ON(!vgic_initialized(vcpu->kvm));
 
+	timer->irq.level = new_state;
 	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
 					 timer->map,
 					 timer->irq.level);
@@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 	if (!vgic_initialized(vcpu->kvm))
 	    return;
 
-	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
-		timer->irq.level = 1;
-		kvm_timer_update_irq(vcpu);
-	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
-		timer->irq.level = 0;
-		kvm_timer_update_irq(vcpu);
-	}
+	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
+		kvm_timer_update_irq(vcpu, !timer->irq.level);
 }
 
 /*

Did I get it right?

>  /*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a timer
> @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	 * If the timer expired while we were not scheduled, now is the time
>  	 * to inject it.
>  	 */
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  /**
> @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  
>  	BUG_ON(timer_is_armed(timer));
>  
> -	if (kvm_timer_should_fire(vcpu))
> -		kvm_timer_inject_irq(vcpu);
> +	/*
> +	 * The guest could have modified the timer registers or the timer
> +	 * could have expired, update the timer state.
> +	 */
> +	kvm_timer_update_state(vcpu);
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  	 * kvm_vcpu_set_target(). To handle this, we determine
>  	 * vcpu timer irq number when the vcpu is reset.
>  	 */
> -	timer->irq = irq;
> +	timer->irq.irq = irq->irq;
>  
>  	/*
>  	 * Tell the VGIC that the virtual interrupt is tied to a
> @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  	default:
>  		return -1;
>  	}
> +
> +	kvm_timer_update_state(vcpu);
>  	return 0;
>  }
>  
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 9ed8d53..f4ea950 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>  /*
>   * Save the physical active state, and reset it to inactive.
>   *
> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> + * Return true if there's a pending level triggered interrupt line to queue.
>   */
> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>  {
>  	struct irq_phys_map *map;
> +	bool phys_active;
>  	int ret;
>  
>  	if (!(vlr.state & LR_HW))
>  		return 0;
>  
>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> -	BUG_ON(!map || !map->active);
> +	BUG_ON(!map);
>  
>  	ret = irq_get_irqchip_state(map->irq,
>  				    IRQCHIP_STATE_ACTIVE,
> -				    &map->active);
> +				    &phys_active);
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> +	if (phys_active) {
> +		/*
> +		 * Interrupt still marked as active on the physical
> +		 * distributor, so guest did not EOI it yet.  Reset to
> +		 * non-active so that other VMs can see interrupts from this
> +		 * device.
> +		 */
>  		ret = irq_set_irqchip_state(map->irq,
>  					    IRQCHIP_STATE_ACTIVE,
>  					    false);
>  		WARN_ON(ret);
> -		return 0;
> +		return false;
>  	}
>  
> -	return 1;
> +	/* Mapped edge-triggered interrupts not yet supported. */
> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));

Hmmm. What are we missing?

> +	return process_level_irq(vcpu, lr, vlr);
>  }
>  
>  /* Sync back the VGIC state after a guest run */
> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>  			continue;
>  
>  		vlr = vgic_get_lr(vcpu, lr);
> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> -			/*
> -			 * So this is a HW interrupt that the guest
> -			 * EOI-ed. Clean the LR state and allow the
> -			 * interrupt to be sampled again.
> -			 */
> -			vlr.state = 0;
> -			vlr.hwirq = 0;
> -			vgic_set_lr(vcpu, lr, vlr);
> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> -			set_bit(lr, elrsr_ptr);
> -		}
> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> +			level_pending = true;
>  
>  		if (!test_bit(lr, elrsr_ptr))
>  			continue;
> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>  }
>  
>  /**
> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> - *
> - * Return the logical active state of a mapped interrupt. This doesn't
> - * necessarily reflects the current HW state.
> - */
> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> -{
> -	BUG_ON(!map);
> -	return map->active;
> -}
> -
> -/**
> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> - *
> - * Set the logical active state of a mapped interrupt. This doesn't
> - * immediately affects the HW state.
> - */
> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> -{
> -	BUG_ON(!map);
> -	map->active = active;
> -}
> -
> -/**
>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>   * @vcpu: The VCPU pointer
>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>  			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
> -			if (i < VGIC_NR_PRIVATE_IRQS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>  							vcpu->vcpu_id, i,
>  							VGIC_CFG_EDGE);
> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> +							vcpu->vcpu_id, i,
> +							VGIC_CFG_LEVEL);
>  		}
>  
>  		vgic_enable(vcpu);
> 

My only real objection to this patch is that it puts my brain upside down.
Hopefully that won't last.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-03 17:07     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:07 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm
  Cc: Ard Biesheuvel, Laszlo Ersek

On 30/08/15 14:54, Christoffer Dall wrote:
> Provide a better quality of implementation and be architecture compliant
> on ARMv7 for the architected timer by resetting the CNTV_CTL to 0 on
> reset of the timer, and call kvm_timer_update_state(vcpu) at the same
> time, ensuring the timer output is not asserted after, for example, a
> PSCI system reset.
> 
> This change alone fixes the UEFI reset issue reported by Laszlo back in
> February.
> 
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Drew Jones <drjones@redhat.com>
> Cc: Wei Huang <wei@redhat.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 0/9] Rework architected timer and fix UEFI reset
  2015-08-30 13:54 ` Christoffer Dall
@ 2015-09-03 17:10   ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:10 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,

On 30/08/15 14:54, Christoffer Dall wrote:
> The architected timer integration with the vgic had some shortcomings in
> that certain guests (one being UEFI) weren't fully supported.
> 
> In fixing this I also found that we are scheduling the hrtimer for the
> virtual timer way too often, with a potential performance overhead.
> 
> This series tries to address these problems by providing level-triggered
> semantics for the arch timer and vgic integration, and seeks to clarify
> the behavior when setting/clearing the active state on the physical
> distributor.
> 
> Series based on kvmarm/next and also available at:
> https://git.linaro.org/people/christoffer.dall/linux-kvm-arm.git timer-rework

I'm quite pleased with the overall look of this series. It fixes a
number of issues that have been around for a while, plus some others
I've recently introduced...

Now, I fear this is probably too big a series to be shipped as a fix for
4.3. Can you come up with a minimal series that we could merge quickly
(I'm thinking patches 7 and 9), and leave the rest for the following
merge window?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-03 17:06     ` Marc Zyngier
@ 2015-09-03 17:23       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-03 17:23 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 06:06:39PM +0100, Marc Zyngier wrote:
> On 30/08/15 14:54, Christoffer Dall wrote:
> > The arch timer currently uses edge-triggered semantics in the sense that
> > the line is never sampled by the vgic and lowering the line from the
> > timer to the vgic doesn't have any effect on the pending state of
> > virtual interrupts in the vgic.  This means that we do not support a
> > guest with the otherwise valid behavior of (1) disable interrupts (2)
> > enable the timer (3) disable the timer (4) enable interrupts.  Such a
> > guest would validly not expect to see any interrupts on real hardware,
> > but will see interrupts on KVM.
> > 
> > This patch fixes this shortcoming through the following series of
> > changes.
> > 
> > First, we change the flow of the timer/vgic sync/flush operations.  Now
> > the timer is always flushed/synced before the vgic, because the vgic
> > samples the state of the timer output.  This has the implication that we
> > move the timer operations into non-preemptible sections, but that is
> > fine after the previous commit getting rid of hrtimer schedules on every
> > entry/exit.
> > 
> > Second, we change the internal behavior of the timer, letting the timer
> > keep track of its previous output state, and only lower/raise the line
> > to the vgic when the state changes.  Note that in theory this could have
> > been accomplished more simply by signalling the vgic every time the
> > state *potentially* changed, but we don't want to be hitting the vgic
> > more often than necessary.
> > 
> > Third, we get rid of the use of the map->active field in the vgic and
> > instead simply set the interrupt as active on the physical distributor
> > whenever we signal a mapped interrupt to the guest, and we reset the
> > active state when we sync back the HW state from the vgic.
> > 
> > Fourth, and finally, we now initialize the timer PPIs (and all the other
> > unused PPIs for now), to be level-triggered, and modify the sync code to
> > sample the line state on HW sync and re-inject a new interrupt if it is
> > still pending at that time.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  arch/arm/kvm/arm.c           | 11 +++++--
> >  include/kvm/arm_arch_timer.h |  2 +-
> >  include/kvm/arm_vgic.h       |  3 --
> >  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
> >  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
> >  5 files changed, 81 insertions(+), 70 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index bdf8871..102a4aa 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >  			local_irq_enable();
> > +			kvm_timer_sync_hwstate(vcpu);
> >  			kvm_vgic_sync_hwstate(vcpu);
> >  			preempt_enable();
> > -			kvm_timer_sync_hwstate(vcpu);
> >  			continue;
> >  		}
> >  
> > @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		kvm_guest_exit();
> >  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >  
> > +		/*
> > +		 * We must sync the timer state before the vgic state so that
> > +		 * the vgic can properly sample the updated state of the
> > +		 * interrupt line.
> > +		 */
> > +		kvm_timer_sync_hwstate(vcpu);
> > +
> >  		kvm_vgic_sync_hwstate(vcpu);
> >  
> >  		preempt_enable();
> >  
> > -		kvm_timer_sync_hwstate(vcpu);
> > -
> >  		ret = handle_exit(vcpu, run, ret);
> >  	}
> >  
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index ef14cc1..1800227 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -51,7 +51,7 @@ struct arch_timer_cpu {
> >  	bool				armed;
> >  
> >  	/* Timer IRQ */
> > -	const struct kvm_irq_level	*irq;
> > +	struct kvm_irq_level		irq;
> >  
> >  	/* VGIC mapping */
> >  	struct irq_phys_map		*map;
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index d901f1a..99011a0 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -163,7 +163,6 @@ struct irq_phys_map {
> >  	u32			virt_irq;
> >  	u32			phys_irq;
> >  	u32			irq;
> > -	bool			active;
> >  };
> >  
> >  struct irq_phys_map_entry {
> > @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> >  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> >  					   int virt_irq, int irq);
> >  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> > -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> > -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> >  
> >  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
> >  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 018f3d6..747302f 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
> >  	}
> >  }
> >  
> > -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> > -{
> > -	int ret;
> > -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > -
> > -	kvm_vgic_set_phys_irq_active(timer->map, true);
> > -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> > -					 timer->map,
> > -					 timer->irq->level);
> > -	WARN_ON(ret);
> > -}
> > -
> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >  {
> >  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >  
> >  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> > -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> > -		!kvm_vgic_get_phys_irq_active(timer->map);
> > +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
> >  }
> >  
> >  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> > @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  	return cval <= now;
> >  }
> >  
> > +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> > +{
> > +	int ret;
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	BUG_ON(!vgic_initialized(vcpu->kvm));
> > +
> > +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> > +					 timer->map,
> > +					 timer->irq.level);
> > +	WARN_ON(ret);
> > +}
> > +
> > +/*
> > + * Check if there was a change in the timer state (should we raise or lower
> > + * the line level to the GIC).
> > + */
> > +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> > +{
> > +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +
> > +	/*
> > +	 * If userspace modified the timer registers via SET_ONE_REG before
> > +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> > +	 * because the guest would never see the interrupt.  Instead wait
> > +	 * until we call this function from kvm_timer_flush_hwstate.
> > +	 */
> > +	if (!vgic_initialized(vcpu->kvm))
> > +	    return;
> > +
> > +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> > +		timer->irq.level = 1;
> > +		kvm_timer_update_irq(vcpu);
> > +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> > +		timer->irq.level = 0;
> > +		kvm_timer_update_irq(vcpu);
> > +	}
> > +}
> > +
> 
> It took me ages to parse this, so I rewrote it to match my understanding:
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 8a0fdfc..a722f0f 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  	return cval <= now;
>  }
>  
> -static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
>  {
>  	int ret;
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>  
>  	BUG_ON(!vgic_initialized(vcpu->kvm));
>  
> +	timer->irq.level = new_state;
>  	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>  					 timer->map,
>  					 timer->irq.level);
> @@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  	if (!vgic_initialized(vcpu->kvm))
>  	    return;
>  
> -	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> -		timer->irq.level = 1;
> -		kvm_timer_update_irq(vcpu);
> -	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> -		timer->irq.level = 0;
> -		kvm_timer_update_irq(vcpu);
> -	}
> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
>  }
>  
>  /*
> 
> Did I get it right?

almost, you'd have to assign timer->irq.level after you check for it
though, right?

> 
> >  /*
> >   * Schedule the background timer before calling kvm_vcpu_block, so that this
> >   * thread is removed from its waitqueue and made runnable when there's a timer
> > @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	 * If the timer expired while we were not scheduled, now is the time
> >  	 * to inject it.
> >  	 */
> > -	if (kvm_timer_should_fire(vcpu))
> > -		kvm_timer_inject_irq(vcpu);
> > +	kvm_timer_update_state(vcpu);
> >  }
> >  
> >  /**
> > @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >  
> >  	BUG_ON(timer_is_armed(timer));
> >  
> > -	if (kvm_timer_should_fire(vcpu))
> > -		kvm_timer_inject_irq(vcpu);
> > +	/*
> > +	 * The guest could have modified the timer registers or the timer
> > +	 * could have expired, update the timer state.
> > +	 */
> > +	kvm_timer_update_state(vcpu);
> >  }
> >  
> >  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> > @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> >  	 * kvm_vcpu_set_target(). To handle this, we determine
> >  	 * vcpu timer irq number when the vcpu is reset.
> >  	 */
> > -	timer->irq = irq;
> > +	timer->irq.irq = irq->irq;
> >  
> >  	/*
> >  	 * Tell the VGIC that the virtual interrupt is tied to a
> > @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  	default:
> >  		return -1;
> >  	}
> > +
> > +	kvm_timer_update_state(vcpu);
> >  	return 0;
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 9ed8d53..f4ea950 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >  /*
> >   * Save the physical active state, and reset it to inactive.
> >   *
> > - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> > + * Return true if there's a pending level triggered interrupt line to queue.
> >   */
> > -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> > +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> >  {
> >  	struct irq_phys_map *map;
> > +	bool phys_active;
> >  	int ret;
> >  
> >  	if (!(vlr.state & LR_HW))
> >  		return 0;
> >  
> >  	map = vgic_irq_map_search(vcpu, vlr.irq);
> > -	BUG_ON(!map || !map->active);
> > +	BUG_ON(!map);
> >  
> >  	ret = irq_get_irqchip_state(map->irq,
> >  				    IRQCHIP_STATE_ACTIVE,
> > -				    &map->active);
> > +				    &phys_active);
> >  
> >  	WARN_ON(ret);
> >  
> > -	if (map->active) {
> > +	if (phys_active) {
> > +		/*
> > +		 * Interrupt still marked as active on the physical
> > +		 * distributor, so guest did not EOI it yet.  Reset to
> > +		 * non-active so that other VMs can see interrupts from this
> > +		 * device.
> > +		 */
> >  		ret = irq_set_irqchip_state(map->irq,
> >  					    IRQCHIP_STATE_ACTIVE,
> >  					    false);
> >  		WARN_ON(ret);
> > -		return 0;
> > +		return false;
> >  	}
> >  
> > -	return 1;
> > +	/* Mapped edge-triggered interrupts not yet supported. */
> > +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> 
> Hmmm. What are we missing?
> 

I don't know really, my brain ran out of memory, but it's not like we
claimed to support this earlier, and clearly we didn't think this
through well enough.

> > +	return process_level_irq(vcpu, lr, vlr);
> >  }
> >  
> >  /* Sync back the VGIC state after a guest run */
> > @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
> >  			continue;
> >  
> >  		vlr = vgic_get_lr(vcpu, lr);
> > -		if (vgic_sync_hwirq(vcpu, vlr)) {
> > -			/*
> > -			 * So this is a HW interrupt that the guest
> > -			 * EOI-ed. Clean the LR state and allow the
> > -			 * interrupt to be sampled again.
> > -			 */
> > -			vlr.state = 0;
> > -			vlr.hwirq = 0;
> > -			vgic_set_lr(vcpu, lr, vlr);
> > -			vgic_irq_clear_queued(vcpu, vlr.irq);
> > -			set_bit(lr, elrsr_ptr);
> > -		}
> > +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> > +			level_pending = true;
> >  
> >  		if (!test_bit(lr, elrsr_ptr))
> >  			continue;
> > @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
> >  }
> >  
> >  /**
> > - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> > - *
> > - * Return the logical active state of a mapped interrupt. This doesn't
> > - * necessarily reflects the current HW state.
> > - */
> > -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> > -{
> > -	BUG_ON(!map);
> > -	return map->active;
> > -}
> > -
> > -/**
> > - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> > - *
> > - * Set the logical active state of a mapped interrupt. This doesn't
> > - * immediately affects the HW state.
> > - */
> > -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> > -{
> > -	BUG_ON(!map);
> > -	map->active = active;
> > -}
> > -
> > -/**
> >   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
> >   * @vcpu: The VCPU pointer
> >   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> > @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
> >  			if (i < VGIC_NR_SGIS)
> >  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
> >  							vcpu->vcpu_id, i, 1);
> > -			if (i < VGIC_NR_PRIVATE_IRQS)
> > +			if (i < VGIC_NR_SGIS)
> >  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> >  							vcpu->vcpu_id, i,
> >  							VGIC_CFG_EDGE);
> > +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> > +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> > +							vcpu->vcpu_id, i,
> > +							VGIC_CFG_LEVEL);
> >  		}
> >  
> >  		vgic_enable(vcpu);
> > 
> 
> My only real objection to this patch is that it puts my brain upside down.
> Hopefully that won't last.
> 
Yeah, I tried helping in the commit message, but I couldn't do much
beyond that. Splitting up the patch further didn't really work out for
me.

Thanks for the review,
-Christoffer

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-03 17:23       ` Christoffer Dall
@ 2015-09-03 17:29         ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:29 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, kvm

On 03/09/15 18:23, Christoffer Dall wrote:
> On Thu, Sep 03, 2015 at 06:06:39PM +0100, Marc Zyngier wrote:
>> On 30/08/15 14:54, Christoffer Dall wrote:
>>> The arch timer currently uses edge-triggered semantics in the sense that
>>> the line is never sampled by the vgic and lowering the line from the
>>> timer to the vgic doesn't have any effect on the pending state of
>>> virtual interrupts in the vgic.  This means that we do not support a
>>> guest with the otherwise valid behavior of (1) disable interrupts (2)
>>> enable the timer (3) disable the timer (4) enable interrupts.  Such a
>>> guest would validly not expect to see any interrupts on real hardware,
>>> but will see interrupts on KVM.
>>>
>>> This patch fixes this shortcoming through the following series of
>>> changes.
>>>
>>> First, we change the flow of the timer/vgic sync/flush operations.  Now
>>> the timer is always flushed/synced before the vgic, because the vgic
>>> samples the state of the timer output.  This has the implication that we
>>> move the timer operations into non-preemptible sections, but that is
>>> fine after the previous commit getting rid of hrtimer schedules on every
>>> entry/exit.
>>>
>>> Second, we change the internal behavior of the timer, letting the timer
>>> keep track of its previous output state, and only lower/raise the line
>>> to the vgic when the state changes.  Note that in theory this could have
>>> been accomplished more simply by signalling the vgic every time the
>>> state *potentially* changed, but we don't want to be hitting the vgic
>>> more often than necessary.
>>>
>>> Third, we get rid of the use of the map->active field in the vgic and
>>> instead simply set the interrupt as active on the physical distributor
>>> whenever we signal a mapped interrupt to the guest, and we reset the
>>> active state when we sync back the HW state from the vgic.
>>>
>>> Fourth, and finally, we now initialize the timer PPIs (and all the other
>>> unused PPIs for now), to be level-triggered, and modify the sync code to
>>> sample the line state on HW sync and re-inject a new interrupt if it is
>>> still pending at that time.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  arch/arm/kvm/arm.c           | 11 +++++--
>>>  include/kvm/arm_arch_timer.h |  2 +-
>>>  include/kvm/arm_vgic.h       |  3 --
>>>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
>>>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
>>>  5 files changed, 81 insertions(+), 70 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index bdf8871..102a4aa 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			preempt_enable();
>>> -			kvm_timer_sync_hwstate(vcpu);
>>>  			continue;
>>>  		}
>>>  
>>> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_guest_exit();
>>>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>  
>>> +		/*
>>> +		 * We must sync the timer state before the vgic state so that
>>> +		 * the vgic can properly sample the updated state of the
>>> +		 * interrupt line.
>>> +		 */
>>> +		kvm_timer_sync_hwstate(vcpu);
>>> +
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>  
>>>  		preempt_enable();
>>>  
>>> -		kvm_timer_sync_hwstate(vcpu);
>>> -
>>>  		ret = handle_exit(vcpu, run, ret);
>>>  	}
>>>  
>>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
>>> index ef14cc1..1800227 100644
>>> --- a/include/kvm/arm_arch_timer.h
>>> +++ b/include/kvm/arm_arch_timer.h
>>> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>>>  	bool				armed;
>>>  
>>>  	/* Timer IRQ */
>>> -	const struct kvm_irq_level	*irq;
>>> +	struct kvm_irq_level		irq;
>>>  
>>>  	/* VGIC mapping */
>>>  	struct irq_phys_map		*map;
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index d901f1a..99011a0 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -163,7 +163,6 @@ struct irq_phys_map {
>>>  	u32			virt_irq;
>>>  	u32			phys_irq;
>>>  	u32			irq;
>>> -	bool			active;
>>>  };
>>>  
>>>  struct irq_phys_map_entry {
>>> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>  					   int virt_irq, int irq);
>>>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
>>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>  
>>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index 018f3d6..747302f 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>>>  	}
>>>  }
>>>  
>>> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
>>> -{
>>> -	int ret;
>>> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> -
>>> -	kvm_vgic_set_phys_irq_active(timer->map, true);
>>> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>> -					 timer->map,
>>> -					 timer->irq->level);
>>> -	WARN_ON(ret);
>>> -}
>>> -
>>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>  {
>>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
>>> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>>  
>>>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
>>> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
>>> -		!kvm_vgic_get_phys_irq_active(timer->map);
>>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>>>  }
>>>  
>>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>>  	return cval <= now;
>>>  }
>>>  
>>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
>>> +{
>>> +	int ret;
>>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +
>>> +	BUG_ON(!vgic_initialized(vcpu->kvm));
>>> +
>>> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>> +					 timer->map,
>>> +					 timer->irq.level);
>>> +	WARN_ON(ret);
>>> +}
>>> +
>>> +/*
>>> + * Check if there was a change in the timer state (should we raise or lower
>>> + * the line level to the GIC).
>>> + */
>>> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +
>>> +	/*
>>> +	 * If userspace modified the timer registers via SET_ONE_REG before
>>> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
>>> +	 * because the guest would never see the interrupt.  Instead wait
>>> +	 * until we call this function from kvm_timer_flush_hwstate.
>>> +	 */
>>> +	if (!vgic_initialized(vcpu->kvm))
>>> +	    return;
>>> +
>>> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
>>> +		timer->irq.level = 1;
>>> +		kvm_timer_update_irq(vcpu);
>>> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
>>> +		timer->irq.level = 0;
>>> +		kvm_timer_update_irq(vcpu);
>>> +	}
>>> +}
>>> +
>>
>> It took me ages to parse this, so I rewrote it to match my understanding:
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index 8a0fdfc..a722f0f 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>  	return cval <= now;
>>  }
>>  
>> -static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
>>  {
>>  	int ret;
>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>  
>>  	BUG_ON(!vgic_initialized(vcpu->kvm));
>>  
>> +	timer->irq.level = new_state;
>>  	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>  					 timer->map,
>>  					 timer->irq.level);
>> @@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>>  	if (!vgic_initialized(vcpu->kvm))
>>  	    return;
>>  
>> -	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
>> -		timer->irq.level = 1;
>> -		kvm_timer_update_irq(vcpu);
>> -	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
>> -		timer->irq.level = 0;
>> -		kvm_timer_update_irq(vcpu);
>> -	}
>> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
>> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
>>  }
>>  
>>  /*
>>
>> Did I get it right?
> 
> almost, you'd have to assign timer->irq.level after you check for it
> though, right?

That's why I've added this line in kvm_timer_update_irq()! :-)

>>
>>>  /*
>>>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>>>   * thread is removed from its waitqueue and made runnable when there's a timer
>>> @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>>>  	 * If the timer expired while we were not scheduled, now is the time
>>>  	 * to inject it.
>>>  	 */
>>> -	if (kvm_timer_should_fire(vcpu))
>>> -		kvm_timer_inject_irq(vcpu);
>>> +	kvm_timer_update_state(vcpu);
>>>  }
>>>  
>>>  /**
>>> @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  
>>>  	BUG_ON(timer_is_armed(timer));
>>>  
>>> -	if (kvm_timer_should_fire(vcpu))
>>> -		kvm_timer_inject_irq(vcpu);
>>> +	/*
>>> +	 * The guest could have modified the timer registers or the timer
>>> +	 * could have expired, update the timer state.
>>> +	 */
>>> +	kvm_timer_update_state(vcpu);
>>>  }
>>>  
>>>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>> @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>  	 * kvm_vcpu_set_target(). To handle this, we determine
>>>  	 * vcpu timer irq number when the vcpu is reset.
>>>  	 */
>>> -	timer->irq = irq;
>>> +	timer->irq.irq = irq->irq;
>>>  
>>>  	/*
>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>> @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>>>  	default:
>>>  		return -1;
>>>  	}
>>> +
>>> +	kvm_timer_update_state(vcpu);
>>>  	return 0;
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 9ed8d53..f4ea950 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>  /*
>>>   * Save the physical active state, and reset it to inactive.
>>>   *
>>> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
>>> + * Return true if there's a pending level triggered interrupt line to queue.
>>>   */
>>> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>>>  {
>>>  	struct irq_phys_map *map;
>>> +	bool phys_active;
>>>  	int ret;
>>>  
>>>  	if (!(vlr.state & LR_HW))
>>>  		return 0;
>>>  
>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>> -	BUG_ON(!map || !map->active);
>>> +	BUG_ON(!map);
>>>  
>>>  	ret = irq_get_irqchip_state(map->irq,
>>>  				    IRQCHIP_STATE_ACTIVE,
>>> -				    &map->active);
>>> +				    &phys_active);
>>>  
>>>  	WARN_ON(ret);
>>>  
>>> -	if (map->active) {
>>> +	if (phys_active) {
>>> +		/*
>>> +		 * Interrupt still marked as active on the physical
>>> +		 * distributor, so guest did not EOI it yet.  Reset to
>>> +		 * non-active so that other VMs can see interrupts from this
>>> +		 * device.
>>> +		 */
>>>  		ret = irq_set_irqchip_state(map->irq,
>>>  					    IRQCHIP_STATE_ACTIVE,
>>>  					    false);
>>>  		WARN_ON(ret);
>>> -		return 0;
>>> +		return false;
>>>  	}
>>>  
>>> -	return 1;
>>> +	/* Mapped edge-triggered interrupts not yet supported. */
>>> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>>
>> Hmmm. What are we missing?
>>
> 
> I don't know really, my brain ran out of memory, but it's not like we
> claimed to support this earlier, and clearly we didn't think this
> through well enough.

We can definitely revisit this later, but I have the feeling that the
flow is quite similar...

> 
>>> +	return process_level_irq(vcpu, lr, vlr);
>>>  }
>>>  
>>>  /* Sync back the VGIC state after a guest run */
>>> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  			continue;
>>>  
>>>  		vlr = vgic_get_lr(vcpu, lr);
>>> -		if (vgic_sync_hwirq(vcpu, vlr)) {
>>> -			/*
>>> -			 * So this is a HW interrupt that the guest
>>> -			 * EOI-ed. Clean the LR state and allow the
>>> -			 * interrupt to be sampled again.
>>> -			 */
>>> -			vlr.state = 0;
>>> -			vlr.hwirq = 0;
>>> -			vgic_set_lr(vcpu, lr, vlr);
>>> -			vgic_irq_clear_queued(vcpu, vlr.irq);
>>> -			set_bit(lr, elrsr_ptr);
>>> -		}
>>> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
>>> +			level_pending = true;
>>>  
>>>  		if (!test_bit(lr, elrsr_ptr))
>>>  			continue;
>>> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>>>  }
>>>  
>>>  /**
>>> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
>>> - *
>>> - * Return the logical active state of a mapped interrupt. This doesn't
>>> - * necessarily reflects the current HW state.
>>> - */
>>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
>>> -{
>>> -	BUG_ON(!map);
>>> -	return map->active;
>>> -}
>>> -
>>> -/**
>>> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
>>> - *
>>> - * Set the logical active state of a mapped interrupt. This doesn't
>>> - * immediately affects the HW state.
>>> - */
>>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
>>> -{
>>> -	BUG_ON(!map);
>>> -	map->active = active;
>>> -}
>>> -
>>> -/**
>>>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>>>   * @vcpu: The VCPU pointer
>>>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
>>> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>>>  			if (i < VGIC_NR_SGIS)
>>>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>>>  							vcpu->vcpu_id, i, 1);
>>> -			if (i < VGIC_NR_PRIVATE_IRQS)
>>> +			if (i < VGIC_NR_SGIS)
>>>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>>>  							vcpu->vcpu_id, i,
>>>  							VGIC_CFG_EDGE);
>>> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
>>> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>>> +							vcpu->vcpu_id, i,
>>> +							VGIC_CFG_LEVEL);
>>>  		}
>>>  
>>>  		vgic_enable(vcpu);
>>>
>>
>> My only real objection to this patch is that it puts my brain upside down.
>> Hopefully that won't last.
>>
> Yeah, I tried helping in the commit message, but I couldn't do much
> beyond that. Splitting up the patch further didn't really work out for
> me.

It is indeed quite intricate, and hard to really take apart. Guess
we'll have to live with it.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
@ 2015-09-03 17:29         ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-03 17:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 03/09/15 18:23, Christoffer Dall wrote:
> On Thu, Sep 03, 2015 at 06:06:39PM +0100, Marc Zyngier wrote:
>> On 30/08/15 14:54, Christoffer Dall wrote:
>>> The arch timer currently uses edge-triggered semantics in the sense that
>>> the line is never sampled by the vgic and lowering the line from the
>>> timer to the vgic doesn't have any effect on the pending state of
>>> virtual interrupts in the vgic.  This means that we do not support a
>>> guest with the otherwise valid behavior of (1) disable interrupts (2)
>>> enable the timer (3) disable the timer (4) enable interrupts.  Such a
>>> guest would validly not expect to see any interrupts on real hardware,
>>> but will see interrupts on KVM.
>>>
>>> This patch fixes this shortcoming through the following series of
>>> changes.
>>>
>>> First, we change the flow of the timer/vgic sync/flush operations.  Now
>>> the timer is always flushed/synced before the vgic, because the vgic
>>> samples the state of the timer output.  This has the implication that we
>>> move the timer operations into non-preemptible sections, but that is
>>> fine after the previous commit getting rid of hrtimer schedules on every
>>> entry/exit.
>>>
>>> Second, we change the internal behavior of the timer, letting the timer
>>> keep track of its previous output state, and only lower/raise the line
>>> to the vgic when the state changes.  Note that in theory this could have
>>> been accomplished more simply by signalling the vgic every time the
>>> state *potentially* changed, but we don't want to be hitting the vgic
>>> more often than necessary.
>>>
>>> Third, we get rid of the use of the map->active field in the vgic and
>>> instead simply set the interrupt as active on the physical distributor
>>> whenever we signal a mapped interrupt to the guest, and we reset the
>>> active state when we sync back the HW state from the vgic.
>>>
>>> Fourth, and finally, we now initialize the timer PPIs (and all the other
>>> unused PPIs for now), to be level-triggered, and modify the sync code to
>>> sample the line state on HW sync and re-inject a new interrupt if it is
>>> still pending at that time.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  arch/arm/kvm/arm.c           | 11 +++++--
>>>  include/kvm/arm_arch_timer.h |  2 +-
>>>  include/kvm/arm_vgic.h       |  3 --
>>>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
>>>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
>>>  5 files changed, 81 insertions(+), 70 deletions(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index bdf8871..102a4aa 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  
>>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>>  			local_irq_enable();
>>> +			kvm_timer_sync_hwstate(vcpu);
>>>  			kvm_vgic_sync_hwstate(vcpu);
>>>  			preempt_enable();
>>> -			kvm_timer_sync_hwstate(vcpu);
>>>  			continue;
>>>  		}
>>>  
>>> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		kvm_guest_exit();
>>>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>  
>>> +		/*
>>> +		 * We must sync the timer state before the vgic state so that
>>> +		 * the vgic can properly sample the updated state of the
>>> +		 * interrupt line.
>>> +		 */
>>> +		kvm_timer_sync_hwstate(vcpu);
>>> +
>>>  		kvm_vgic_sync_hwstate(vcpu);
>>>  
>>>  		preempt_enable();
>>>  
>>> -		kvm_timer_sync_hwstate(vcpu);
>>> -
>>>  		ret = handle_exit(vcpu, run, ret);
>>>  	}
>>>  
>>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
>>> index ef14cc1..1800227 100644
>>> --- a/include/kvm/arm_arch_timer.h
>>> +++ b/include/kvm/arm_arch_timer.h
>>> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
>>>  	bool				armed;
>>>  
>>>  	/* Timer IRQ */
>>> -	const struct kvm_irq_level	*irq;
>>> +	struct kvm_irq_level		irq;
>>>  
>>>  	/* VGIC mapping */
>>>  	struct irq_phys_map		*map;
>>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>>> index d901f1a..99011a0 100644
>>> --- a/include/kvm/arm_vgic.h
>>> +++ b/include/kvm/arm_vgic.h
>>> @@ -163,7 +163,6 @@ struct irq_phys_map {
>>>  	u32			virt_irq;
>>>  	u32			phys_irq;
>>>  	u32			irq;
>>> -	bool			active;
>>>  };
>>>  
>>>  struct irq_phys_map_entry {
>>> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
>>>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
>>>  					   int virt_irq, int irq);
>>>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
>>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
>>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
>>>  
>>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
>>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index 018f3d6..747302f 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
>>>  	}
>>>  }
>>>  
>>> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
>>> -{
>>> -	int ret;
>>> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> -
>>> -	kvm_vgic_set_phys_irq_active(timer->map, true);
>>> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>> -					 timer->map,
>>> -					 timer->irq->level);
>>> -	WARN_ON(ret);
>>> -}
>>> -
>>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>>  {
>>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
>>> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>>  
>>>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
>>> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
>>> -		!kvm_vgic_get_phys_irq_active(timer->map);
>>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
>>>  }
>>>  
>>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>>  	return cval <= now;
>>>  }
>>>  
>>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
>>> +{
>>> +	int ret;
>>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +
>>> +	BUG_ON(!vgic_initialized(vcpu->kvm));
>>> +
>>> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>> +					 timer->map,
>>> +					 timer->irq.level);
>>> +	WARN_ON(ret);
>>> +}
>>> +
>>> +/*
>>> + * Check if there was a change in the timer state (should we raise or lower
>>> + * the line level to the GIC).
>>> + */
>>> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +
>>> +	/*
>>> +	 * If userspace modified the timer registers via SET_ONE_REG before
>>> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
>>> +	 * because the guest would never see the interrupt.  Instead wait
>>> +	 * until we call this function from kvm_timer_flush_hwstate.
>>> +	 */
>>> +	if (!vgic_initialized(vcpu->kvm))
>>> +	    return;
>>> +
>>> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
>>> +		timer->irq.level = 1;
>>> +		kvm_timer_update_irq(vcpu);
>>> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
>>> +		timer->irq.level = 0;
>>> +		kvm_timer_update_irq(vcpu);
>>> +	}
>>> +}
>>> +
>>
>> It took me ages to parse this, so I rewrote it to match my understanding:
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index 8a0fdfc..a722f0f 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>  	return cval <= now;
>>  }
>>  
>> -static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
>>  {
>>  	int ret;
>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>  
>>  	BUG_ON(!vgic_initialized(vcpu->kvm));
>>  
>> +	timer->irq.level = new_state;
>>  	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
>>  					 timer->map,
>>  					 timer->irq.level);
>> @@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
>>  	if (!vgic_initialized(vcpu->kvm))
>>  	    return;
>>  
>> -	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
>> -		timer->irq.level = 1;
>> -		kvm_timer_update_irq(vcpu);
>> -	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
>> -		timer->irq.level = 0;
>> -		kvm_timer_update_irq(vcpu);
>> -	}
>> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
>> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
>>  }
>>  
>>  /*
>>
>> Did I get it right?
> 
> almost, you'd have to assign timer->irq.level after you check for it
> though, right?

That's why I've added this line in kvm_timer_update_irq()! :-)

>>
>>>  /*
>>>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>>>   * thread is removed from its waitqueue and made runnable when there's a timer
>>> @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>>>  	 * If the timer expired while we were not scheduled, now is the time
>>>  	 * to inject it.
>>>  	 */
>>> -	if (kvm_timer_should_fire(vcpu))
>>> -		kvm_timer_inject_irq(vcpu);
>>> +	kvm_timer_update_state(vcpu);
>>>  }
>>>  
>>>  /**
>>> @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  
>>>  	BUG_ON(timer_is_armed(timer));
>>>  
>>> -	if (kvm_timer_should_fire(vcpu))
>>> -		kvm_timer_inject_irq(vcpu);
>>> +	/*
>>> +	 * The guest could have modified the timer registers or the timer
>>> +	 * could have expired, update the timer state.
>>> +	 */
>>> +	kvm_timer_update_state(vcpu);
>>>  }
>>>  
>>>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>> @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>>>  	 * kvm_vcpu_set_target(). To handle this, we determine
>>>  	 * vcpu timer irq number when the vcpu is reset.
>>>  	 */
>>> -	timer->irq = irq;
>>> +	timer->irq.irq = irq->irq;
>>>  
>>>  	/*
>>>  	 * Tell the VGIC that the virtual interrupt is tied to a
>>> @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>>>  	default:
>>>  		return -1;
>>>  	}
>>> +
>>> +	kvm_timer_update_state(vcpu);
>>>  	return 0;
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 9ed8d53..f4ea950 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
>>>  /*
>>>   * Save the physical active state, and reset it to inactive.
>>>   *
>>> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
>>> + * Return true if there's a pending level triggered interrupt line to queue.
>>>   */
>>> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>>> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
>>>  {
>>>  	struct irq_phys_map *map;
>>> +	bool phys_active;
>>>  	int ret;
>>>  
>>>  	if (!(vlr.state & LR_HW))
>>>  		return 0;
>>>  
>>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
>>> -	BUG_ON(!map || !map->active);
>>> +	BUG_ON(!map);
>>>  
>>>  	ret = irq_get_irqchip_state(map->irq,
>>>  				    IRQCHIP_STATE_ACTIVE,
>>> -				    &map->active);
>>> +				    &phys_active);
>>>  
>>>  	WARN_ON(ret);
>>>  
>>> -	if (map->active) {
>>> +	if (phys_active) {
>>> +		/*
>>> +		 * Interrupt still marked as active on the physical
>>> +		 * distributor, so guest did not EOI it yet.  Reset to
>>> +		 * non-active so that other VMs can see interrupts from this
>>> +		 * device.
>>> +		 */
>>>  		ret = irq_set_irqchip_state(map->irq,
>>>  					    IRQCHIP_STATE_ACTIVE,
>>>  					    false);
>>>  		WARN_ON(ret);
>>> -		return 0;
>>> +		return false;
>>>  	}
>>>  
>>> -	return 1;
>>> +	/* Mapped edge-triggered interrupts not yet supported. */
>>> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
>>
>> Hmmm. What are we missing?
>>
> 
> I don't know really, my brain ran out of memory, but it's not like we
> claimed to support this earlier, and clearly we didn't think this
> through well enough.

We can definitely revisit this later, but I have the feeling that the
flow is quite similar...

> 
>>> +	return process_level_irq(vcpu, lr, vlr);
>>>  }
>>>  
>>>  /* Sync back the VGIC state after a guest run */
>>> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
>>>  			continue;
>>>  
>>>  		vlr = vgic_get_lr(vcpu, lr);
>>> -		if (vgic_sync_hwirq(vcpu, vlr)) {
>>> -			/*
>>> -			 * So this is a HW interrupt that the guest
>>> -			 * EOI-ed. Clean the LR state and allow the
>>> -			 * interrupt to be sampled again.
>>> -			 */
>>> -			vlr.state = 0;
>>> -			vlr.hwirq = 0;
>>> -			vgic_set_lr(vcpu, lr, vlr);
>>> -			vgic_irq_clear_queued(vcpu, vlr.irq);
>>> -			set_bit(lr, elrsr_ptr);
>>> -		}
>>> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
>>> +			level_pending = true;
>>>  
>>>  		if (!test_bit(lr, elrsr_ptr))
>>>  			continue;
>>> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
>>>  }
>>>  
>>>  /**
>>> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
>>> - *
>>> - * Return the logical active state of a mapped interrupt. This doesn't
>>> - * necessarily reflects the current HW state.
>>> - */
>>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
>>> -{
>>> -	BUG_ON(!map);
>>> -	return map->active;
>>> -}
>>> -
>>> -/**
>>> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
>>> - *
>>> - * Set the logical active state of a mapped interrupt. This doesn't
>>> - * immediately affects the HW state.
>>> - */
>>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
>>> -{
>>> -	BUG_ON(!map);
>>> -	map->active = active;
>>> -}
>>> -
>>> -/**
>>>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
>>>   * @vcpu: The VCPU pointer
>>>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
>>> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
>>>  			if (i < VGIC_NR_SGIS)
>>>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>>>  							vcpu->vcpu_id, i, 1);
>>> -			if (i < VGIC_NR_PRIVATE_IRQS)
>>> +			if (i < VGIC_NR_SGIS)
>>>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>>>  							vcpu->vcpu_id, i,
>>>  							VGIC_CFG_EDGE);
>>> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
>>> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
>>> +							vcpu->vcpu_id, i,
>>> +							VGIC_CFG_LEVEL);
>>>  		}
>>>  
>>>  		vgic_enable(vcpu);
>>>
>>
>> My only real objection to this patch is that it puts my brain upside down.
>> Hopefully that won't last.
>>
> Yeah, I tried helping in the commit message, but I couldn't do much
> beyond that. Splitting up the patch further didn't really work out for
> me.

It is indeed quite intricate, and hard to really take apart. Guess
we'll have to live with it.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  2015-09-03 17:29         ` Marc Zyngier
@ 2015-09-03 22:00           ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-03 22:00 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 06:29:14PM +0100, Marc Zyngier wrote:
> On 03/09/15 18:23, Christoffer Dall wrote:
> > On Thu, Sep 03, 2015 at 06:06:39PM +0100, Marc Zyngier wrote:
> >> On 30/08/15 14:54, Christoffer Dall wrote:
> >>> The arch timer currently uses edge-triggered semantics in the sense that
> >>> the line is never sampled by the vgic and lowering the line from the
> >>> timer to the vgic doesn't have any effect on the pending state of
> >>> virtual interrupts in the vgic.  This means that we do not support a
> >>> guest with the otherwise valid behavior of (1) disable interrupts (2)
> >>> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> >>> guest would validly not expect to see any interrupts on real hardware,
> >>> but will see interrupts on KVM.
> >>>
> >>> This patch fixes this shortcoming through the following series of
> >>> changes.
> >>>
> >>> First, we change the flow of the timer/vgic sync/flush operations.  Now
> >>> the timer is always flushed/synced before the vgic, because the vgic
> >>> samples the state of the timer output.  This has the implication that we
> >>> move the timer operations into non-preemptible sections, but that is
> >>> fine after the previous commit getting rid of hrtimer schedules on every
> >>> entry/exit.
> >>>
> >>> Second, we change the internal behavior of the timer, letting the timer
> >>> keep track of its previous output state, and only lower/raise the line
> >>> to the vgic when the state changes.  Note that in theory this could have
> >>> been accomplished more simply by signalling the vgic every time the
> >>> state *potentially* changed, but we don't want to be hitting the vgic
> >>> more often than necessary.
> >>>
> >>> Third, we get rid of the use of the map->active field in the vgic and
> >>> instead simply set the interrupt as active on the physical distributor
> >>> whenever we signal a mapped interrupt to the guest, and we reset the
> >>> active state when we sync back the HW state from the vgic.
> >>>
> >>> Fourth, and finally, we now initialize the timer PPIs (and all the other
> >>> unused PPIs for now), to be level-triggered, and modify the sync code to
> >>> sample the line state on HW sync and re-inject a new interrupt if it is
> >>> still pending at that time.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  arch/arm/kvm/arm.c           | 11 +++++--
> >>>  include/kvm/arm_arch_timer.h |  2 +-
> >>>  include/kvm/arm_vgic.h       |  3 --
> >>>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
> >>>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
> >>>  5 files changed, 81 insertions(+), 70 deletions(-)
> >>>
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index bdf8871..102a4aa 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  
> >>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>  			local_irq_enable();
> >>> +			kvm_timer_sync_hwstate(vcpu);
> >>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>  			preempt_enable();
> >>> -			kvm_timer_sync_hwstate(vcpu);
> >>>  			continue;
> >>>  		}
> >>>  
> >>> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		kvm_guest_exit();
> >>>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>  
> >>> +		/*
> >>> +		 * We must sync the timer state before the vgic state so that
> >>> +		 * the vgic can properly sample the updated state of the
> >>> +		 * interrupt line.
> >>> +		 */
> >>> +		kvm_timer_sync_hwstate(vcpu);
> >>> +
> >>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>  
> >>>  		preempt_enable();
> >>>  
> >>> -		kvm_timer_sync_hwstate(vcpu);
> >>> -
> >>>  		ret = handle_exit(vcpu, run, ret);
> >>>  	}
> >>>  
> >>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> >>> index ef14cc1..1800227 100644
> >>> --- a/include/kvm/arm_arch_timer.h
> >>> +++ b/include/kvm/arm_arch_timer.h
> >>> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
> >>>  	bool				armed;
> >>>  
> >>>  	/* Timer IRQ */
> >>> -	const struct kvm_irq_level	*irq;
> >>> +	struct kvm_irq_level		irq;
> >>>  
> >>>  	/* VGIC mapping */
> >>>  	struct irq_phys_map		*map;
> >>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >>> index d901f1a..99011a0 100644
> >>> --- a/include/kvm/arm_vgic.h
> >>> +++ b/include/kvm/arm_vgic.h
> >>> @@ -163,7 +163,6 @@ struct irq_phys_map {
> >>>  	u32			virt_irq;
> >>>  	u32			phys_irq;
> >>>  	u32			irq;
> >>> -	bool			active;
> >>>  };
> >>>  
> >>>  struct irq_phys_map_entry {
> >>> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> >>>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> >>>  					   int virt_irq, int irq);
> >>>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> >>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> >>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> >>>  
> >>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
> >>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> >>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >>> index 018f3d6..747302f 100644
> >>> --- a/virt/kvm/arm/arch_timer.c
> >>> +++ b/virt/kvm/arm/arch_timer.c
> >>> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
> >>>  	}
> >>>  }
> >>>  
> >>> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> >>> -{
> >>> -	int ret;
> >>> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> -
> >>> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> >>> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> >>> -					 timer->map,
> >>> -					 timer->irq->level);
> >>> -	WARN_ON(ret);
> >>> -}
> >>> -
> >>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >>>  {
> >>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> >>> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> >>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>>  
> >>>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> >>> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> >>> -		!kvm_vgic_get_phys_irq_active(timer->map);
> >>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
> >>>  }
> >>>  
> >>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>>  	return cval <= now;
> >>>  }
> >>>  
> >>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	int ret;
> >>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +
> >>> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> >>> +
> >>> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> >>> +					 timer->map,
> >>> +					 timer->irq.level);
> >>> +	WARN_ON(ret);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Check if there was a change in the timer state (should we raise or lower
> >>> + * the line level to the GIC).
> >>> + */
> >>> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +
> >>> +	/*
> >>> +	 * If userspace modified the timer registers via SET_ONE_REG before
> >>> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> >>> +	 * because the guest would never see the interrupt.  Instead wait
> >>> +	 * until we call this function from kvm_timer_flush_hwstate.
> >>> +	 */
> >>> +	if (!vgic_initialized(vcpu->kvm))
> >>> +	    return;
> >>> +
> >>> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> >>> +		timer->irq.level = 1;
> >>> +		kvm_timer_update_irq(vcpu);
> >>> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> >>> +		timer->irq.level = 0;
> >>> +		kvm_timer_update_irq(vcpu);
> >>> +	}
> >>> +}
> >>> +
> >>
> >> It took me ages to parse this, so I rewrote it to match my understanding:
> >>
> >> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >> index 8a0fdfc..a722f0f 100644

* [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
@ 2015-09-03 22:00           ` Christoffer Dall
  0 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-03 22:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 03, 2015 at 06:29:14PM +0100, Marc Zyngier wrote:
> On 03/09/15 18:23, Christoffer Dall wrote:
> > On Thu, Sep 03, 2015 at 06:06:39PM +0100, Marc Zyngier wrote:
> >> On 30/08/15 14:54, Christoffer Dall wrote:
> >>> The arch timer currently uses edge-triggered semantics in the sense that
> >>> the line is never sampled by the vgic and lowering the line from the
> >>> timer to the vgic doesn't have any affect on the pending state of
> >>> virtual interrupts in the vgic.  This means that we do not support a
> >>> guest with the otherwise valid behavior of (1) disable interrupts (2)
> >>> enable the timer (3) disable the timer (4) enable interrupts.  Such a
> >>> guest would validly not expect to see any interrupts on real hardware,
> >>> but will see interrupts on KVM.
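A minimal user-space model of the guest sequence described above may help; this is purely illustrative (none of these names exist in the kernel), assuming level-triggered delivery where an interrupt is taken only while the line is high and interrupts are enabled:

```c
#include <stdbool.h>

/* Hypothetical model of a CPU with one level-triggered interrupt line. */
struct cpu_model {
    bool irqs_enabled;
    bool line;      /* current level of the timer interrupt line */
    int  taken;     /* interrupts actually delivered */
};

/* Level semantics: delivery only while the line is high and unmasked. */
static void maybe_deliver(struct cpu_model *c)
{
    if (c->irqs_enabled && c->line)
        c->taken++;
}

/* The four-step guest sequence from the text above. */
static int run_guest_sequence(struct cpu_model *c)
{
    c->irqs_enabled = false;                    /* (1) disable interrupts */
    c->line = true;  maybe_deliver(c);          /* (2) timer enabled, fires */
    c->line = false; maybe_deliver(c);          /* (3) timer disabled */
    c->irqs_enabled = true; maybe_deliver(c);   /* (4) enable interrupts */
    return c->taken;                            /* 0 under level semantics */
}
```

Under edge-triggered semantics the interrupt raised at step (2) would have been latched as pending and delivered at step (4), which is exactly the spurious interrupt the guest sees on KVM before this patch.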
> >>>
> >>> This patch fixes this shortcoming through the following series of
> >>> changes.
> >>>
> >>> First, we change the flow of the timer/vgic sync/flush operations.  Now
> >>> the timer is always flushed/synced before the vgic, because the vgic
> >>> samples the state of the timer output.  This has the implication that we
> >>> move the timer operations into non-preemptible sections, but that is
> >>> fine after the previous commit getting rid of hrtimer schedules on every
> >>> entry/exit.
> >>>
> >>> Second, we change the internal behavior of the timer, letting the timer
> >>> keep track of its previous output state, and only lower/raise the line
> >>> to the vgic when the state changes.  Note that in theory this could have
> >>> been accomplished more simply by signalling the vgic every time the
> >>> state *potentially* changed, but we don't want to be hitting the vgic
> >>> more often than necessary.
> >>>
> >>> Third, we get rid of the use of the map->active field in the vgic and
> >>> instead simply set the interrupt as active on the physical distributor
> >>> whenever we signal a mapped interrupt to the guest, and we reset the
> >>> active state when we sync back the HW state from the vgic.
> >>>
> >>> Fourth, and finally, we now initialize the timer PPIs (and all the other
> >>> unused PPIs for now), to be level-triggered, and modify the sync code to
> >>> sample the line state on HW sync and re-inject a new interrupt if it is
> >>> still pending at that time.
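The second change (track the previous output state, only signal the vgic on a change) can be sketched in stand-alone C; field and function names here only loosely follow the kernel's `struct arch_timer_cpu` and are otherwise illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define CTL_ENABLE (1u << 0)
#define CTL_IMASK  (1u << 1)

/* Hypothetical user-space model of the virtual timer state. */
struct vtimer {
    uint32_t ctl;    /* enable/mask bits of CNTV_CTL */
    uint64_t cval;   /* CNTV_CVAL compare value */
    bool     level;  /* last line level signalled to the vgic */
};

/* Mirrors kvm_timer_should_fire(): enabled, unmasked, and expired. */
static bool vtimer_should_fire(const struct vtimer *t, uint64_t now)
{
    return (t->ctl & CTL_ENABLE) && !(t->ctl & CTL_IMASK) && t->cval <= now;
}

/* Mirrors the reworked kvm_timer_update_state(): touch the vgic only
 * when the computed output differs from the last signalled level.
 * Returns 1 when the line level changed, 0 otherwise. */
static int vtimer_update_state(struct vtimer *t, uint64_t now)
{
    if (vtimer_should_fire(t, now) == t->level)
        return 0;               /* no edge: don't hit the vgic */
    t->level = !t->level;
    return 1;                   /* caller would inject the new level */
}
```

Calling this on every entry/exit is cheap: as long as the output matches the recorded level, the vgic is never touched.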
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  arch/arm/kvm/arm.c           | 11 +++++--
> >>>  include/kvm/arm_arch_timer.h |  2 +-
> >>>  include/kvm/arm_vgic.h       |  3 --
> >>>  virt/kvm/arm/arch_timer.c    | 68 +++++++++++++++++++++++++++++++-------------
> >>>  virt/kvm/arm/vgic.c          | 67 +++++++++++++++----------------------------
> >>>  5 files changed, 81 insertions(+), 70 deletions(-)
> >>>
> >>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >>> index bdf8871..102a4aa 100644
> >>> --- a/arch/arm/kvm/arm.c
> >>> +++ b/arch/arm/kvm/arm.c
> >>> @@ -561,9 +561,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  
> >>>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
> >>>  			local_irq_enable();
> >>> +			kvm_timer_sync_hwstate(vcpu);
> >>>  			kvm_vgic_sync_hwstate(vcpu);
> >>>  			preempt_enable();
> >>> -			kvm_timer_sync_hwstate(vcpu);
> >>>  			continue;
> >>>  		}
> >>>  
> >>> @@ -608,12 +608,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >>>  		kvm_guest_exit();
> >>>  		trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
> >>>  
> >>> +		/*
> >>> +		 * We must sync the timer state before the vgic state so that
> >>> +		 * the vgic can properly sample the updated state of the
> >>> +		 * interrupt line.
> >>> +		 */
> >>> +		kvm_timer_sync_hwstate(vcpu);
> >>> +
> >>>  		kvm_vgic_sync_hwstate(vcpu);
> >>>  
> >>>  		preempt_enable();
> >>>  
> >>> -		kvm_timer_sync_hwstate(vcpu);
> >>> -
> >>>  		ret = handle_exit(vcpu, run, ret);
> >>>  	}
> >>>  
> >>> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> >>> index ef14cc1..1800227 100644
> >>> --- a/include/kvm/arm_arch_timer.h
> >>> +++ b/include/kvm/arm_arch_timer.h
> >>> @@ -51,7 +51,7 @@ struct arch_timer_cpu {
> >>>  	bool				armed;
> >>>  
> >>>  	/* Timer IRQ */
> >>> -	const struct kvm_irq_level	*irq;
> >>> +	struct kvm_irq_level		irq;
> >>>  
> >>>  	/* VGIC mapping */
> >>>  	struct irq_phys_map		*map;
> >>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >>> index d901f1a..99011a0 100644
> >>> --- a/include/kvm/arm_vgic.h
> >>> +++ b/include/kvm/arm_vgic.h
> >>> @@ -163,7 +163,6 @@ struct irq_phys_map {
> >>>  	u32			virt_irq;
> >>>  	u32			phys_irq;
> >>>  	u32			irq;
> >>> -	bool			active;
> >>>  };
> >>>  
> >>>  struct irq_phys_map_entry {
> >>> @@ -358,8 +357,6 @@ int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
> >>>  struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
> >>>  					   int virt_irq, int irq);
> >>>  int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
> >>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
> >>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
> >>>  
> >>>  #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
> >>>  #define vgic_initialized(k)	(!!((k)->arch.vgic.nr_cpus))
> >>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >>> index 018f3d6..747302f 100644
> >>> --- a/virt/kvm/arm/arch_timer.c
> >>> +++ b/virt/kvm/arm/arch_timer.c
> >>> @@ -59,18 +59,6 @@ static void timer_disarm(struct arch_timer_cpu *timer)
> >>>  	}
> >>>  }
> >>>  
> >>> -static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
> >>> -{
> >>> -	int ret;
> >>> -	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> -
> >>> -	kvm_vgic_set_phys_irq_active(timer->map, true);
> >>> -	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> >>> -					 timer->map,
> >>> -					 timer->irq->level);
> >>> -	WARN_ON(ret);
> >>> -}
> >>> -
> >>>  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >>>  {
> >>>  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> >>> @@ -116,8 +104,7 @@ static bool kvm_timer_irq_enabled(struct kvm_vcpu *vcpu)
> >>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>>  
> >>>  	return !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> >>> -		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
> >>> -		!kvm_vgic_get_phys_irq_active(timer->map);
> >>> +		(timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE);
> >>>  }
> >>>  
> >>>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>> @@ -134,6 +121,45 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>>  	return cval <= now;
> >>>  }
> >>>  
> >>> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	int ret;
> >>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +
> >>> +	BUG_ON(!vgic_initialized(vcpu->kvm));
> >>> +
> >>> +	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> >>> +					 timer->map,
> >>> +					 timer->irq.level);
> >>> +	WARN_ON(ret);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Check if there was a change in the timer state (should we raise or lower
> >>> + * the line level to the GIC).
> >>> + */
> >>> +static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >>> +{
> >>> +	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +
> >>> +	/*
> >>> +	 * If userspace modified the timer registers via SET_ONE_REG before
> >>> +	 * the vgic was initialized, we mustn't set the timer->irq.level value
> >>> +	 * because the guest would never see the interrupt.  Instead wait
> >>> +	 * until we call this function from kvm_timer_flush_hwstate.
> >>> +	 */
> >>> +	if (!vgic_initialized(vcpu->kvm))
> >>> +	    return;
> >>> +
> >>> +	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> >>> +		timer->irq.level = 1;
> >>> +		kvm_timer_update_irq(vcpu);
> >>> +	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> >>> +		timer->irq.level = 0;
> >>> +		kvm_timer_update_irq(vcpu);
> >>> +	}
> >>> +}
> >>> +
> >>
> >> It took me ages to parse this, so I rewrote it to match my understanding:
> >>
> >> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >> index 8a0fdfc..a722f0f 100644
> >> --- a/virt/kvm/arm/arch_timer.c
> >> +++ b/virt/kvm/arm/arch_timer.c
> >> @@ -121,13 +121,14 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>  	return cval <= now;
> >>  }
> >>  
> >> -static void kvm_timer_update_irq(struct kvm_vcpu *vcpu)
> >> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_state)
> >>  {
> >>  	int ret;
> >>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>  
> >>  	BUG_ON(!vgic_initialized(vcpu->kvm));
> >>  
> >> +	timer->irq.level = new_state;
> >>  	ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
> >>  					 timer->map,
> >>  					 timer->irq.level);
> >> @@ -151,13 +152,8 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
> >>  	if (!vgic_initialized(vcpu->kvm))
> >>  	    return;
> >>  
> >> -	if (kvm_timer_should_fire(vcpu) && !timer->irq.level) {
> >> -		timer->irq.level = 1;
> >> -		kvm_timer_update_irq(vcpu);
> >> -	} else if (!kvm_timer_should_fire(vcpu) && timer->irq.level) {
> >> -		timer->irq.level = 0;
> >> -		kvm_timer_update_irq(vcpu);
> >> -	}
> >> +	if (kvm_timer_should_fire(vcpu) != timer->irq.level)
> >> +		kvm_timer_update_irq(vcpu, !timer->irq.level);
> >>  }
> >>  
> >>  /*
> >>
> >> Did I get it right?
> > 
> > almost, you'd have to assign timer->irq.level after you check for it
> > though, right?
> 
> That's why I've added this line in kvm_timer_update_irq()! :-)
> 

duh, /me learns to read diffs all over again.

Yeah, your version is probably easier to read. thanks.

> >>
> >>>  /*
> >>>   * Schedule the background timer before calling kvm_vcpu_block, so that this
> >>>   * thread is removed from its waitqueue and made runnable when there's a timer
> >>> @@ -191,8 +217,7 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >>>  	 * If the timer expired while we were not scheduled, now is the time
> >>>  	 * to inject it.
> >>>  	 */
> >>> -	if (kvm_timer_should_fire(vcpu))
> >>> -		kvm_timer_inject_irq(vcpu);
> >>> +	kvm_timer_update_state(vcpu);
> >>>  }
> >>>  
> >>>  /**
> >>> @@ -208,8 +233,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
> >>>  
> >>>  	BUG_ON(timer_is_armed(timer));
> >>>  
> >>> -	if (kvm_timer_should_fire(vcpu))
> >>> -		kvm_timer_inject_irq(vcpu);
> >>> +	/*
> >>> +	 * The guest could have modified the timer registers or the timer
> >>> +	 * could have expired, update the timer state.
> >>> +	 */
> >>> +	kvm_timer_update_state(vcpu);
> >>>  }
> >>>  
> >>>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> >>> @@ -224,7 +252,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> >>>  	 * kvm_vcpu_set_target(). To handle this, we determine
> >>>  	 * vcpu timer irq number when the vcpu is reset.
> >>>  	 */
> >>> -	timer->irq = irq;
> >>> +	timer->irq.irq = irq->irq;
> >>>  
> >>>  	/*
> >>>  	 * Tell the VGIC that the virtual interrupt is tied to a
> >>> @@ -269,6 +297,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >>>  	default:
> >>>  		return -1;
> >>>  	}
> >>> +
> >>> +	kvm_timer_update_state(vcpu);
> >>>  	return 0;
> >>>  }
> >>>  
> >>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >>> index 9ed8d53..f4ea950 100644
> >>> --- a/virt/kvm/arm/vgic.c
> >>> +++ b/virt/kvm/arm/vgic.c
> >>> @@ -1422,34 +1422,43 @@ static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> >>>  /*
> >>>   * Save the physical active state, and reset it to inactive.
> >>>   *
> >>> - * Return 1 if HW interrupt went from active to inactive, and 0 otherwise.
> >>> + * Return true if there's a pending level triggered interrupt line to queue.
> >>>   */
> >>> -static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
> >>> +static bool vgic_sync_hwirq(struct kvm_vcpu *vcpu, int lr, struct vgic_lr vlr)
> >>>  {
> >>>  	struct irq_phys_map *map;
> >>> +	bool phys_active;
> >>>  	int ret;
> >>>  
> >>>  	if (!(vlr.state & LR_HW))
> >>>  		return 0;
> >>>  
> >>>  	map = vgic_irq_map_search(vcpu, vlr.irq);
> >>> -	BUG_ON(!map || !map->active);
> >>> +	BUG_ON(!map);
> >>>  
> >>>  	ret = irq_get_irqchip_state(map->irq,
> >>>  				    IRQCHIP_STATE_ACTIVE,
> >>> -				    &map->active);
> >>> +				    &phys_active);
> >>>  
> >>>  	WARN_ON(ret);
> >>>  
> >>> -	if (map->active) {
> >>> +	if (phys_active) {
> >>> +		/*
> >>> +		 * Interrupt still marked as active on the physical
> >>> +		 * distributor, so guest did not EOI it yet.  Reset to
> >>> +		 * non-active so that other VMs can see interrupts from this
> >>> +		 * device.
> >>> +		 */
> >>>  		ret = irq_set_irqchip_state(map->irq,
> >>>  					    IRQCHIP_STATE_ACTIVE,
> >>>  					    false);
> >>>  		WARN_ON(ret);
> >>> -		return 0;
> >>> +		return false;
> >>>  	}
> >>>  
> >>> -	return 1;
> >>> +	/* Mapped edge-triggered interrupts not yet supported. */
> >>> +	WARN_ON(vgic_irq_is_edge(vcpu, vlr.irq));
> >>
> >> Hmmm. What are we missing?
> >>
> > 
> > I don't know really, my brain ran out of memory, but it's not like we
> > claimed to support this earlier, and clearly we didn't think this
> > through well enough.
> 
> We can definitely revisit this later, but I have the feeling that the
> flow is quite similar...
> 
> > 
> >>> +	return process_level_irq(vcpu, lr, vlr);
> >>>  }
> >>>  
> >>>  /* Sync back the VGIC state after a guest run */
> >>> @@ -1474,18 +1483,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
> >>>  			continue;
> >>>  
> >>>  		vlr = vgic_get_lr(vcpu, lr);
> >>> -		if (vgic_sync_hwirq(vcpu, vlr)) {
> >>> -			/*
> >>> -			 * So this is a HW interrupt that the guest
> >>> -			 * EOI-ed. Clean the LR state and allow the
> >>> -			 * interrupt to be sampled again.
> >>> -			 */
> >>> -			vlr.state = 0;
> >>> -			vlr.hwirq = 0;
> >>> -			vgic_set_lr(vcpu, lr, vlr);
> >>> -			vgic_irq_clear_queued(vcpu, vlr.irq);
> >>> -			set_bit(lr, elrsr_ptr);
> >>> -		}
> >>> +		if (vgic_sync_hwirq(vcpu, lr, vlr))
> >>> +			level_pending = true;
> >>>  
> >>>  		if (!test_bit(lr, elrsr_ptr))
> >>>  			continue;
> >>> @@ -1861,30 +1860,6 @@ static void vgic_free_phys_irq_map_rcu(struct rcu_head *rcu)
> >>>  }
> >>>  
> >>>  /**
> >>> - * kvm_vgic_get_phys_irq_active - Return the active state of a mapped IRQ
> >>> - *
> >>> - * Return the logical active state of a mapped interrupt. This doesn't
> >>> - * necessarily reflects the current HW state.
> >>> - */
> >>> -bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map)
> >>> -{
> >>> -	BUG_ON(!map);
> >>> -	return map->active;
> >>> -}
> >>> -
> >>> -/**
> >>> - * kvm_vgic_set_phys_irq_active - Set the active state of a mapped IRQ
> >>> - *
> >>> - * Set the logical active state of a mapped interrupt. This doesn't
> >>> - * immediately affects the HW state.
> >>> - */
> >>> -void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active)
> >>> -{
> >>> -	BUG_ON(!map);
> >>> -	map->active = active;
> >>> -}
> >>> -
> >>> -/**
> >>>   * kvm_vgic_unmap_phys_irq - Remove a virtual to physical IRQ mapping
> >>>   * @vcpu: The VCPU pointer
> >>>   * @map: The pointer to a mapping obtained through kvm_vgic_map_phys_irq
> >>> @@ -2112,10 +2087,14 @@ int vgic_init(struct kvm *kvm)
> >>>  			if (i < VGIC_NR_SGIS)
> >>>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
> >>>  							vcpu->vcpu_id, i, 1);
> >>> -			if (i < VGIC_NR_PRIVATE_IRQS)
> >>> +			if (i < VGIC_NR_SGIS)
> >>>  				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> >>>  							vcpu->vcpu_id, i,
> >>>  							VGIC_CFG_EDGE);
> >>> +			else if (i < VGIC_NR_PRIVATE_IRQS) /* PPIs */
> >>> +				vgic_bitmap_set_irq_val(&dist->irq_cfg,
> >>> +							vcpu->vcpu_id, i,
> >>> +							VGIC_CFG_LEVEL);
> >>>  		}
> >>>  
> >>>  		vgic_enable(vcpu);
> >>>
> >>
> >> My only real objection to this patch is that it puts my brain upside down.
> >> Hopefully that won't last.
> >>
> > Yeah, I tried helping in the commit message, but I couldn't do much
> > beyond that. Splitting up the patch further didn't really work out for
> > me.
> 
> It is indeed quite intricate, and hard to really take apart. Guess
> we'll have to live with it.
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/9] KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-04 13:50     ` Eric Auger
  -1 siblings, 0 replies; 74+ messages in thread
From: Eric Auger @ 2015-09-04 13:50 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

Hi Christoffer,
On 08/30/2015 03:54 PM, Christoffer Dall wrote:
> Sometimes it is useful for architecture implementations of KVM to know
> when the VCPU thread is about to block or when it comes back from
> blocking (arm/arm64 needs to know this to properly implement timers, for
> example).
What about vcpu_sleep()? Is that callback specific to kvm_vcpu_block
function entry/exit points, or is it more generic? The question also
applies to future halt/resume functions.

Thanks

Eric
> 
> Therefore provide a generic architecture callback function in line with
> what we do elsewhere for KVM generic-arch interactions.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  arch/arm/include/asm/kvm_host.h     | 3 +++
>  arch/arm64/include/asm/kvm_host.h   | 3 +++
>  arch/mips/include/asm/kvm_host.h    | 2 ++
>  arch/powerpc/include/asm/kvm_host.h | 2 ++
>  arch/s390/include/asm/kvm_host.h    | 2 ++
>  arch/x86/include/asm/kvm_host.h     | 3 +++
>  include/linux/kvm_host.h            | 2 ++
>  virt/kvm/kvm_main.c                 | 3 +++
>  8 files changed, 20 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index dcba0fa..86fcf6e 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -236,4 +236,7 @@ static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {}
>  
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 415938d..dd143f5 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -257,4 +257,7 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>  
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index e8c8d9d..58f0f4d 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -845,5 +845,7 @@ static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  		struct kvm_memory_slot *slot) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  
>  #endif /* __MIPS_KVM_HOST_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index d91f65b..179f9a7 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -702,5 +702,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
>  static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_exit(void) {}
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  
>  #endif /* __POWERPC_KVM_HOST_H__ */
> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
> index 3024acb..04a97df 100644
> --- a/arch/s390/include/asm/kvm_host.h
> +++ b/arch/s390/include/asm/kvm_host.h
> @@ -640,5 +640,7 @@ static inline void kvm_arch_memslots_updated(struct kvm *kvm, struct kvm_memslot
>  static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
>  static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
>  		struct kvm_memory_slot *slot) {}
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
>  
>  #endif
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2a7f5d7..26c4086 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1202,4 +1202,7 @@ int __x86_set_memory_region(struct kvm *kvm,
>  int x86_set_memory_region(struct kvm *kvm,
>  			  const struct kvm_userspace_memory_region *mem);
>  
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9564fd7..87d7be6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -619,6 +619,8 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
>  void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
>  
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b8a444..04b59dd 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1946,6 +1946,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		} while (single_task_running() && ktime_before(cur, stop));
>  	}
>  
> +	kvm_arch_vcpu_blocking(vcpu);
> +
>  	for (;;) {
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
> @@ -1959,6 +1961,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	finish_wait(&vcpu->wq, &wait);
>  	cur = ktime_get();
>  
> +	kvm_arch_vcpu_unblocking(vcpu);
>  out:
>  	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
>  }
> 
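A sketch of how an architecture might use these hooks, modelled on the bracketing this patch adds to kvm_vcpu_block(); everything here is a hypothetical stand-in (the real arm code would schedule and cancel an hrtimer in the two callbacks):

```c
#include <stdbool.h>

/* Hypothetical per-vcpu state; the real code hangs timer state off
 * struct kvm_vcpu. */
struct vcpu_model {
    bool soft_timer_armed;
};

static void arch_vcpu_blocking(struct vcpu_model *v)
{
    v->soft_timer_armed = true;   /* stand-in for scheduling the hrtimer */
}

static void arch_vcpu_unblocking(struct vcpu_model *v)
{
    v->soft_timer_armed = false;  /* stand-in for cancelling the hrtimer */
}

/* Mirrors the shape of the kvm_vcpu_block() change in the diff above:
 * the soft timer is armed only for the duration of the blocking wait. */
static bool vcpu_block_model(struct vcpu_model *v)
{
    arch_vcpu_blocking(v);
    bool armed_while_blocked = v->soft_timer_armed;  /* vcpu waits here */
    arch_vcpu_unblocking(v);
    return armed_while_blocked && !v->soft_timer_armed;
}
```

The point of routing this through generic code is that the timer only needs a background timer while the thread is actually blocked, rather than on every guest entry/exit.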



>  
> +static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
> +static inline void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu) {}
> +
>  #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9564fd7..87d7be6 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -619,6 +619,8 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
>  void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
>  
>  void kvm_vcpu_block(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
>  int kvm_vcpu_yield_to(struct kvm_vcpu *target);
>  void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8b8a444..04b59dd 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1946,6 +1946,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  		} while (single_task_running() && ktime_before(cur, stop));
>  	}
>  
> +	kvm_arch_vcpu_blocking(vcpu);
> +
>  	for (;;) {
>  		prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
>  
> @@ -1959,6 +1961,7 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>  	finish_wait(&vcpu->wq, &wait);
>  	cur = ktime_get();
>  
> +	kvm_arch_vcpu_unblocking(vcpu);
>  out:
>  	trace_kvm_vcpu_wakeup(ktime_to_ns(cur) - ktime_to_ns(start), waited);
>  }
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 1/9] KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  2015-09-04 13:50     ` Eric Auger
@ 2015-09-04 14:50       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-04 14:50 UTC (permalink / raw)
  To: Eric Auger; +Cc: kvmarm, linux-arm-kernel, kvm

On Fri, Sep 04, 2015 at 03:50:08PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 08/30/2015 03:54 PM, Christoffer Dall wrote:
> > Some times it is useful for architecture implementations of KVM to know
> > when the VCPU thread is about to block or when it comes back from
> > blocking (arm/arm64 needs to know this to properly implement timers, for
> > example).
> what about vcpu_sleep()? Is that callback specific to kvm_vcpu_block
> function entry/exit points or is it more generic? The question also
> applies to future halt/resume functions
> 
For ARM, this should be called when we're about to block in a situation
where timer interrupts could affect our sleep state, which would not be
the case for vcpu_sleep, which unconditionally puts the vcpu to sleep
based on other conditions.

I believe that any case where you care about incoming interrupts is
covered by the semantics of kvm_vcpu_block, and therefore these hooks
should only be called by kvm_vcpu_block.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-09-03 15:56       ` Eric Auger
@ 2015-09-04 15:54         ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-04 15:54 UTC (permalink / raw)
  To: Eric Auger; +Cc: Marc Zyngier, kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 05:56:26PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 09/03/2015 05:23 PM, Marc Zyngier wrote:
> > On 30/08/15 14:54, Christoffer Dall wrote:
> >> Mapped interrupts on arm/arm64 is a tricky concept and the way we deal
> >> with them is not apparently easy to understand by reading various specs.
> >>
> >> Therefore, add a proper documentation file explaining the flow and
> >> rationale of the behavior of the vgic.
> >>
> >> Some of this text was contributed by Marc Zyngier.
> >>
> >> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >> ---
> >>  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
> >>  1 file changed, 59 insertions(+)
> >>  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >>
> >> diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >> new file mode 100644
> >> index 0000000..49e1357
> >> --- /dev/null
> >> +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> >> @@ -0,0 +1,59 @@
> >> +KVM/ARM VGIC Mapped Interrupts
> >> +==============================
> >> +
> >> +Setting the Physical Active State for Edge vs. Level Triggered IRQs
> >> +-------------------------------------------------------------------
> >> +
> >> +Mapped non-shared interrupts injected to a guest should always mark the
> >> +interrupt as active on the physical distributor.
> When injecting the virtual IRQ associated to the mapped=forwarded IRQ
> (see next comment), the host must not deactivate the physical IRQ so
> that its active state remains?

Almost, but this is not the case for the timer, where we set the active
state manually.  I'll attempt a reword.

> >> +
> >> +The reasoning for level-triggered interrupts:
> >> +For level-triggered interrupts, we have to mark the interrupt as active
> >> +on the physical distributor,
> to leave the interrupt as active? I have the impression you talk about
> shared IRQ here where the HW would not have any impact on the physical
> distributor state? The physical IRQ can be pending+active too?
>  because otherwise, as the line remains

I'm talking about any level-triggered interrupt where you handle the
interrupt in the guest.  In that case, if you keep deactivating the
interrupt on the host the guest will never make progress.

This is not specific to the timer I think?

Yes, the physical IRQ can be pending+active, but the key is that the
host doesn't deactivate the IRQ.

> >> +asserted, the guest will never execute because the host will keep taking
> >> +interrupts.  As soon as the guest deactivates the interrupt, the
> >> +physical line is sampled by the hardware again and the host takes a new
> >> +interrupt if the physical line is still asserted.
> >> +
> >> +The reasoning for edge-triggered interrupts:
> >> +For edge-triggered interrupts, if we set the HW bit in the LR we also
> >> +have to mark the interrupt as active on the physical distributor.  If we
> >> +don't set the physical active bit and the interrupt hits again before
> >> +the guest has deactivated the interrupt, the interrupt goes to the host,
> >> +which cannot set the state to ACTIVE+PENDING in the LR, because that is
> >> +not supported when setting the HW bit in the LR.
> >> +
> >> +An alternative could be to not use HW bit at all, and inject
> >> +edge-triggered interrupts from a physical assigned device as pure
> >> +virtual interrupts, but that would potentially slow down handling of the
> >> +interrupt in the guest, because a physical interrupt occurring in the
> >> +middle of the guest ISR would preempt the guest for the host to handle
> >> +the interrupt.
> > 
> > It would be worth mentioning that this is valid for PPIs and SPIs. LPIs
> > do not have an Active state (they are either Pending or not), so we'll
> > have to deal with edge interrupts as you just described at some point.
> > Other architectures do something similar, I'd expect.
> > 
> >> +
> >> +
> >> +Life Cycle for Forwarded Physical Interrupts
> >> +--------------------------------------------
> >> +
> >> +By forwarded physical interrupts we mean interrupts presented to a guest
> >> +representing a real HW event originally signaled to the host as a
> > 
> > s/signaled/signalled/
> > 
> >> +physical interrupt
> is it always true for the timer? sometimes isn't it a SW counter that
> expires and upon that event you inject the virtual IRQ with HW bit set?
>  and injecting this as a virtual interrupt with the HW

Well, you restore the timer state at the same time, so you set up the
hardware exactly as if the timer device raised a physical interrupt
while running the VM.  (in fact that's why we set the active state as
well, because we re-program the arch-timer to assert the line when we
enter the guest).

> >> +bit set in the LR.
> another definition of a forwarded/mapped physical IRQ is a physical IRQ
> that is deactivated by the guest and not by the host.

I was deliberately going for a broader definition; generally trying to
describe how we deal with interrupts from a physical device handled by a
guest, which we do by letting the guest deactivate the interrupt.  I think
this is the correct causality analysis ... ?

> 
> Shouldn't we start this file by the definition of a Forwarded Physical
> Interrupts. Here you were supposed to describe their Life Cycle. Also
> note that we previously talked about mapped IRQ and now we talk about
> forwarded IRQ which can be confusing for the reader. 

I agree that the flow feels a bit ad-hoc here.  I'll try to rework the
doc.


> Also we may
> re-introduce the fact that we distinguish between shared and non shared
> beasts to give the full picture?

I'll try to clarify this as well.


> >> +
> >> +The state of such an interrupt is managed in the following way:
> >> +
> >> +  - LR.Pending must be set when the interrupt is first injected, because this
> >> +    is the only way the GICV interface is going to present it to the guest.
> >> +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> >> +  - LR.Pending transitions to LR.Active on read of IAR, as expected.
> >> +  - On EOI, the *physical distributor* active bit gets cleared, but the
> >> +    LR.Active is left untouched - it looks like the GIC can only clear a
> >> +    single bit (either the virtual active, or the physical one).
> >> +  - This means we cannot trust LR.Active to find out about the state of the
> >> +    interrupt, and we definitely need to look at the distributor version.
> physical distributor version?

indeed, I'll reword.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-09-03 15:23     ` Marc Zyngier
@ 2015-09-04 15:55       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-04 15:55 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 04:23:04PM +0100, Marc Zyngier wrote:
> On 30/08/15 14:54, Christoffer Dall wrote:
> > Mapped interrupts on arm/arm64 is a tricky concept and the way we deal
> > with them is not apparently easy to understand by reading various specs.
> > 
> > Therefore, add a proper documentation file explaining the flow and
> > rationale of the behavior of the vgic.
> > 
> > Some of this text was contributed by Marc Zyngier.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
> >  1 file changed, 59 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > 
> > diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > new file mode 100644
> > index 0000000..49e1357
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > @@ -0,0 +1,59 @@
> > +KVM/ARM VGIC Mapped Interrupts
> > +==============================
> > +
> > +Setting the Physical Active State for Edge vs. Level Triggered IRQs
> > +-------------------------------------------------------------------
> > +
> > +Mapped non-shared interrupts injected to a guest should always mark the
> > +interrupt as active on the physical distributor.
> > +
> > +The reasoning for level-triggered interrupts:
> > +For level-triggered interrupts, we have to mark the interrupt as active
> > +on the physical distributor, because otherwise, as the line remains
> > +asserted, the guest will never execute because the host will keep taking
> > +interrupts.  As soon as the guest deactivates the interrupt, the
> > +physical line is sampled by the hardware again and the host takes a new
> > +interrupt if the physical line is still asserted.
> > +
> > +The reasoning for edge-triggered interrupts:
> > +For edge-triggered interrupts, if we set the HW bit in the LR we also
> > +have to mark the interrupt as active on the physical distributor.  If we
> > +don't set the physical active bit and the interrupt hits again before
> > +the guest has deactivated the interrupt, the interrupt goes to the host,
> > +which cannot set the state to ACTIVE+PENDING in the LR, because that is
> > +not supported when setting the HW bit in the LR.
> > +
> > +An alternative could be to not use HW bit at all, and inject
> > +edge-triggered interrupts from a physical assigned device as pure
> > +virtual interrupts, but that would potentially slow down handling of the
> > +interrupt in the guest, because a physical interrupt occurring in the
> > +middle of the guest ISR would preempt the guest for the host to handle
> > +the interrupt.
> 
> It would be worth mentioning that this is valid for PPIs and SPIs. LPIs
> do not have an Active state (they are either Pending or not), so we'll
> have to deal with edge interrupts as you just described at some point.
> Other architectures do something similar, I'd expect.
> 
> > +
> > +
> > +Life Cycle for Forwarded Physical Interrupts
> > +--------------------------------------------
> > +
> > +By forwarded physical interrupts we mean interrupts presented to a guest
> > +representing a real HW event originally signaled to the host as a
> 
> s/signaled/signalled/
> 
> > +physical interrupt and injecting this as a virtual interrupt with the HW
> > +bit set in the LR.
> > +
> > +The state of such an interrupt is managed in the following way:
> > +
> > +  - LR.Pending must be set when the interrupt is first injected, because this
> > +    is the only way the GICV interface is going to present it to the guest.
> > +  - LR.Pending will stay set as long as the guest has not acked the interrupt.
> > +  - LR.Pending transitions to LR.Active on read of IAR, as expected.
> > +  - On EOI, the *physical distributor* active bit gets cleared, but the
> > +    LR.Active is left untouched - it looks like the GIC can only clear a
> > +    single bit (either the virtual active, or the physical one).
> > +  - This means we cannot trust LR.Active to find out about the state of the
> > +    interrupt, and we definitely need to look at the distributor version.
> > +
> > +Consequently, when we context switch the state of a VCPU with forwarded
> > +physical interrupts, we must context switch set pending *or* active bits in the
> > +LR for that VCPU until the guest has deactivated the physical interrupt, and
> > +then clear the corresponding bits in the LR.  If we ever set an LR to pending or
> > +mapped when switching in a VCPU for a forwarded physical interrupt, we must also
> > +set the active state on the *physical distributor*.
> > 
> 
> I wonder if it may be worth adding a small example with the timer,
> because it is not immediately obvious why the interrupt would fire on
> and on without putting the generating device in the picture...
> 
Yes, probably.

I'll try to work both yours and Eric's comments into a new version.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-09-03 15:23     ` Marc Zyngier
@ 2015-09-04 15:57       ` Christoffer Dall
  -1 siblings, 0 replies; 74+ messages in thread
From: Christoffer Dall @ 2015-09-04 15:57 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, kvm

On Thu, Sep 03, 2015 at 04:23:04PM +0100, Marc Zyngier wrote:
> On 30/08/15 14:54, Christoffer Dall wrote:
> > Mapped interrupts on arm/arm64 is a tricky concept and the way we deal
> > with them is not apparently easy to understand by reading various specs.
> > 
> > Therefore, add a proper documentation file explaining the flow and
> > rationale of the behavior of the vgic.
> > 
> > Some of this text was contributed by Marc Zyngier.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt | 59 ++++++++++++++++++++++
> >  1 file changed, 59 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > 
> > diff --git a/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > new file mode 100644
> > index 0000000..49e1357
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt
> > @@ -0,0 +1,59 @@
> > +KVM/ARM VGIC Mapped Interrupts
> > +==============================
> > +
> > +Setting the Physical Active State for Edge vs. Level Triggered IRQs
> > +-------------------------------------------------------------------
> > +
> > +Mapped non-shared interrupts injected to a guest should always mark the
> > +interrupt as active on the physical distributor.
> > +
> > +The reasoning for level-triggered interrupts:
> > +For level-triggered interrupts, we have to mark the interrupt as active
> > +on the physical distributor, because otherwise, as the line remains
> > +asserted, the guest will never execute because the host will keep taking
> > +interrupts.  As soon as the guest deactivates the interrupt, the
> > +physical line is sampled by the hardware again and the host takes a new
> > +interrupt if the physical line is still asserted.
> > +
> > +The reasoning for edge-triggered interrupts:
> > +For edge-triggered interrupts, if we set the HW bit in the LR we also
> > +have to mark the interrupt as active on the physical distributor.  If we
> > +don't set the physical active bit and the interrupt hits again before
> > +the guest has deactivated the interrupt, the interrupt goes to the host,
> > +which cannot set the state to ACTIVE+PENDING in the LR, because that is
> > +not supported when setting the HW bit in the LR.
> > +
> > +An alternative could be to not use HW bit at all, and inject
> > +edge-triggered interrupts from a physical assigned device as pure
> > +virtual interrupts, but that would potentially slow down handling of the
> > +interrupt in the guest, because a physical interrupt occurring in the
> > +middle of the guest ISR would preempt the guest for the host to handle
> > +the interrupt.
> 
> It would be worth mentioning that this is valid for PPIs and SPIs. LPIs
> do not have an Active state (they are either Pending or not), so we'll
> have to deal with edge interrupts as you just described at some point.
> Other architectures do something similar, I'd expect.
> 
> > +
> > +
> > +Life Cycle for Forwarded Physical Interrupts
> > +--------------------------------------------
> > +
> > +By forwarded physical interrupts we mean interrupts presented to a guest
> > +representing a real HW event originally signaled to the host as a
> 
> s/signaled/signalled/
> 
Actually this was my first version as well, but aspell told me it was
spelled "signaled".

Turns out it's mostly acceptable to use both spellings:

http://www.merriam-webster.com/dictionary/signaled

-Christoffer

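The level-triggered reasoning in the documentation quoted above (if the physical active bit is not set while a still-asserted level line is forwarded to the guest, the host keeps taking the interrupt and the guest never runs) can be sketched as a small userspace model. All names here are hypothetical, chosen for illustration, and this is not the KVM code:

```c
#include <stdbool.h>
#include <assert.h>

/* Minimal model of a mapped, level-triggered interrupt such as the
 * arch timer: the line stays asserted until the guest handles it. */
struct model {
	bool line_asserted;	/* physical line level */
	bool phys_active;	/* active bit on the physical distributor */
	bool lr_hw;		/* LR programmed with the HW bit set */
};

/* Inject the mapped interrupt into the guest: program the LR and, per
 * the rule in the documentation, mark it active on the physical
 * distributor. */
static void inject_mapped(struct model *m)
{
	m->lr_hw = true;
	m->phys_active = true;	/* the crucial step */
}

/* Would the host take (another) physical interrupt right now?  A
 * level interrupt whose line is asserted fires unless it is marked
 * active. */
static bool host_takes_irq(const struct model *m)
{
	return m->line_asserted && !m->phys_active;
}

/* Guest deactivates: with the HW bit set, the deactivation propagates
 * to the physical distributor, which then re-samples the line. */
static void guest_deactivate(struct model *m)
{
	m->lr_hw = false;
	m->phys_active = false;
}
```

With the active bit set at injection time the guest can run and handle the interrupt; after deactivation the still-asserted line is sampled again and the host takes a fresh interrupt, exactly the behavior described above.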

* Re: [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation
  2015-09-04 15:57       ` Christoffer Dall
@ 2015-09-04 15:59         ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2015-09-04 15:59 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, linux-arm-kernel, kvm

On 04/09/15 16:57, Christoffer Dall wrote:
> On Thu, Sep 03, 2015 at 04:23:04PM +0100, Marc Zyngier wrote:
>> On 30/08/15 14:54, Christoffer Dall wrote:
>>> +
>>> +
>>> +Life Cycle for Forwarded Physical Interrupts
>>> +--------------------------------------------
>>> +
>>> +By forwarded physical interrupts we mean interrupts presented to a guest
>>> +representing a real HW event originally signaled to the host as a
>>
>> s/signaled/signalled/
>>
> Actually this was my first version as well, but aspell told me it was
> spelled "signaled".
> 
> Turns out it's mostly acceptable to use both spellings:
> 
> http://www.merriam-webster.com/dictionary/signaled

I stand corrected! :-)

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH 5/9] arm/arm64: KVM: Use appropriate define in VGIC reset code
  2015-08-30 13:54   ` Christoffer Dall
@ 2015-09-04 16:08     ` Eric Auger
  -1 siblings, 0 replies; 74+ messages in thread
From: Eric Auger @ 2015-09-04 16:08 UTC (permalink / raw)
  To: Christoffer Dall, kvmarm, linux-arm-kernel, kvm

On 08/30/2015 03:54 PM, Christoffer Dall wrote:
> We currently initialize the SGIs to be enabled in the VGIC code, but we
> use the VGIC_NR_PPIS define for this purpose, instead of the the more
s/the//

Eric
> natural VGIC_NR_SGIS.  Change this slightly confusing use of the
> defines.
> 
> Note: This should have no functional change, as both names are defined
> to the number 16.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 0ba92d3..8299c24 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2099,7 +2099,7 @@ int vgic_init(struct kvm *kvm)
>  		}
>  
>  		for (i = 0; i < dist->nr_irqs; i++) {
> -			if (i < VGIC_NR_PPIS)
> +			if (i < VGIC_NR_SGIS)
>  				vgic_bitmap_set_irq_val(&dist->irq_enabled,
>  							vcpu->vcpu_id, i, 1);
>  			if (i < VGIC_NR_PRIVATE_IRQS)
> 



end of thread, other threads:[~2015-09-04 16:09 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-30 13:54 [PATCH 0/9] Rework architected timer and fix UEFI reset Christoffer Dall
2015-08-30 13:54 ` Christoffer Dall
2015-08-30 13:54 ` [PATCH 1/9] KVM: Add kvm_arch_vcpu_{un}blocking callbacks Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 14:21   ` Marc Zyngier
2015-09-03 14:21     ` Marc Zyngier
2015-09-04 13:50   ` Eric Auger
2015-09-04 13:50     ` Eric Auger
2015-09-04 14:50     ` Christoffer Dall
2015-09-04 14:50       ` Christoffer Dall
2015-08-30 13:54 ` [PATCH 2/9] arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 14:43   ` Marc Zyngier
2015-09-03 14:43     ` Marc Zyngier
2015-09-03 14:58     ` Christoffer Dall
2015-09-03 14:58       ` Christoffer Dall
2015-09-03 15:53       ` Marc Zyngier
2015-09-03 15:53         ` Marc Zyngier
2015-09-03 16:09         ` Christoffer Dall
2015-09-03 16:09           ` Christoffer Dall
2015-08-30 13:54 ` [PATCH 3/9] arm/arm64: KVM: vgic: Factor out level irq processing on guest exit Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 15:01   ` Marc Zyngier
2015-09-03 15:01     ` Marc Zyngier
2015-08-30 13:54 ` [PATCH 4/9] arm/arm64: Implement GICD_ICFGR as RO for PPIs Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 15:03   ` Marc Zyngier
2015-09-03 15:03     ` Marc Zyngier
2015-08-30 13:54 ` [PATCH 5/9] arm/arm64: KVM: Use appropriate define in VGIC reset code Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 15:04   ` Marc Zyngier
2015-09-03 15:04     ` Marc Zyngier
2015-09-04 16:08   ` Eric Auger
2015-09-04 16:08     ` Eric Auger
2015-08-30 13:54 ` [PATCH 6/9] arm/arm64: KVM: Add mapped interrupts documentation Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 15:23   ` Marc Zyngier
2015-09-03 15:23     ` Marc Zyngier
2015-09-03 15:56     ` Eric Auger
2015-09-03 15:56       ` Eric Auger
2015-09-04 15:54       ` Christoffer Dall
2015-09-04 15:54         ` Christoffer Dall
2015-09-04 15:55     ` Christoffer Dall
2015-09-04 15:55       ` Christoffer Dall
2015-09-04 15:57     ` Christoffer Dall
2015-09-04 15:57       ` Christoffer Dall
2015-09-04 15:59       ` Marc Zyngier
2015-09-04 15:59         ` Marc Zyngier
2015-08-30 13:54 ` [PATCH 7/9] arm/arm64: KVM: vgic: Move active state handling to flush_hwstate Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 15:33   ` Marc Zyngier
2015-09-03 15:33     ` Marc Zyngier
2015-08-30 13:54 ` [PATCH 8/9] arm/arm64: KVM: Rework the arch timer to use level-triggered semantics Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-09-03 17:06   ` Marc Zyngier
2015-09-03 17:06     ` Marc Zyngier
2015-09-03 17:23     ` Christoffer Dall
2015-09-03 17:23       ` Christoffer Dall
2015-09-03 17:29       ` Marc Zyngier
2015-09-03 17:29         ` Marc Zyngier
2015-09-03 22:00         ` Christoffer Dall
2015-09-03 22:00           ` Christoffer Dall
2015-08-30 13:54 ` [PATCH 9/9] arm/arm64: KVM: arch timer: Reset CNTV_CTL to 0 Christoffer Dall
2015-08-30 13:54   ` Christoffer Dall
2015-08-31  8:46   ` Ard Biesheuvel
2015-08-31  8:46     ` Ard Biesheuvel
2015-08-31  8:57     ` Christoffer Dall
2015-08-31  8:57       ` Christoffer Dall
2015-08-31  9:02       ` Ard Biesheuvel
2015-08-31  9:02         ` Ard Biesheuvel
2015-09-03 17:07   ` Marc Zyngier
2015-09-03 17:07     ` Marc Zyngier
2015-09-03 17:10 ` [PATCH 0/9] Rework architected timer and fix UEFI reset Marc Zyngier
2015-09-03 17:10   ` Marc Zyngier
