xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT
@ 2021-04-20 14:07 Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 01/12] x86/rtc: drop code related to strict mode Roger Pau Monne
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu,
	Paul Durrant, Jun Nakajima, Kevin Tian

Hello,

The following series introduces EOI callbacks and switches a number of
subsystems to use them. The first one is vmsi and dpci, which are quite
straight forward to switch since they used to open code hooks in the EOI
handlers.

Finally HVM virtual timers are also switched to a different model where
EOI callbacks are used instead of checking at every return to guest
whether a timer interrupt is being injected. Note that such change also
fixes a bug in virtual periodic timers that prevented injecting to a vCPU
different than the one where the timer is assigned (and that would
currently match the i8259 target).

Those changes are paired together so that practical applications of
having EOI callbacks can be seen in action.

Note that further cleanup can be done, but I think the series is already
big enough and provides a clear benefit.

A couple of notes from the testing performed:
 - PVH dom0.
 - Windows guest, limited to 2% capacity to test delay for missed ticks
   mode - time is consistent in the guest.
 - Windows guest migration.
 - PCI passthrough to a guest.

Thanks, Roger.

Roger Pau Monne (12):
  x86/rtc: drop code related to strict mode
  x86/vlapic: introduce an EOI callback mechanism
  x86/vmsi: use the newly introduced EOI callbacks
  x86/vioapic: switch to use the EOI callback mechanism
  x86/hvm: allowing registering EOI callbacks for GSIs
  x86/dpci: move code
  x86/dpci: switch to use a GSI EOI callback
  x86/vpt: switch interrupt injection model
  x86/irq: remove unused parameter from hvm_isa_irq_assert
  x86/irq: drop return value from hvm_ioapic_assert
  x86/vpt: remove vPT timers per-vCPU lists
  x86/vpt: introduce a per-vPT lock

 xen/arch/x86/domain.c             |   4 +-
 xen/arch/x86/emul-i8254.c         |   1 +
 xen/arch/x86/hvm/dm.c             |   2 +-
 xen/arch/x86/hvm/hpet.c           |   8 +-
 xen/arch/x86/hvm/hvm.c            |  23 +-
 xen/arch/x86/hvm/irq.c            |  93 +++++-
 xen/arch/x86/hvm/pmtimer.c        |   2 +-
 xen/arch/x86/hvm/rtc.c            |  30 +-
 xen/arch/x86/hvm/svm/intr.c       |   3 -
 xen/arch/x86/hvm/vioapic.c        | 144 ++++++---
 xen/arch/x86/hvm/vlapic.c         |  72 ++++-
 xen/arch/x86/hvm/vmsi.c           |  35 +-
 xen/arch/x86/hvm/vmx/intr.c       |  59 ----
 xen/arch/x86/hvm/vpic.c           |   8 +-
 xen/arch/x86/hvm/vpt.c            | 514 +++++++++---------------------
 xen/drivers/passthrough/x86/hvm.c | 223 ++++++++-----
 xen/include/asm-x86/hvm/io.h      |   3 +-
 xen/include/asm-x86/hvm/irq.h     |  28 +-
 xen/include/asm-x86/hvm/vcpu.h    |   4 -
 xen/include/asm-x86/hvm/vlapic.h  |  19 +-
 xen/include/asm-x86/hvm/vpt.h     |  32 +-
 21 files changed, 630 insertions(+), 677 deletions(-)

-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-29 14:53   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Xen has been for a long time setting the WAET ACPI table "RTC good"
flag, which implies there's no need to perform a read of the RTC REG_C
register in order to get further interrupts after having received one.
This is hardcoded in the static ACPI tables, and in the RTC emulation
in Xen.

Drop the support for the alternative (strict) mode, it's been unused
for a long (since Xen 4.3) time without any complains.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Further changes in the series will require that no registering or
unregistering of callback is done inside of the handlers themselves,
like it was done in rtc_pf_callback when in strict_mode.
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/rtc.c | 27 +--------------------------
 xen/arch/x86/hvm/vpt.c |  4 +---
 2 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 3150f5f1479..9992595c45a 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -46,15 +46,6 @@
 #define epoch_year     1900
 #define get_year(x)    (x + epoch_year)
 
-enum rtc_mode {
-   rtc_mode_no_ack,
-   rtc_mode_strict
-};
-
-/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
-#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
-#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
-
 static void rtc_copy_date(RTCState *s);
 static void rtc_set_time(RTCState *s);
 static inline int from_bcd(RTCState *s, int a);
@@ -64,9 +55,6 @@ static void rtc_update_irq(RTCState *s)
 {
     ASSERT(spin_is_locked(&s->lock));
 
-    if ( rtc_mode_is(s, strict) && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF) )
-        return;
-
     /* IRQ is raised if any source is both raised & enabled */
     if ( !(s->hw.cmos_data[RTC_REG_B] &
            s->hw.cmos_data[RTC_REG_C] &
@@ -74,8 +62,7 @@ static void rtc_update_irq(RTCState *s)
         return;
 
     s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
-    if ( rtc_mode_is(s, no_ack) )
-        hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
+    hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
     hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL);
 }
 
@@ -86,19 +73,7 @@ static void rtc_pf_callback(struct vcpu *v, void *opaque)
     RTCState *s = opaque;
 
     spin_lock(&s->lock);
-
-    if ( !rtc_mode_is(s, no_ack)
-         && (s->hw.cmos_data[RTC_REG_C] & RTC_IRQF)
-         && ++(s->pt_dead_ticks) >= 10 )
-    {
-        /* VM is ignoring its RTC; no point in running the timer */
-        TRACE_0D(TRC_HVM_EMUL_RTC_STOP_TIMER);
-        destroy_periodic_time(&s->pt);
-        s->period = 0;
-    }
-
     s->hw.cmos_data[RTC_REG_C] |= RTC_PF|RTC_IRQF;
-
     spin_unlock(&s->lock);
 }
 
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4cc0a0848bd..24d90c0a186 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -21,7 +21,6 @@
 #include <asm/hvm/vpt.h>
 #include <asm/event.h>
 #include <asm/apic.h>
-#include <asm/mc146818rtc.h>
 #include <public/hvm/params.h>
 
 #define mode_is(d, name) \
@@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
     {
         if ( pt->pending_intr_nr )
         {
-            /* RTC code takes care of disabling the timer itself. */
-            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
+            if ( pt_irq_masked(pt) &&
                  /* Level interrupts should be asserted even if masked. */
                  !pt->level )
             {
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 01/12] x86/rtc: drop code related to strict mode Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-29 15:48   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Add a new vlapic_set_irq_callback helper in order to inject a vector
and set a callback to be executed when the guest performs the end of
interrupt acknowledgment.

Such functionality will be used to migrate the current ad hoc handling
done in vlapic_handle_EOI for the vectors that require some logic to
be executed when the end of interrupt is performed.

The setter of the callback will be in charge for setting the callback
again on guest restore, as callbacks are not saved as part of the
vlapic state. That is the reason why vlapic_set_callback is not a
static function.

No current users are migrated to use this new functionality yet, so no
functional change expected as a result.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Use xzalloc.
 - Drop printk on ENOMEM.
 - Add vcpu parameter to vlapic EOI callback.
 - Check that the vector is pending in ISR or IRR when printing a
   warning message because of an overriding callback.
 - Fix commit message regarding resume mention.

Changes since v2:
 - Fix commit message typo.
 - Expand commit message.
 - Also print a warning if the callback data is overridden.
 - Properly free memory in case of error in vlapic_init.

Changes since v1:
 - Make vlapic_set_irq an inline function on the header.
 - Clear the callback hook in vlapic_handle_EOI.
 - Introduce a helper to set the callback without injecting a vector.
 - Remove unneeded parentheses.
 - Reduce callback table by 16.
 - Use %pv to print domain/vcpu ID.
---
 xen/arch/x86/hvm/vlapic.c        | 62 ++++++++++++++++++++++++++++++--
 xen/include/asm-x86/hvm/vlapic.h | 19 +++++++++-
 2 files changed, 78 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5e21fb4937d..4465beaeec1 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -144,7 +144,37 @@ bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
     return vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]);
 }
 
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data)
+{
+    unsigned long flags;
+    unsigned int index = vec - 16;
+
+    if ( !callback || vec < 16 || vec >= X86_NR_VECTORS )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    if ( vlapic->callbacks[index].callback &&
+         (vlapic->callbacks[index].callback != callback ||
+          vlapic->callbacks[index].data != data) &&
+         (vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]) ||
+          vlapic_test_vector(vec, &vlapic->regs->data[APIC_ISR])) )
+        printk(XENLOG_G_WARNING
+               "%pv overriding vector %#x callback %ps (%p) data %p "
+               "with %ps (%p) data %p\n",
+               vlapic_vcpu(vlapic), vec, vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].callback, vlapic->callbacks[index].data,
+               callback, callback, data);
+    vlapic->callbacks[index].callback = callback;
+    vlapic->callbacks[index].data = data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+}
+
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data)
 {
     struct vcpu *target = vlapic_vcpu(vlapic);
 
@@ -159,8 +189,12 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
     else
         vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
 
+    if ( callback )
+        vlapic_set_callback(vlapic, vec, callback, data);
+
     if ( hvm_funcs.update_eoi_exit_bitmap )
-        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec, trig);
+        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
+                          trig || callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -461,11 +495,24 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct vcpu *v = vlapic_vcpu(vlapic);
     struct domain *d = v->domain;
+    vlapic_eoi_callback_t *callback;
+    void *data;
+    unsigned long flags;
+    unsigned int index = vector - 16;
 
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
     hvm_dpci_msi_eoi(d, vector);
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    callback = vlapic->callbacks[index].callback;
+    vlapic->callbacks[index].callback = NULL;
+    data = vlapic->callbacks[index].data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+
+    if ( callback )
+        callback(v, vector, data);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
@@ -1623,9 +1670,19 @@ int vlapic_init(struct vcpu *v)
 
     clear_page(vlapic->regs);
 
+    vlapic->callbacks = xzalloc_array(typeof(*vlapic->callbacks),
+                                      X86_NR_VECTORS - 16);
+    if ( !vlapic->callbacks )
+    {
+        unmap_domain_page_global(vlapic->regs);
+        free_domheap_page(vlapic->regs_page);
+        return -ENOMEM;
+    }
+
     vlapic_reset(vlapic);
 
     spin_lock_init(&vlapic->esr_lock);
+    spin_lock_init(&vlapic->callback_lock);
 
     tasklet_init(&vlapic->init_sipi.tasklet, vlapic_init_sipi_action, v);
 
@@ -1647,6 +1704,7 @@ void vlapic_destroy(struct vcpu *v)
     destroy_periodic_time(&vlapic->pt);
     unmap_domain_page_global(vlapic->regs);
     free_domheap_page(vlapic->regs_page);
+    XFREE(vlapic->callbacks);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/vlapic.h b/xen/include/asm-x86/hvm/vlapic.h
index 8f908928c35..db71fa38b0b 100644
--- a/xen/include/asm-x86/hvm/vlapic.h
+++ b/xen/include/asm-x86/hvm/vlapic.h
@@ -73,6 +73,9 @@
 #define vlapic_clear_vector(vec, bitmap)                                \
     clear_bit(VEC_POS(vec), (uint32_t *)((bitmap) + REG_POS(vec)))
 
+typedef void vlapic_eoi_callback_t(struct vcpu *v, unsigned int vector,
+                                   void *data);
+
 struct vlapic {
     struct hvm_hw_lapic      hw;
     struct hvm_hw_lapic_regs *regs;
@@ -89,6 +92,11 @@ struct vlapic {
         uint32_t             icr, dest;
         struct tasklet       tasklet;
     } init_sipi;
+    struct {
+        vlapic_eoi_callback_t *callback;
+        void                 *data;
+    } *callbacks;
+    spinlock_t               callback_lock;
 };
 
 /* vlapic's frequence is 100 MHz */
@@ -111,7 +119,16 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val);
 bool_t is_vlapic_lvtpc_enabled(struct vlapic *vlapic);
 
 bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec);
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig);
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data);
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data);
+
+static inline void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec,
+                                  uint8_t trig)
+{
+    vlapic_set_irq_callback(vlapic, vec, trig, NULL, NULL);
+}
 
 int vlapic_has_pending_irq(struct vcpu *v);
 int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 01/12] x86/rtc: drop code related to strict mode Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, Paul Durrant

Remove the unconditional call to hvm_dpci_msi_eoi in vlapic_handle_EOI
and instead use the newly introduced EOI callback mechanism in order
to register a callback for MSI vectors injected from passed through
devices.

This avoids having multiple callback functions open-coded in
vlapic_handle_EOI, as there is now a generic framework for registering
such callbacks. It also avoids doing an unconditional call to
hvm_dpci_msi_eoi for each EOI processed by the local APIC.

Note that now the callback is only registered (and thus executed) when
there's an MSI interrupt originating from a PCI passthrough device
being injected into the guest, so the check in hvm_dpci_msi_eoi can be
removed as it's already done by hvm_dirq_assist which is the only
caller of vmsi_deliver_pirq.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v3:
 - Fix the callback to take a vcpu parameter.

Changes since v2:
 - Expand commit message.
 - Pass the domain as the callback data.
 - Remove the check in hvm_dpci_msi_eoi
---
 xen/arch/x86/hvm/vlapic.c         |  2 --
 xen/arch/x86/hvm/vmsi.c           | 35 ++++++++++++++++++-------------
 xen/drivers/passthrough/x86/hvm.c |  6 ++----
 xen/include/asm-x86/hvm/io.h      |  2 +-
 4 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 4465beaeec1..cfcbd732b16 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -503,8 +503,6 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(d, vector);
 
-    hvm_dpci_msi_eoi(d, vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b43..03ae0dfb3c5 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -44,11 +44,9 @@
 #include <asm/event.h>
 #include <asm/io_apic.h>
 
-static void vmsi_inj_irq(
-    struct vlapic *target,
-    uint8_t vector,
-    uint8_t trig_mode,
-    uint8_t delivery_mode)
+static void vmsi_inj_irq(struct vlapic *target, uint8_t vector,
+                         uint8_t trig_mode, uint8_t delivery_mode,
+                         vlapic_eoi_callback_t *callback, void *data)
 {
     HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "vmsi_inj_irq: vec %02x trig %d dm %d\n",
                 vector, trig_mode, delivery_mode);
@@ -57,17 +55,17 @@ static void vmsi_inj_irq(
     {
     case dest_Fixed:
     case dest_LowestPrio:
-        vlapic_set_irq(target, vector, trig_mode);
+        vlapic_set_irq_callback(target, vector, trig_mode, callback, data);
         break;
     default:
         BUG();
     }
 }
 
-int vmsi_deliver(
-    struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
-    uint8_t delivery_mode, uint8_t trig_mode)
+static int vmsi_deliver_callback(struct domain *d, int vector, uint8_t dest,
+                                 uint8_t dest_mode, uint8_t delivery_mode,
+                                 uint8_t trig_mode,
+                                 vlapic_eoi_callback_t *callback, void *data)
 {
     struct vlapic *target;
     struct vcpu *v;
@@ -78,7 +76,8 @@ int vmsi_deliver(
         target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
+            vmsi_inj_irq(target, vector, trig_mode, delivery_mode, callback,
+                         data);
             break;
         }
         HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "null MSI round robin: vector=%02x\n",
@@ -89,8 +88,8 @@ int vmsi_deliver(
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL,
                                    0, dest, dest_mode) )
-                vmsi_inj_irq(vcpu_vlapic(v), vector,
-                             trig_mode, delivery_mode);
+                vmsi_inj_irq(vcpu_vlapic(v), vector, trig_mode, delivery_mode,
+                             callback, data);
         break;
 
     default:
@@ -103,6 +102,13 @@ int vmsi_deliver(
     return 0;
 }
 
+int vmsi_deliver(struct domain *d, int vector, uint8_t dest, uint8_t dest_mode,
+                 uint8_t delivery_mode, uint8_t trig_mode)
+{
+    return vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode,
+                                 trig_mode, NULL, NULL);
+}
+
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
     uint32_t flags = pirq_dpci->gmsi.gflags;
@@ -119,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 
     ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-    vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode, trig_mode,
+                          hvm_dpci_msi_eoi, NULL);
 }
 
 /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 351daafdc9b..8f78c0935b9 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -796,11 +796,9 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
     return 0;
 }
 
-void hvm_dpci_msi_eoi(struct domain *d, int vector)
+void hvm_dpci_msi_eoi(struct vcpu *v, unsigned int vector, void *unused)
 {
-    if ( !is_iommu_enabled(d) ||
-         (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
-       return;
+    struct domain *d = v->domain;
 
     spin_lock(&d->event_lock);
     pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 54e0161b492..7f30dfa7fea 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,7 +142,7 @@ struct hvm_hw_stdvga {
 void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
-extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+void hvm_dpci_msi_eoi(struct vcpu *v, unsigned int vector, void *unused);
 
 /* Decode a PCI port IO access into a bus/slot/func/reg. */
 unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (2 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-29 15:51   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Switch the emulated IO-APIC code to use the local APIC EOI callback
mechanism. This allows to remove the last hardcoded callback from
vlapic_handle_EOI. Removing the hardcoded vIO-APIC callback also
allows to getting rid of setting the EOI exit bitmap based on the
triggering mode, as now all users that require an EOI action use the
newly introduced callback mechanism.

Move and rename the vioapic_update_EOI now that it can be made static.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Remove assert in eoi_callback.
 - Cast callback to bool.
 - Simplify check in ioapic_load: GSIs < 16 and edge interrupts can
   also have callbacks.
 - Reword comment about casting to boolean.

Changes since v2:
 - Explicitly convert the last alternative_vcall parameter to a
   boolean in vlapic_set_irq_callback.

Changes since v1:
 - Remove the triggering check in the update_eoi_exit_bitmap call.
 - Register the vlapic callbacks when loading the vIO-APIC state.
 - Reduce scope of ent.
---
 xen/arch/x86/hvm/vioapic.c | 127 ++++++++++++++++++++++++-------------
 xen/arch/x86/hvm/vlapic.c  |  12 ++--
 2 files changed, 89 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 87370dd4172..1c2763fc2bf 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -394,6 +394,48 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = {
     .write = vioapic_write
 };
 
+static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
+{
+    struct domain *d = v->domain;
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    unsigned int i;
+
+    spin_lock(&d->arch.hvm.irq_lock);
+
+    for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
+    {
+        struct hvm_vioapic *vioapic = domain_vioapic(d, i);
+        unsigned int pin;
+
+        for ( pin = 0; pin < vioapic->nr_pins; pin++ )
+        {
+            union vioapic_redir_entry *ent = &vioapic->redirtbl[pin];
+
+            if ( ent->fields.vector != vector )
+                continue;
+
+            ent->fields.remote_irr = 0;
+
+            if ( is_iommu_enabled(d) )
+            {
+                spin_unlock(&d->arch.hvm.irq_lock);
+                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
+                spin_lock(&d->arch.hvm.irq_lock);
+            }
+
+            if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
+                 !ent->fields.mask && !ent->fields.remote_irr &&
+                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
+            {
+                ent->fields.remote_irr = 1;
+                vioapic_deliver(vioapic, pin);
+            }
+        }
+    }
+
+    spin_unlock(&d->arch.hvm.irq_lock);
+}
+
 static void ioapic_inj_irq(
     struct hvm_vioapic *vioapic,
     struct vlapic *target,
@@ -407,7 +449,8 @@ static void ioapic_inj_irq(
     ASSERT((delivery_mode == dest_Fixed) ||
            (delivery_mode == dest_LowestPrio));
 
-    vlapic_set_irq(target, vector, trig_mode);
+    vlapic_set_irq_callback(target, vector, trig_mode,
+                            trig_mode ? eoi_callback : NULL, NULL);
 }
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
@@ -514,49 +557,6 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
     }
 }
 
-void vioapic_update_EOI(struct domain *d, u8 vector)
-{
-    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
-    union vioapic_redir_entry *ent;
-    unsigned int i;
-
-    ASSERT(has_vioapic(d));
-
-    spin_lock(&d->arch.hvm.irq_lock);
-
-    for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
-    {
-        struct hvm_vioapic *vioapic = domain_vioapic(d, i);
-        unsigned int pin;
-
-        for ( pin = 0; pin < vioapic->nr_pins; pin++ )
-        {
-            ent = &vioapic->redirtbl[pin];
-            if ( ent->fields.vector != vector )
-                continue;
-
-            ent->fields.remote_irr = 0;
-
-            if ( is_iommu_enabled(d) )
-            {
-                spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
-                spin_lock(&d->arch.hvm.irq_lock);
-            }
-
-            if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
-                 !ent->fields.mask && !ent->fields.remote_irr &&
-                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
-            {
-                ent->fields.remote_irr = 1;
-                vioapic_deliver(vioapic, pin);
-            }
-        }
-    }
-
-    spin_unlock(&d->arch.hvm.irq_lock);
-}
-
 int vioapic_get_mask(const struct domain *d, unsigned int gsi)
 {
     unsigned int pin = 0; /* See gsi_vioapic */
@@ -610,6 +610,8 @@ static int ioapic_save(struct vcpu *v, hvm_domain_context_t *h)
 static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_vioapic *s;
+    unsigned int i;
+    int rc;
 
     if ( !has_vioapic(d) )
         return -ENODEV;
@@ -620,7 +622,42 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
          d->arch.hvm.nr_vioapics != 1 )
         return -EOPNOTSUPP;
 
-    return hvm_load_entry(IOAPIC, h, &s->domU);
+    rc = hvm_load_entry(IOAPIC, h, &s->domU);
+    if ( rc )
+        return rc;
+
+    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
+    {
+        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
+        unsigned int vector = ent->fields.vector;
+        unsigned int delivery_mode = ent->fields.delivery_mode;
+        struct vcpu *v;
+
+        /*
+         * Add a callback for each possible vector injected by a redirection
+         * entry.
+         */
+        if ( delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed )
+            continue;
+
+        for_each_vcpu ( d, v )
+        {
+            struct vlapic *vlapic = vcpu_vlapic(v);
+
+            /*
+             * NB: if the vlapic registers were restored before the vio-apic
+             * ones we could test whether the vector is set in the vlapic IRR
+             * or ISR registers before unconditionally setting the callback.
+             * This is harmless as eoi_callback is capable of dealing with
+             * spurious callbacks.
+             */
+            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
+                                   ent->fields.dest_mode) )
+                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
+        }
+    }
+
+    return 0;
 }
 
 HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, ioapic_load, 1, HVMSR_PER_DOM);
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index cfcbd732b16..8f3a0a2e8f7 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -194,7 +194,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
 
     if ( hvm_funcs.update_eoi_exit_bitmap )
         alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
-                          trig || callback);
+                          /*
+                           * NB: need to explicitly convert to boolean to avoid
+                           * truncation wrongly resulting in false getting
+                           * passed, for example when the pointer sits on a
+                           * page boundary.
+                           */
+                          (bool)callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -494,15 +500,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
     struct vcpu *v = vlapic_vcpu(vlapic);
-    struct domain *d = v->domain;
     vlapic_eoi_callback_t *callback;
     void *data;
     unsigned long flags;
     unsigned int index = vector - 16;
 
-    if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
-        vioapic_update_EOI(d, vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (3 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-05-03 15:50   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 06/12] x86/dpci: move code Roger Pau Monne
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Such callbacks will be executed once a EOI is performed by the guest,
regardless of whether the interrupts are injected from the vIO-APIC or
the vPIC, as ISA IRQs are translated to GSIs and then the
corresponding callback is executed at EOI.

The vIO-APIC infrastructure for handling EOIs is build on top of the
existing vlapic EOI callback functionality, while the vPIC one is
handled when writing to the vPIC EOI register.

Note that such callbacks need to be registered and de-registered, and
that a single GSI can have multiple callbacks associated. That's
because GSIs can be level triggered and shared, as that's the case
with legacy PCI interrupts shared between several devices.

Strictly speaking this is a non-functional change, since there are no
users of this new interface introduced by this change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Make callback take a domain parameter.
 - Return whether the unregistered callback was found.
 - Add a comment regarding the result of hvm_gsi_has_callbacks being
   stable.

Changes since v2:
 - Latch hvm_domain_irq in some functions.
 - Make domain parameter of hvm_gsi_has_callbacks const.
 - Add comment about dropping the lock around the
   hvm_gsi_execute_callbacks call.
 - Drop change to ioapic_load.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/hvm.c        | 15 ++++++-
 xen/arch/x86/hvm/irq.c        | 75 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vioapic.c    | 29 +++++++++++---
 xen/arch/x86/hvm/vpic.c       |  4 ++
 xen/include/asm-x86/hvm/irq.h | 21 ++++++++++
 5 files changed, 137 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ae37bc434ae..2c4dd1b86f2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -608,7 +608,7 @@ static int hvm_print_line(
 
 int hvm_domain_initialise(struct domain *d)
 {
-    unsigned int nr_gsis;
+    unsigned int nr_gsis, i;
     int rc;
 
     if ( !hvm_enabled )
@@ -656,6 +656,14 @@ int hvm_domain_initialise(struct domain *d)
     BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS);
     ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS);
 
+    /* Initialize the EOI callback list. */
+    hvm_domain_irq(d)->gsi_callbacks = xmalloc_array(struct list_head, nr_gsis);
+    if ( !hvm_domain_irq(d)->gsi_callbacks )
+        goto fail1;
+    rwlock_init(&hvm_domain_irq(d)->gsi_callbacks_lock);
+    for ( i = 0; i < nr_gsis; i++ )
+        INIT_LIST_HEAD(&hvm_domain_irq(d)->gsi_callbacks[i]);
+
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
@@ -715,6 +723,8 @@ int hvm_domain_initialise(struct domain *d)
  fail1:
     if ( is_hardware_domain(d) )
         xfree(d->arch.hvm.io_bitmap);
+    if ( hvm_domain_irq(d) )
+        XFREE(hvm_domain_irq(d)->gsi_callbacks);
     XFREE(d->arch.hvm.params);
     XFREE(d->arch.hvm.irq);
  fail0:
@@ -777,6 +787,9 @@ void hvm_domain_destroy(struct domain *d)
     vioapic_deinit(d);
 
     XFREE(d->arch.hvm.pl_time);
+
+    if ( hvm_domain_irq(d) )
+        XFREE(hvm_domain_irq(d)->gsi_callbacks);
     XFREE(d->arch.hvm.irq);
 
     list_for_each_safe ( ioport_list, tmp, &d->arch.hvm.g2m_ioport_list )
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 38ac5fb6c7c..4825a387bdc 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -595,6 +595,81 @@ int hvm_local_events_need_delivery(struct vcpu *v)
     return !hvm_interrupt_blocked(v, intack);
 }
 
+int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
+                              struct hvm_gsi_eoi_callback *cb)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+
+    if ( gsi >= hvm_irq->nr_gsis )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+
+    write_lock(&hvm_irq->gsi_callbacks_lock);
+    list_add(&cb->list, &hvm_irq->gsi_callbacks[gsi]);
+    write_unlock(&hvm_irq->gsi_callbacks_lock);
+
+    return 0;
+}
+
+int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
+                                struct hvm_gsi_eoi_callback *cb)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    const struct list_head *tmp;
+    bool found = false;
+
+    if ( gsi >= hvm_irq->nr_gsis )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+
+    write_lock(&hvm_irq->gsi_callbacks_lock);
+    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
+        if ( tmp == &cb->list )
+        {
+            list_del(&cb->list);
+            found = true;
+            break;
+        }
+    write_unlock(&hvm_irq->gsi_callbacks_lock);
+
+    return found ? 0 : -ENOENT;
+}
+
+void hvm_gsi_execute_callbacks(struct domain *d, unsigned int gsi)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    struct hvm_gsi_eoi_callback *cb;
+
+    read_lock(&hvm_irq->gsi_callbacks_lock);
+    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
+        cb->callback(d, gsi, cb->data);
+    read_unlock(&hvm_irq->gsi_callbacks_lock);
+}
+
+bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    bool has_callbacks;
+
+    /*
+     * Note that by the point the function returns the result could be stale,
+     * however the result is only used to decide whether a callback should be
+     * added when injecting a vio-apic vector to the vlapic and all users of
+     * the callbacks interface should always register the callback before
+     * triggering an interrupt.
+     */
+
+    read_lock(&hvm_irq->gsi_callbacks_lock);
+    has_callbacks = !list_empty(&hvm_irq->gsi_callbacks[gsi]);
+    read_unlock(&hvm_irq->gsi_callbacks_lock);
+
+    return has_callbacks;
+}
+
 static void irq_dump(struct domain *d)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 1c2763fc2bf..0824ede91ab 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -285,6 +285,7 @@ static void vioapic_write_redirent(
             ASSERT(prev_level);
             ASSERT(!top_word);
             hvm_dpci_eoi(d, gsi);
+            hvm_gsi_execute_callbacks(d, gsi);
     }
 
     if ( is_hardware_domain(d) && unmasked )
@@ -410,6 +411,7 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
         for ( pin = 0; pin < vioapic->nr_pins; pin++ )
         {
             union vioapic_redir_entry *ent = &vioapic->redirtbl[pin];
+            unsigned int gsi = vioapic->base_gsi + pin;
 
             if ( ent->fields.vector != vector )
                 continue;
@@ -419,13 +421,25 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
             if ( is_iommu_enabled(d) )
             {
                 spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
+                hvm_dpci_eoi(d, gsi);
                 spin_lock(&d->arch.hvm.irq_lock);
             }
 
+            /*
+             * Callbacks don't expect to be executed with any lock held, so
+             * drop the lock that protects the vIO-APIC fields from changing.
+             *
+             * Note that the redirection entry itself cannot go away, so upon
+             * retaking the lock we only need to avoid making assumptions on
+             * redirection entry field values (ie: recheck the IRR field).
+             */
+            spin_unlock(&d->arch.hvm.irq_lock);
+            hvm_gsi_execute_callbacks(d, gsi);
+            spin_lock(&d->arch.hvm.irq_lock);
+
             if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
                  !ent->fields.mask && !ent->fields.remote_irr &&
-                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
+                 hvm_irq->gsi_assert_count[gsi] )
             {
                 ent->fields.remote_irr = 1;
                 vioapic_deliver(vioapic, pin);
@@ -441,7 +455,8 @@ static void ioapic_inj_irq(
     struct vlapic *target,
     uint8_t vector,
     uint8_t trig_mode,
-    uint8_t delivery_mode)
+    uint8_t delivery_mode,
+    bool callback)
 {
     HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %d trig %d deliv %d",
                 vector, trig_mode, delivery_mode);
@@ -450,7 +465,7 @@ static void ioapic_inj_irq(
            (delivery_mode == dest_LowestPrio));
 
     vlapic_set_irq_callback(target, vector, trig_mode,
-                            trig_mode ? eoi_callback : NULL, NULL);
+                            callback ? eoi_callback : NULL, NULL);
 }
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
@@ -464,6 +479,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
     struct vlapic *target;
     struct vcpu *v;
     unsigned int irq = vioapic->base_gsi + pin;
+    bool callback = trig_mode || hvm_gsi_has_callbacks(d, irq);
 
     ASSERT(spin_is_locked(&d->arch.hvm.irq_lock));
 
@@ -490,7 +506,8 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
             target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode);
+            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode,
+                           callback);
         }
         else
         {
@@ -505,7 +522,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
                 ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
-                               delivery_mode);
+                               delivery_mode, callback);
         break;
 
     case dest_NMI:
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index f465b7f9979..ef668f3668a 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -235,6 +235,8 @@ static void vpic_ioport_write(
                 unsigned int pin = __scanbit(pending, 8);
 
                 ASSERT(pin < 8);
+                hvm_gsi_execute_callbacks(current->domain,
+                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 hvm_dpci_eoi(current->domain,
                              hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 __clear_bit(pin, &pending);
@@ -285,6 +287,8 @@ static void vpic_ioport_write(
                 /* Release lock and EOI the physical interrupt (if any). */
                 vpic_update_int_output(vpic);
                 vpic_unlock(vpic);
+                hvm_gsi_execute_callbacks(current->domain,
+                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 hvm_dpci_eoi(current->domain,
                              hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 return; /* bail immediately */
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index c4369ceb7aa..37b8d2ba8fb 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -21,6 +21,7 @@
 #ifndef __ASM_X86_HVM_IRQ_H__
 #define __ASM_X86_HVM_IRQ_H__
 
+#include <xen/rwlock.h>
 #include <xen/timer.h>
 
 #include <asm/hvm/hvm.h>
@@ -79,6 +80,10 @@ struct hvm_irq {
 
     struct hvm_irq_dpci *dpci;
 
+    /* List of callbacks for GSI EOI events. Protected by irq_lock. */
+    struct list_head  *gsi_callbacks;
+    rwlock_t gsi_callbacks_lock;
+
     /*
      * Number of wires asserting each GSI.
      *
@@ -137,6 +142,14 @@ struct hvm_gmsi_info {
     bool posted; /* directly deliver to guest via VT-d PI? */
 };
 
+typedef void hvm_gsi_eoi_callback_t(struct domain *d, unsigned int gsi,
+                                    void *data);
+struct hvm_gsi_eoi_callback {
+    hvm_gsi_eoi_callback_t *callback;
+    void *data;
+    struct list_head list;
+};
+
 struct hvm_girq_dpci_mapping {
     struct list_head list;
     uint8_t bus;
@@ -224,4 +237,12 @@ void hvm_set_callback_via(struct domain *d, uint64_t via);
 struct pirq;
 bool hvm_domain_use_pirq(const struct domain *, const struct pirq *);
 
+int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
+                              struct hvm_gsi_eoi_callback *cb);
+int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
+                                 struct hvm_gsi_eoi_callback *cb);
+/* NB: Callbacks are not allowed to modify the registered callback list. */
+void hvm_gsi_execute_callbacks(struct domain *d, unsigned int gsi);
+bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi);
+
 #endif /* __ASM_X86_HVM_IRQ_H__ */
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 06/12] x86/dpci: move code
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (4 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Paul Durrant

This is code movement in order to simply further changes.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v2:
 - Drop one of the leading underscores from __hvm_dpci_eoi.

Changes since v1:
 - New in this version.
---
 xen/drivers/passthrough/x86/hvm.c | 162 +++++++++++++++---------------
 1 file changed, 81 insertions(+), 81 deletions(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 8f78c0935b9..0db26e5dbb2 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -205,6 +205,87 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
     return dest;
 }
 
+static void hvm_pirq_eoi(struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci;
+
+    if ( !pirq )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    pirq_dpci = pirq_dpci(pirq);
+
+    /*
+     * No need to get vector lock for timer
+     * since interrupt is still not EOIed
+     */
+    if ( --pirq_dpci->pending ||
+         /* When the interrupt source is MSI no Ack should be performed. */
+         (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) )
+        return;
+
+    pirq_guest_eoi(pirq);
+}
+
+static void __hvm_dpci_eoi(struct domain *d,
+                           const struct hvm_girq_dpci_mapping *girq)
+{
+    struct pirq *pirq = pirq_info(d, girq->machine_gsi);
+
+    if ( !hvm_domain_use_pirq(d, pirq) )
+        hvm_pci_intx_deassert(d, girq->device, girq->intx);
+
+    hvm_pirq_eoi(pirq);
+}
+
+static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
+{
+    struct pirq *pirq = pirq_info(d, gsi);
+
+    /* Check if GSI is actually mapped. */
+    if ( !pirq_dpci(pirq) )
+        return;
+
+    hvm_gsi_deassert(d, gsi);
+    hvm_pirq_eoi(pirq);
+}
+
+void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi)
+{
+    const struct hvm_irq_dpci *hvm_irq_dpci;
+    const struct hvm_girq_dpci_mapping *girq;
+
+    if ( !is_iommu_enabled(d) )
+        return;
+
+    if ( is_hardware_domain(d) )
+    {
+        spin_lock(&d->event_lock);
+        hvm_gsi_eoi(d, guest_gsi);
+        goto unlock;
+    }
+
+    if ( guest_gsi < NR_ISAIRQS )
+    {
+        hvm_dpci_isairq_eoi(d, guest_gsi);
+        return;
+    }
+
+    spin_lock(&d->event_lock);
+    hvm_irq_dpci = domain_get_irq_dpci(d);
+
+    if ( !hvm_irq_dpci )
+        goto unlock;
+
+    list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
+        __hvm_dpci_eoi(d, girq);
+
+unlock:
+    spin_unlock(&d->event_lock);
+}
+
 int pt_irq_create_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -860,87 +941,6 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     spin_unlock(&d->event_lock);
 }
 
-static void hvm_pirq_eoi(struct pirq *pirq)
-{
-    struct hvm_pirq_dpci *pirq_dpci;
-
-    if ( !pirq )
-    {
-        ASSERT_UNREACHABLE();
-        return;
-    }
-
-    pirq_dpci = pirq_dpci(pirq);
-
-    /*
-     * No need to get vector lock for timer
-     * since interrupt is still not EOIed
-     */
-    if ( --pirq_dpci->pending ||
-         /* When the interrupt source is MSI no Ack should be performed. */
-         (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) )
-        return;
-
-    pirq_guest_eoi(pirq);
-}
-
-static void __hvm_dpci_eoi(struct domain *d,
-                           const struct hvm_girq_dpci_mapping *girq)
-{
-    struct pirq *pirq = pirq_info(d, girq->machine_gsi);
-
-    if ( !hvm_domain_use_pirq(d, pirq) )
-        hvm_pci_intx_deassert(d, girq->device, girq->intx);
-
-    hvm_pirq_eoi(pirq);
-}
-
-static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
-{
-    struct pirq *pirq = pirq_info(d, gsi);
-
-    /* Check if GSI is actually mapped. */
-    if ( !pirq_dpci(pirq) )
-        return;
-
-    hvm_gsi_deassert(d, gsi);
-    hvm_pirq_eoi(pirq);
-}
-
-void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi)
-{
-    const struct hvm_irq_dpci *hvm_irq_dpci;
-    const struct hvm_girq_dpci_mapping *girq;
-
-    if ( !is_iommu_enabled(d) )
-        return;
-
-    if ( is_hardware_domain(d) )
-    {
-        spin_lock(&d->event_lock);
-        hvm_gsi_eoi(d, guest_gsi);
-        goto unlock;
-    }
-
-    if ( guest_gsi < NR_ISAIRQS )
-    {
-        hvm_dpci_isairq_eoi(d, guest_gsi);
-        return;
-    }
-
-    spin_lock(&d->event_lock);
-    hvm_irq_dpci = domain_get_irq_dpci(d);
-
-    if ( !hvm_irq_dpci )
-        goto unlock;
-
-    list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
-        __hvm_dpci_eoi(d, girq);
-
-unlock:
-    spin_unlock(&d->event_lock);
-}
-
 static int pci_clean_dpci_irq(struct domain *d,
                               struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (5 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 06/12] x86/dpci: move code Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-05-04  9:28   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 08/12] x86/vpt: switch interrupt injection model Roger Pau Monne
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, Paul Durrant

Switch the dpci GSI EOI callback hooks to use the newly introduced
generic callback functionality, and remove the custom dpci calls found
on the vPIC and vIO-APIC implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Print a warning message if the EOI callback cannot be unregistered.

Changes since v2:
 - Avoid leaking the allocated callback on error paths of
   pt_irq_create_bind.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c        |  8 -----
 xen/arch/x86/hvm/vpic.c           |  4 ---
 xen/drivers/passthrough/x86/hvm.c | 57 ++++++++++++++++++++++++++++---
 xen/include/asm-x86/hvm/io.h      |  1 -
 xen/include/asm-x86/hvm/irq.h     |  1 +
 5 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 0824ede91ab..4f844965423 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -284,7 +284,6 @@ static void vioapic_write_redirent(
              */
             ASSERT(prev_level);
             ASSERT(!top_word);
-            hvm_dpci_eoi(d, gsi);
             hvm_gsi_execute_callbacks(d, gsi);
     }
 
@@ -418,13 +417,6 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
 
             ent->fields.remote_irr = 0;
 
-            if ( is_iommu_enabled(d) )
-            {
-                spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(d, gsi);
-                spin_lock(&d->arch.hvm.irq_lock);
-            }
-
             /*
              * Callbacks don't expect to be executed with any lock held, so
              * drop the lock that protects the vIO-APIC fields from changing.
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index ef668f3668a..60d6740f69a 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -237,8 +237,6 @@ static void vpic_ioport_write(
                 ASSERT(pin < 8);
                 hvm_gsi_execute_callbacks(current->domain,
                         hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
-                hvm_dpci_eoi(current->domain,
-                             hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 __clear_bit(pin, &pending);
             }
             return;
@@ -289,8 +287,6 @@ static void vpic_ioport_write(
                 vpic_unlock(vpic);
                 hvm_gsi_execute_callbacks(current->domain,
                         hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
-                hvm_dpci_eoi(current->domain,
-                             hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 return; /* bail immediately */
             case 6: /* Set Priority                */
                 vpic->priority_add = (val + 1) & 7;
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 0db26e5dbb2..02e027cff8c 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -252,7 +252,7 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
     hvm_pirq_eoi(pirq);
 }
 
-void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi)
+static void dpci_eoi(struct domain *d, unsigned int guest_gsi, void *data)
 {
     const struct hvm_irq_dpci *hvm_irq_dpci;
     const struct hvm_girq_dpci_mapping *girq;
@@ -476,6 +476,7 @@ int pt_irq_create_bind(
     {
         struct dev_intx_gsi_link *digl = NULL;
         struct hvm_girq_dpci_mapping *girq = NULL;
+        struct hvm_gsi_eoi_callback *cb = NULL;
         unsigned int guest_gsi;
 
         /*
@@ -489,7 +490,7 @@ int pt_irq_create_bind(
             unsigned int link;
 
             digl = xmalloc(struct dev_intx_gsi_link);
-            girq = xmalloc(struct hvm_girq_dpci_mapping);
+            girq = xzalloc(struct hvm_girq_dpci_mapping);
 
             if ( !digl || !girq )
             {
@@ -502,11 +503,22 @@ int pt_irq_create_bind(
             girq->bus = digl->bus = pt_irq_bind->u.pci.bus;
             girq->device = digl->device = pt_irq_bind->u.pci.device;
             girq->intx = digl->intx = pt_irq_bind->u.pci.intx;
-            list_add_tail(&digl->list, &pirq_dpci->digl_list);
+            girq->cb.callback = dpci_eoi;
 
             guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
             link = hvm_pci_intx_link(digl->device, digl->intx);
 
+            rc = hvm_gsi_register_callback(d, guest_gsi, &girq->cb);
+            if ( rc )
+            {
+                spin_unlock(&d->event_lock);
+                xfree(girq);
+                xfree(digl);
+                return rc;
+            }
+
+            list_add_tail(&digl->list, &pirq_dpci->digl_list);
+
             hvm_irq_dpci->link_cnt[link]++;
 
             girq->machine_gsi = pirq;
@@ -514,17 +526,43 @@ int pt_irq_create_bind(
         }
         else
         {
+            /*
+             * NB: the callback structure allocated below will never be freed
+             * once setup because it's used by the hardware domain and will
+             * never be unregistered.
+             */
+            cb = xzalloc(struct hvm_gsi_eoi_callback);
+
             ASSERT(is_hardware_domain(d));
 
+            if ( !cb )
+            {
+                spin_unlock(&d->event_lock);
+                return -ENOMEM;
+            }
+
             /* MSI_TRANSLATE is not supported for the hardware domain. */
             if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
                  pirq >= hvm_domain_irq(d)->nr_gsis )
             {
                 spin_unlock(&d->event_lock);
-
+                xfree(cb);
                 return -EINVAL;
             }
             guest_gsi = pirq;
+
+            cb->callback = dpci_eoi;
+            /*
+             * IRQ binds created for the hardware domain are never destroyed,
+             * so it's fine to not keep a reference to cb here.
+             */
+            rc = hvm_gsi_register_callback(d, guest_gsi, cb);
+            if ( rc )
+            {
+                spin_unlock(&d->event_lock);
+                xfree(cb);
+                return rc;
+            }
         }
 
         /* Bind the same mirq once in the same domain */
@@ -596,12 +634,17 @@ int pt_irq_create_bind(
                     list_del(&digl->list);
                     link = hvm_pci_intx_link(digl->device, digl->intx);
                     hvm_irq_dpci->link_cnt[link]--;
+                    hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
                 }
+                else
+                    hvm_gsi_unregister_callback(d, guest_gsi, cb);
+
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 xfree(girq);
                 xfree(digl);
+                xfree(cb);
                 return rc;
             }
         }
@@ -699,6 +742,7 @@ int pt_irq_destroy_bind(
         unsigned int link = hvm_pci_intx_link(device, intx);
         struct hvm_girq_dpci_mapping *girq;
         struct dev_intx_gsi_link *digl, *tmp;
+        int rc;
 
         list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
         {
@@ -708,6 +752,11 @@ int pt_irq_destroy_bind(
                  girq->machine_gsi == machine_gsi )
             {
                 list_del(&girq->list);
+                rc = hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
+                if ( rc )
+                    printk(XENLOG_G_WARNING
+                           "%pd: unable to remove callback for GSI %u: %d\n",
+                           d, guest_gsi, rc);
                 xfree(girq);
                 girq = NULL;
                 break;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 7f30dfa7fea..977a857f729 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -101,7 +101,6 @@ bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq);
 void msix_write_completion(struct vcpu *);
 
 #ifdef CONFIG_HVM
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 37b8d2ba8fb..57d51c15863 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -156,6 +156,7 @@ struct hvm_girq_dpci_mapping {
     uint8_t device;
     uint8_t intx;
     uint8_t machine_gsi;
+    struct hvm_gsi_eoi_callback cb;
 };
 
 #define NR_ISAIRQS  16
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 08/12] x86/vpt: switch interrupt injection model
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (6 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-05-04 11:00   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert Roger Pau Monne
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu,
	Jun Nakajima, Kevin Tian

Currently vPT relies on timers being assigned to a vCPU and performing
checks on every return to HVM guest in order to check if an interrupt
from a vPT timer assigned to the vCPU is currently being injected.

This model doesn't work properly since the interrupt destination vCPU
of a vPT timer can be different from the vCPU where the timer is
currently assigned, in which case the timer would get stuck because it
never sees the interrupt as being injected.

Knowing when a vPT interrupt is injected is relevant for the guest
timer modes where missed vPT interrupts are not discarded and instead
are accumulated and injected when possible.

This change aims to modify the logic described above, so that vPT
doesn't need to check on every return to HVM guest if a vPT interrupt
is being injected. In order to achieve this the vPT code is modified
to make use of the new EOI callbacks, so that virtual timers can
detect when a interrupt has been serviced by the guest by waiting for
the EOI callback to execute.

This model also simplifies some of the logic, as when executing the
timer EOI callback Xen can try to inject another interrupt if the
timer has interrupts pending for delivery.

Note that timers are still bound to a vCPU for the time being, this
relation however doesn't limit the interrupt destination anymore, and
will be removed by further patches.

This model has been tested with Windows 7 guests without showing any
timer delay, even when the guest was limited to have very little CPU
capacity and pending virtual timer interrupts accumulate.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Rename pt_irq_fired to irq_eoi and adjust the logic.
 - Initialize v and cb_priv in eoi_callback.

Changes since v2:
 - Avoid and explicit != NULL check.
 - Use a switch in inject_interrupt to evaluate the timer mode.
 - Print the pt->source field on error in create_periodic_time.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/svm/intr.c   |   3 -
 xen/arch/x86/hvm/vmx/intr.c   |  59 ------
 xen/arch/x86/hvm/vpt.c        | 352 +++++++++++++++-------------------
 xen/include/asm-x86/hvm/vpt.h |   5 +-
 4 files changed, 157 insertions(+), 262 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c
index 7f815d23078..2ee2332253b 100644
--- a/xen/arch/x86/hvm/svm/intr.c
+++ b/xen/arch/x86/hvm/svm/intr.c
@@ -146,8 +146,6 @@ void svm_intr_assist(void)
         return;
 
     /* Crank the handle on interrupt state. */
-    pt_update_irq(v);
-
     do {
         intack = hvm_vcpu_has_pending_irq(v);
         if ( likely(intack.source == hvm_intsrc_none) )
@@ -219,7 +217,6 @@ void svm_intr_assist(void)
     {
         HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0);
         svm_inject_extint(v, intack.vector);
-        pt_intr_post(v, intack);
     }
 
     /* Is there another IRQ to queue up behind this one? */
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 80bfbb47878..3fcc7073db2 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -203,7 +203,6 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
             {
                 /* for now, duplicate the ack path in vmx_intr_assist */
                 hvm_vcpu_ack_pending_irq(v, intack);
-                pt_intr_post(v, intack);
 
                 intack = hvm_vcpu_has_pending_irq(v);
                 if ( unlikely(intack.source != hvm_intsrc_none) )
@@ -242,7 +241,6 @@ void vmx_intr_assist(void)
     struct vcpu *v = current;
     unsigned int tpr_threshold = 0;
     enum hvm_intblk intblk;
-    int pt_vector;
 
     /* Block event injection when single step with MTF. */
     if ( unlikely(v->arch.hvm.single_step) )
@@ -263,8 +261,6 @@ void vmx_intr_assist(void)
 #endif
 
     /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
-
     do {
         unsigned long intr_info;
 
@@ -337,58 +333,6 @@ void vmx_intr_assist(void)
     {
         unsigned long status;
 
-       /*
-        * intack.vector is the highest priority vector. So we set eoi_exit_bitmap
-        * for intack.vector - give a chance to post periodic time interrupts when
-        * periodic time interrupts become the highest one
-        */
-        if ( pt_vector != -1 )
-        {
-#ifndef NDEBUG
-            /*
-             * We assert that intack.vector is the highest priority vector for
-             * only an interrupt from vlapic can reach this point and the
-             * highest vector is chosen in hvm_vcpu_has_pending_irq().
-             * But, in fact, the assertion failed sometimes. It is suspected
-             * that PIR is not synced to vIRR which makes pt_vector is left in
-             * PIR. In order to verify this suspicion, dump some information
-             * when the assertion fails.
-             */
-            if ( unlikely(intack.vector < pt_vector) )
-            {
-                const struct vlapic *vlapic;
-                const struct pi_desc *pi_desc;
-                const uint32_t *word;
-                unsigned int i;
-
-                printk(XENLOG_ERR "%pv: intack: %u:%02x pt: %02x\n",
-                       current, intack.source, intack.vector, pt_vector);
-
-                vlapic = vcpu_vlapic(v);
-                if ( vlapic && vlapic->regs )
-                {
-                    word = (const void *)&vlapic->regs->data[APIC_IRR];
-                    printk(XENLOG_ERR "vIRR:");
-                    for ( i = X86_NR_VECTORS / 32; i-- ; )
-                        printk(" %08x", word[i*4]);
-                    printk("\n");
-                }
-
-                pi_desc = &v->arch.hvm.vmx.pi_desc;
-                if ( pi_desc )
-                {
-                    word = (const void *)&pi_desc->pir;
-                    printk(XENLOG_ERR " PIR:");
-                    for ( i = X86_NR_VECTORS / 32; i-- ; )
-                        printk(" %08x", word[i]);
-                    printk("\n");
-                }
-            }
-#endif
-            ASSERT(intack.vector >= pt_vector);
-            vmx_set_eoi_exit_bitmap(v, intack.vector);
-        }
-
         /* we need update the RVI field */
         __vmread(GUEST_INTR_STATUS, &status);
         status &= ~VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
@@ -397,14 +341,11 @@ void vmx_intr_assist(void)
         __vmwrite(GUEST_INTR_STATUS, status);
 
         vmx_sync_exit_bitmap(v);
-
-        pt_intr_post(v, intack);
     }
     else
     {
         HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0);
         vmx_inject_extint(intack.vector, intack.source);
-        pt_intr_post(v, intack);
     }
 
     /* Is there another IRQ to queue up behind this one? */
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 24d90c0a186..6744b88d20c 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -26,6 +26,8 @@
 #define mode_is(d, name) \
     ((d)->arch.hvm.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)
 
+static bool inject_interrupt(struct periodic_time *pt);
+
 void hvm_init_guest_time(struct domain *d)
 {
     struct pl_time *pl = d->arch.hvm.pl_time;
@@ -75,35 +77,6 @@ void hvm_set_guest_time(struct vcpu *v, u64 guest_time)
     }
 }
 
-static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
-{
-    struct vcpu *v = pt->vcpu;
-    unsigned int gsi, isa_irq;
-    int vector;
-
-    if ( pt->source == PTSRC_lapic )
-        return pt->irq;
-
-    isa_irq = pt->irq;
-
-    if ( src == hvm_intsrc_pic )
-        return (v->domain->arch.hvm.vpic[isa_irq >> 3].irq_base
-                + (isa_irq & 7));
-
-    ASSERT(src == hvm_intsrc_lapic);
-    gsi = pt->source == PTSRC_isa ? hvm_isa_irq_to_gsi(isa_irq) : pt->irq;
-    vector = vioapic_get_vector(v->domain, gsi);
-    if ( vector < 0 )
-    {
-        dprintk(XENLOG_WARNING, "d%u: invalid GSI (%u) for platform timer\n",
-                v->domain->domain_id, gsi);
-        domain_crash(v->domain);
-        return -1;
-    }
-
-    return vector;
-}
-
 static int pt_irq_masked(struct periodic_time *pt)
 {
     struct vcpu *v = pt->vcpu;
@@ -257,35 +230,15 @@ void pt_restore_timer(struct vcpu *v)
     pt_vcpu_lock(v);
 
     list_for_each_entry ( pt, head, list )
-    {
         if ( pt->pending_intr_nr == 0 )
-        {
-            pt_process_missed_ticks(pt);
             set_timer(&pt->timer, pt->scheduled);
-        }
-    }
 
     pt_thaw_time(v);
 
     pt_vcpu_unlock(v);
 }
 
-static void pt_timer_fn(void *data)
-{
-    struct periodic_time *pt = data;
-
-    pt_lock(pt);
-
-    pt->pending_intr_nr++;
-    pt->scheduled += pt->period;
-    pt->do_not_freeze = 0;
-
-    vcpu_kick(pt->vcpu);
-
-    pt_unlock(pt);
-}
-
-static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
+static void irq_eoi(struct periodic_time *pt)
 {
     pt->irq_issued = false;
 
@@ -295,188 +248,153 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
             list_del(&pt->list);
         pt->on_list = false;
         pt->pending_intr_nr = 0;
+
+        return;
     }
-    else if ( mode_is(v->domain, one_missed_tick_pending) ||
-              mode_is(v->domain, no_missed_ticks_pending) )
-    {
-        pt->last_plt_gtime = hvm_get_guest_time(v);
-        pt_process_missed_ticks(pt);
-        pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
-        set_timer(&pt->timer, pt->scheduled);
-    }
-    else
+
+    pt_process_missed_ticks(pt);
+    /* 'collapse' missed ticks according to the selected mode. */
+    switch ( pt->vcpu->domain->arch.hvm.params[HVM_PARAM_TIMER_MODE] )
     {
-        pt->last_plt_gtime += pt->period;
-        if ( --pt->pending_intr_nr == 0 )
-        {
-            pt_process_missed_ticks(pt);
-            if ( pt->pending_intr_nr == 0 )
-                set_timer(&pt->timer, pt->scheduled);
-        }
+    case HVMPTM_one_missed_tick_pending:
+        pt->pending_intr_nr = min(pt->pending_intr_nr, 1u);
+        break;
+
+    case HVMPTM_no_missed_ticks_pending:
+        pt->pending_intr_nr = 0;
+        break;
     }
 
-    if ( mode_is(v->domain, delay_for_missed_ticks) &&
-         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
-        hvm_set_guest_time(v, pt->last_plt_gtime);
+    if ( !pt->pending_intr_nr )
+        set_timer(&pt->timer, pt->scheduled);
 }
 
-int pt_update_irq(struct vcpu *v)
+static void pt_timer_fn(void *data)
 {
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt, *temp, *earliest_pt;
-    uint64_t max_lag;
-    int irq, pt_vector = -1;
-    bool level;
+    struct periodic_time *pt = data;
+    struct vcpu *v;
+    time_cb *cb = NULL;
+    void *cb_priv;
+    unsigned int irq;
 
-    pt_vcpu_lock(v);
+    pt_lock(pt);
 
-    earliest_pt = NULL;
-    max_lag = -1ULL;
-    list_for_each_entry_safe ( pt, temp, head, list )
+    v = pt->vcpu;
+    irq = pt->irq;
+
+    if ( inject_interrupt(pt) )
     {
-        if ( pt->pending_intr_nr )
-        {
-            if ( pt_irq_masked(pt) &&
-                 /* Level interrupts should be asserted even if masked. */
-                 !pt->level )
-            {
-                /* suspend timer emulation */
-                list_del(&pt->list);
-                pt->on_list = 0;
-            }
-            else
-            {
-                if ( (pt->last_plt_gtime + pt->period) < max_lag )
-                {
-                    max_lag = pt->last_plt_gtime + pt->period;
-                    earliest_pt = pt;
-                }
-            }
-        }
+        pt->scheduled += pt->period;
+        pt->do_not_freeze = 0;
+        cb = pt->cb;
+        cb_priv = pt->priv;
     }
-
-    if ( earliest_pt == NULL )
+    else
     {
-        pt_vcpu_unlock(v);
-        return -1;
+        /* Masked. */
+        if ( pt->on_list )
+            list_del(&pt->list);
+        pt->on_list = false;
+        pt->pending_intr_nr++;
     }
 
-    earliest_pt->irq_issued = 1;
-    irq = earliest_pt->irq;
-    level = earliest_pt->level;
+    pt_unlock(pt);
 
-    pt_vcpu_unlock(v);
+    if ( cb )
+        cb(v, cb_priv);
+}
 
-    switch ( earliest_pt->source )
-    {
-    case PTSRC_lapic:
-        /*
-         * If periodic timer interrupt is handled by lapic, its vector in
-         * IRR is returned and used to set eoi_exit_bitmap for virtual
-         * interrupt delivery case. Otherwise return -1 to do nothing.
-         */
-        vlapic_set_irq(vcpu_vlapic(v), irq, 0);
-        pt_vector = irq;
-        break;
+static void eoi_callback(struct periodic_time *pt)
+{
+    struct vcpu *v = NULL;
+    time_cb *cb = NULL;
+    void *cb_priv = NULL;
 
-    case PTSRC_isa:
-        hvm_isa_irq_deassert(v->domain, irq);
-        if ( platform_legacy_irq(irq) && vlapic_accept_pic_intr(v) &&
-             v->domain->arch.hvm.vpic[irq >> 3].int_output )
-            hvm_isa_irq_assert(v->domain, irq, NULL);
-        else
+    pt_lock(pt);
+
+    irq_eoi(pt);
+    if ( pt->pending_intr_nr )
+    {
+        if ( inject_interrupt(pt) )
         {
-            pt_vector = hvm_isa_irq_assert(v->domain, irq, vioapic_get_vector);
-            /*
-             * hvm_isa_irq_assert may not set the corresponding bit in vIRR
-             * when mask field of IOAPIC RTE is set. Check it again.
-             */
-            if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
-                pt_vector = -1;
+            pt->pending_intr_nr--;
+            cb = pt->cb;
+            cb_priv = pt->priv;
+            v = pt->vcpu;
         }
-        break;
-
-    case PTSRC_ioapic:
-        pt_vector = hvm_ioapic_assert(v->domain, irq, level);
-        if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
+        else
         {
-            pt_vector = -1;
-            if ( level )
-            {
-                /*
-                 * Level interrupts are always asserted because the pin assert
-                 * count is incremented regardless of whether the pin is masked
-                 * or the vector latched in IRR, so also execute the callback
-                 * associated with the timer.
-                 */
-                time_cb *cb = NULL;
-                void *cb_priv = NULL;
-
-                pt_vcpu_lock(v);
-                /* Make sure the timer is still on the list. */
-                list_for_each_entry ( pt, &v->arch.hvm.tm_list, list )
-                    if ( pt == earliest_pt )
-                    {
-                        pt_irq_fired(v, pt);
-                        cb = pt->cb;
-                        cb_priv = pt->priv;
-                        break;
-                    }
-                pt_vcpu_unlock(v);
-
-                if ( cb != NULL )
-                    cb(v, cb_priv);
-            }
+            /* Masked. */
+            if ( pt->on_list )
+                list_del(&pt->list);
+            pt->on_list = false;
         }
-        break;
     }
 
-    return pt_vector;
+    pt_unlock(pt);
+
+    if ( cb )
+        cb(v, cb_priv);
 }
 
-static struct periodic_time *is_pt_irq(
-    struct vcpu *v, struct hvm_intack intack)
+static void vlapic_eoi_callback(struct vcpu *unused, unsigned int unused2,
+                                void *data)
 {
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    list_for_each_entry ( pt, head, list )
-    {
-        if ( pt->pending_intr_nr && pt->irq_issued &&
-             (intack.vector == pt_irq_vector(pt, intack.source)) )
-            return pt;
-    }
+    eoi_callback(data);
+}
 
-    return NULL;
+static void vioapic_eoi_callback(struct domain *unused, unsigned int unused2,
+                                 void *data)
+{
+    eoi_callback(data);
 }
 
-void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
+static bool inject_interrupt(struct periodic_time *pt)
 {
-    struct periodic_time *pt;
-    time_cb *cb;
-    void *cb_priv;
+    struct vcpu *v = pt->vcpu;
+    struct domain *d = v->domain;
+    unsigned int irq = pt->irq;
 
-    if ( intack.source == hvm_intsrc_vector )
-        return;
+    /* Level interrupts should be asserted even if masked. */
+    if ( pt_irq_masked(pt) && !pt->level )
+        return false;
 
-    pt_vcpu_lock(v);
-
-    pt = is_pt_irq(v, intack);
-    if ( pt == NULL )
+    switch ( pt->source )
     {
-        pt_vcpu_unlock(v);
-        return;
+    case PTSRC_lapic:
+        vlapic_set_irq_callback(vcpu_vlapic(v), pt->irq, 0, vlapic_eoi_callback,
+                                pt);
+        break;
+
+    case PTSRC_isa:
+        hvm_isa_irq_deassert(d, irq);
+        hvm_isa_irq_assert(d, irq, NULL);
+        break;
+
+    case PTSRC_ioapic:
+        hvm_ioapic_assert(d, irq, pt->level);
+        break;
     }
 
-    pt_irq_fired(v, pt);
+    switch ( d->arch.hvm.params[HVM_PARAM_TIMER_MODE] )
+    {
+    case HVMPTM_one_missed_tick_pending:
+    case HVMPTM_no_missed_ticks_pending:
+        pt->last_plt_gtime = hvm_get_guest_time(v);
+        break;
 
-    cb = pt->cb;
-    cb_priv = pt->priv;
+    case HVMPTM_delay_for_missed_ticks:
+        pt->last_plt_gtime += pt->period;
+        if ( hvm_get_guest_time(v) < pt->last_plt_gtime )
+            hvm_set_guest_time(v, pt->last_plt_gtime);
+        break;
 
-    pt_vcpu_unlock(v);
+    default:
+        pt->last_plt_gtime += pt->period;
+        break;
+    }
 
-    if ( cb != NULL )
-        cb(v, cb_priv);
+    return true;
 }
 
 void pt_migrate(struct vcpu *v)
@@ -552,6 +470,24 @@ void create_periodic_time(
     pt->cb = cb;
     pt->priv = data;
 
+    switch ( pt->source )
+    {
+        int rc;
+
+    case PTSRC_isa:
+        irq = hvm_isa_irq_to_gsi(irq);
+        /* fallthrough */
+    case PTSRC_ioapic:
+        pt->eoi_cb.callback = vioapic_eoi_callback;
+        pt->eoi_cb.data = pt;
+        rc = hvm_gsi_register_callback(v->domain, irq, &pt->eoi_cb);
+        if ( rc )
+            gdprintk(XENLOG_WARNING,
+                     "unable to register callback for timer GSI %u source %u: %d\n",
+                     irq, pt->source, rc);
+        break;
+    }
+
     pt_vcpu_lock(v);
     pt->on_list = 1;
     list_add(&pt->list, &v->arch.hvm.tm_list);
@@ -565,6 +501,8 @@ void create_periodic_time(
 
 void destroy_periodic_time(struct periodic_time *pt)
 {
+    unsigned int gsi;
+
     /* Was this structure previously initialised by create_periodic_time()? */
     if ( pt->vcpu == NULL )
         return;
@@ -574,6 +512,17 @@ void destroy_periodic_time(struct periodic_time *pt)
         list_del(&pt->list);
     pt->on_list = 0;
     pt->pending_intr_nr = 0;
+
+    gsi = pt->irq;
+    switch ( pt->source )
+    {
+    case PTSRC_isa:
+        gsi = hvm_isa_irq_to_gsi(pt->irq);
+        /* fallthrough */
+    case PTSRC_ioapic:
+        hvm_gsi_unregister_callback(pt->vcpu->domain, gsi, &pt->eoi_cb);
+        break;
+    }
     pt_unlock(pt);
 
     /*
@@ -641,20 +590,29 @@ void pt_adjust_global_vcpu_target(struct vcpu *v)
     write_unlock(&pl_time->vhpet.lock);
 }
 
-
 static void pt_resume(struct periodic_time *pt)
 {
+    struct vcpu *v;
+    time_cb *cb = NULL;
+    void *cb_priv;
+
     if ( pt->vcpu == NULL )
         return;
 
     pt_lock(pt);
-    if ( pt->pending_intr_nr && !pt->on_list )
+    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
     {
+        pt->pending_intr_nr--;
+        cb = pt->cb;
+        cb_priv = pt->priv;
+        v = pt->vcpu;
         pt->on_list = 1;
         list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
-        vcpu_kick(pt->vcpu);
     }
     pt_unlock(pt);
+
+    if ( cb )
+        cb(v, cb_priv);
 }
 
 void pt_may_unmask_irq(struct domain *d, struct periodic_time *vlapic_pt)
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 74c0cedd11c..384d2e02039 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -23,6 +23,7 @@
 #include <xen/list.h>
 #include <xen/rwlock.h>
 #include <asm/hvm/hvm.h>
+#include <asm/hvm/irq.h>
 
 /*
  * Abstract layer of periodic time, one short time.
@@ -50,6 +51,7 @@ struct periodic_time {
     struct timer timer;         /* ac_timer */
     time_cb *cb;
     void *priv;                 /* point back to platform time source */
+    struct hvm_gsi_eoi_callback eoi_cb; /* EOI callback registration data */
 };
 
 
@@ -151,9 +153,6 @@ struct pl_time {    /* platform time */
 
 void pt_save_timer(struct vcpu *v);
 void pt_restore_timer(struct vcpu *v);
-int pt_update_irq(struct vcpu *v);
-struct hvm_intack;
-void pt_intr_post(struct vcpu *v, struct hvm_intack intack);
 void pt_migrate(struct vcpu *v);
 
 void pt_adjust_global_vcpu_target(struct vcpu *v);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (7 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 08/12] x86/vpt: switch interrupt injection model Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-05-04 11:42   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert Roger Pau Monne
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

There are no callers anymore passing a get_vector function pointer to
hvm_isa_irq_assert, so drop the parameter.

No functional change expected.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/dm.c         |  2 +-
 xen/arch/x86/hvm/irq.c        | 10 +---------
 xen/arch/x86/hvm/pmtimer.c    |  2 +-
 xen/arch/x86/hvm/rtc.c        |  2 +-
 xen/arch/x86/hvm/vpt.c        |  2 +-
 xen/include/asm-x86/hvm/irq.h |  4 +---
 6 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index b60b9f3364a..c62a259b7fc 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -110,7 +110,7 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq,
         hvm_isa_irq_deassert(d, isa_irq);
         break;
     case 1:
-        hvm_isa_irq_assert(d, isa_irq, NULL);
+        hvm_isa_irq_assert(d, isa_irq);
         break;
     default:
         return -EINVAL;
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 4825a387bdc..c3d8f2a786a 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -212,13 +212,10 @@ void hvm_gsi_deassert(struct domain *d, unsigned int gsi)
     spin_unlock(&d->arch.hvm.irq_lock);
 }
 
-int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq,
-                       int (*get_vector)(const struct domain *d,
-                                         unsigned int gsi))
+void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     unsigned int gsi = hvm_isa_irq_to_gsi(isa_irq);
-    int vector = -1;
 
     ASSERT(isa_irq <= 15);
 
@@ -228,12 +225,7 @@ int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq,
          (hvm_irq->gsi_assert_count[gsi]++ == 0) )
         assert_irq(d, gsi, isa_irq);
 
-    if ( get_vector )
-        vector = get_vector(d, gsi);
-
     spin_unlock(&d->arch.hvm.irq_lock);
-
-    return vector;
 }
 
 void hvm_isa_irq_deassert(
diff --git a/xen/arch/x86/hvm/pmtimer.c b/xen/arch/x86/hvm/pmtimer.c
index 97b9e41712f..9d30b145f60 100644
--- a/xen/arch/x86/hvm/pmtimer.c
+++ b/xen/arch/x86/hvm/pmtimer.c
@@ -61,7 +61,7 @@ static void pmt_update_sci(PMTState *s)
     ASSERT(spin_is_locked(&s->lock));
 
     if ( acpi->pm1a_en & acpi->pm1a_sts & SCI_MASK )
-        hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ, NULL);
+        hvm_isa_irq_assert(s->vcpu->domain, SCI_IRQ);
     else
         hvm_isa_irq_deassert(s->vcpu->domain, SCI_IRQ);
 }
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 9992595c45a..b66ca6f64f1 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -63,7 +63,7 @@ static void rtc_update_irq(RTCState *s)
 
     s->hw.cmos_data[RTC_REG_C] |= RTC_IRQF;
     hvm_isa_irq_deassert(vrtc_domain(s), RTC_IRQ);
-    hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ, NULL);
+    hvm_isa_irq_assert(vrtc_domain(s), RTC_IRQ);
 }
 
 /* Called by the VPT code after it's injected a PF interrupt for us.
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 6744b88d20c..639e45c520e 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -368,7 +368,7 @@ static bool inject_interrupt(struct periodic_time *pt)
 
     case PTSRC_isa:
         hvm_isa_irq_deassert(d, irq);
-        hvm_isa_irq_assert(d, irq, NULL);
+        hvm_isa_irq_assert(d, irq);
         break;
 
     case PTSRC_ioapic:
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 57d51c15863..4e3534d4eb4 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -214,9 +214,7 @@ void hvm_pci_intx_deassert(struct domain *d, unsigned int device,
  * allows us to get the interrupt vector in the protection of irq_lock.
  * For most cases, just set get_vector to NULL.
  */
-int hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq,
-                       int (*get_vector)(const struct domain *d,
-                                         unsigned int gsi));
+void hvm_isa_irq_assert(struct domain *d, unsigned int isa_irq);
 void hvm_isa_irq_deassert(struct domain *d, unsigned int isa_irq);
 
 /* Modify state of GSIs. */
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (8 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-05-04 11:42   ` Jan Beulich
  2021-04-20 14:07 ` [PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 12/12] x86/vpt: introduce a per-vPT lock Roger Pau Monne
  11 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

There's no caller anymore that cares about the injected vector, so
drop the returned vector from the function.

No functional change indented.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - New in this version.
---
 xen/arch/x86/hvm/irq.c        | 8 ++------
 xen/include/asm-x86/hvm/irq.h | 2 +-
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index c3d8f2a786a..1c588e212f9 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -47,24 +47,20 @@ static void assert_gsi(struct domain *d, unsigned ioapic_gsi)
     vioapic_irq_positive_edge(d, ioapic_gsi);
 }
 
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
-    int vector;
 
     if ( gsi >= hvm_irq->nr_gsis )
     {
         ASSERT_UNREACHABLE();
-        return -1;
+        return;
     }
 
     spin_lock(&d->arch.hvm.irq_lock);
     if ( !level || hvm_irq->gsi_assert_count[gsi]++ == 0 )
         assert_gsi(d, gsi);
-    vector = vioapic_get_vector(d, gsi);
     spin_unlock(&d->arch.hvm.irq_lock);
-
-    return vector;
 }
 
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi)
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 4e3534d4eb4..fda2f8e8ebf 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -226,7 +226,7 @@ int hvm_set_pci_link_route(struct domain *d, u8 link, u8 isa_irq);
 int hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data);
 
 /* Assert/deassert an IO APIC pin. */
-int hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
+void hvm_ioapic_assert(struct domain *d, unsigned int gsi, bool level);
 void hvm_ioapic_deassert(struct domain *d, unsigned int gsi);
 
 void hvm_maybe_deassert_evtchn_irq(void);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (9 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  2021-04-20 14:07 ` [PATCH v4 12/12] x86/vpt: introduce a per-vPT lock Roger Pau Monne
  11 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

No longer add vPT timers to lists on specific vCPUs, since there's no
need anymore to check if timer interrupts have been injected on return
to HVM guest.

Such change allows to get rid of virtual timers vCPU migration, and
also cleanup some of the virtual timers fields that are no longer
required.

The model is also slightly different now in that timers are not
stopped when a vCPU is de-scheduled. Such timers will continue
running, and when triggered the function will try to inject the
corresponding interrupt to the guest (which might be different than
the currently running one). Note that the timer triggering when the
guest is no longer running can only happen once, as the timer callback
will not reset the interrupt to fire again. Such resetting if required
will be done by the EOI callback.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v3:
 - Remove stale commit log paragrpah.

Changes since v2:
 - Remove pt_{save/restore}_timer and instead use
   pt_{freeze/thaw}_time.
 - Remove the introduction of the 'masked' field, it's not needed.
 - Rework pt_active to use timer_is_active.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/domain.c          |   4 +-
 xen/arch/x86/hvm/hvm.c         |   4 +-
 xen/arch/x86/hvm/vlapic.c      |   1 -
 xen/arch/x86/hvm/vpt.c         | 192 ++++-----------------------------
 xen/include/asm-x86/hvm/vcpu.h |   3 +-
 xen/include/asm-x86/hvm/vpt.h  |  12 +--
 6 files changed, 25 insertions(+), 191 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 4dc27f798e7..b6cd715dce9 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2039,8 +2039,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     vpmu_switch_from(prev);
     np2m_schedule(NP2M_SCHEDLE_OUT);
 
-    if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
-        pt_save_timer(prev);
+    if ( is_hvm_domain(prevd) )
+        pt_freeze_time(prev);
 
     local_irq_disable();
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2c4dd1b86f2..ec4ab1f5199 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -489,7 +489,6 @@ void hvm_set_info_guest(struct vcpu *v)
 void hvm_migrate_timers(struct vcpu *v)
 {
     rtc_migrate_timers(v);
-    pt_migrate(v);
 }
 
 void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, const struct vcpu *v)
@@ -544,7 +543,7 @@ void hvm_do_resume(struct vcpu *v)
 {
     check_wakeup_from_wait();
 
-    pt_restore_timer(v);
+    pt_thaw_time(v);
 
     if ( !vcpu_ioreq_handle_completion(v) )
         return;
@@ -1559,7 +1558,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
     hvm_asid_flush_vcpu(v);
 
     spin_lock_init(&v->arch.hvm.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm.tm_list);
 
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 8f3a0a2e8f7..2af24989dd5 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1342,7 +1342,6 @@ void vlapic_adjust_i8259_target(struct domain *d)
     if ( d->arch.hvm.i8259_target == v )
         return;
     d->arch.hvm.i8259_target = v;
-    pt_adjust_global_vcpu_target(v);
 }
 
 int vlapic_has_pending_irq(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 639e45c520e..6a8d216c7b5 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -125,24 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-/*
- * Functions which read (maybe write) all periodic_time instances
- * attached to a particular vCPU use pt_vcpu_{un}lock locking helpers.
- *
- * Such users are explicitly forbidden from changing the value of the
- * pt->vcpu field, because another thread holding the pt_migrate lock
- * may already be spinning waiting for your vcpu lock.
- */
-static void pt_vcpu_lock(struct vcpu *v)
-{
-    spin_lock(&v->arch.hvm.tm_lock);
-}
-
-static void pt_vcpu_unlock(struct vcpu *v)
-{
-    spin_unlock(&v->arch.hvm.tm_lock);
-}
-
 /*
  * Functions which want to modify a particular periodic_time object
  * use pt_{un}lock locking helpers.
@@ -176,14 +158,12 @@ static void pt_process_missed_ticks(struct periodic_time *pt)
         return;
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
-    if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
-        pt->do_not_freeze = !pt->pending_intr_nr;
-    else
+    if ( !mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
         pt->pending_intr_nr += missed_ticks;
     pt->scheduled += missed_ticks * pt->period;
 }
 
-static void pt_freeze_time(struct vcpu *v)
+void pt_freeze_time(struct vcpu *v)
 {
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
@@ -191,7 +171,7 @@ static void pt_freeze_time(struct vcpu *v)
     v->arch.hvm.guest_time = hvm_get_guest_time(v);
 }
 
-static void pt_thaw_time(struct vcpu *v)
+void pt_thaw_time(struct vcpu *v)
 {
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
@@ -203,52 +183,11 @@ static void pt_thaw_time(struct vcpu *v)
     v->arch.hvm.guest_time = 0;
 }
 
-void pt_save_timer(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    if ( v->pause_flags & VPF_blocked )
-        return;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        if ( !pt->do_not_freeze )
-            stop_timer(&pt->timer);
-
-    pt_freeze_time(v);
-
-    pt_vcpu_unlock(v);
-}
-
-void pt_restore_timer(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        if ( pt->pending_intr_nr == 0 )
-            set_timer(&pt->timer, pt->scheduled);
-
-    pt_thaw_time(v);
-
-    pt_vcpu_unlock(v);
-}
-
 static void irq_eoi(struct periodic_time *pt)
 {
-    pt->irq_issued = false;
-
     if ( pt->one_shot )
     {
-        if ( pt->on_list )
-            list_del(&pt->list);
-        pt->on_list = false;
         pt->pending_intr_nr = 0;
-
         return;
     }
 
@@ -266,7 +205,11 @@ static void irq_eoi(struct periodic_time *pt)
     }
 
     if ( !pt->pending_intr_nr )
+    {
+        /* Make sure timer follows vCPU. */
+        migrate_timer(&pt->timer, current->processor);
         set_timer(&pt->timer, pt->scheduled);
+    }
 }
 
 static void pt_timer_fn(void *data)
@@ -282,21 +225,15 @@ static void pt_timer_fn(void *data)
     v = pt->vcpu;
     irq = pt->irq;
 
-    if ( inject_interrupt(pt) )
+    pt->scheduled += pt->period;
+
+    if ( !inject_interrupt(pt) )
+        pt->pending_intr_nr++;
+    else
     {
-        pt->scheduled += pt->period;
-        pt->do_not_freeze = 0;
         cb = pt->cb;
         cb_priv = pt->priv;
     }
-    else
-    {
-        /* Masked. */
-        if ( pt->on_list )
-            list_del(&pt->list);
-        pt->on_list = false;
-        pt->pending_intr_nr++;
-    }
 
     pt_unlock(pt);
 
@@ -313,22 +250,12 @@ static void eoi_callback(struct periodic_time *pt)
     pt_lock(pt);
 
     irq_eoi(pt);
-    if ( pt->pending_intr_nr )
+    if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
-        if ( inject_interrupt(pt) )
-        {
-            pt->pending_intr_nr--;
-            cb = pt->cb;
-            cb_priv = pt->priv;
-            v = pt->vcpu;
-        }
-        else
-        {
-            /* Masked. */
-            if ( pt->on_list )
-                list_del(&pt->list);
-            pt->on_list = false;
-        }
+        pt->pending_intr_nr--;
+        cb = pt->cb;
+        cb_priv = pt->priv;
+        v = pt->vcpu;
     }
 
     pt_unlock(pt);
@@ -397,19 +324,6 @@ static bool inject_interrupt(struct periodic_time *pt)
     return true;
 }
 
-void pt_migrate(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        migrate_timer(&pt->timer, v->processor);
-
-    pt_vcpu_unlock(v);
-}
-
 void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level)
@@ -429,8 +343,6 @@ void create_periodic_time(
     write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
 
     pt->pending_intr_nr = 0;
-    pt->do_not_freeze = 0;
-    pt->irq_issued = 0;
 
     /* Periodic timer must be at least 0.1ms. */
     if ( (period < 100000) && period )
@@ -488,11 +400,6 @@ void create_periodic_time(
         break;
     }
 
-    pt_vcpu_lock(v);
-    pt->on_list = 1;
-    list_add(&pt->list, &v->arch.hvm.tm_list);
-    pt_vcpu_unlock(v);
-
     init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
     set_timer(&pt->timer, pt->scheduled);
 
@@ -508,9 +415,6 @@ void destroy_periodic_time(struct periodic_time *pt)
         return;
 
     pt_lock(pt);
-    if ( pt->on_list )
-        list_del(&pt->list);
-    pt->on_list = 0;
     pt->pending_intr_nr = 0;
 
     gsi = pt->irq;
@@ -532,64 +436,6 @@ void destroy_periodic_time(struct periodic_time *pt)
     kill_timer(&pt->timer);
 }
 
-static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
-{
-    ASSERT(pt->source == PTSRC_isa || pt->source == PTSRC_ioapic);
-
-    if ( pt->vcpu == NULL )
-        return;
-
-    write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
-
-    if ( pt->vcpu == v )
-        goto out;
-
-    pt_vcpu_lock(pt->vcpu);
-    if ( pt->on_list )
-        list_del(&pt->list);
-    pt_vcpu_unlock(pt->vcpu);
-
-    pt->vcpu = v;
-
-    pt_vcpu_lock(v);
-    if ( pt->on_list )
-    {
-        list_add(&pt->list, &v->arch.hvm.tm_list);
-        migrate_timer(&pt->timer, v->processor);
-    }
-    pt_vcpu_unlock(v);
-
- out:
-    write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
-}
-
-void pt_adjust_global_vcpu_target(struct vcpu *v)
-{
-    struct PITState *vpit;
-    struct pl_time *pl_time;
-    int i;
-
-    if ( !v || !has_vpit(v->domain) )
-        return;
-
-    vpit = &v->domain->arch.vpit;
-
-    spin_lock(&vpit->lock);
-    pt_adjust_vcpu(&vpit->pt0, v);
-    spin_unlock(&vpit->lock);
-
-    pl_time = v->domain->arch.hvm.pl_time;
-
-    spin_lock(&pl_time->vrtc.lock);
-    pt_adjust_vcpu(&pl_time->vrtc.pt, v);
-    spin_unlock(&pl_time->vrtc.lock);
-
-    write_lock(&pl_time->vhpet.lock);
-    for ( i = 0; i < HPET_TIMER_NUM; i++ )
-        pt_adjust_vcpu(&pl_time->vhpet.pt[i], v);
-    write_unlock(&pl_time->vhpet.lock);
-}
-
 static void pt_resume(struct periodic_time *pt)
 {
     struct vcpu *v;
@@ -600,14 +446,12 @@ static void pt_resume(struct periodic_time *pt)
         return;
 
     pt_lock(pt);
-    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
+    if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
         pt->pending_intr_nr--;
         cb = pt->cb;
         cb_priv = pt->priv;
         v = pt->vcpu;
-        pt->on_list = 1;
-        list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
     }
     pt_unlock(pt);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 8adf4555c2a..9a756964fb0 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -148,9 +148,8 @@ struct hvm_vcpu {
     s64                 cache_tsc_offset;
     u64                 guest_time;
 
-    /* Lock and list for virtual platform timers. */
+    /* Lock for virtual platform timers. */
     spinlock_t          tm_lock;
-    struct list_head    tm_list;
 
     bool                flag_dr_dirty;
     bool                debug_state_latch;
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 384d2e02039..323afa605e7 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -31,11 +31,7 @@
 typedef void time_cb(struct vcpu *v, void *opaque);
 
 struct periodic_time {
-    struct list_head list;
-    bool on_list;
     bool one_shot;
-    bool do_not_freeze;
-    bool irq_issued;
     bool warned_timeout_too_short;
     bool level;
 #define PTSRC_isa    1 /* ISA time source */
@@ -151,11 +147,9 @@ struct pl_time {    /* platform time */
     struct domain *domain;
 };
 
-void pt_save_timer(struct vcpu *v);
-void pt_restore_timer(struct vcpu *v);
-void pt_migrate(struct vcpu *v);
+void pt_freeze_time(struct vcpu *v);
+void pt_thaw_time(struct vcpu *v);
 
-void pt_adjust_global_vcpu_target(struct vcpu *v);
 #define pt_global_vcpu_target(d) \
     (is_hvm_domain(d) && (d)->arch.hvm.i8259_target ? \
      (d)->arch.hvm.i8259_target : \
@@ -164,7 +158,7 @@ void pt_adjust_global_vcpu_target(struct vcpu *v);
 void pt_may_unmask_irq(struct domain *d, struct periodic_time *vlapic_pt);
 
 /* Is given periodic timer active? */
-#define pt_active(pt) ((pt)->on_list || (pt)->pending_intr_nr)
+#define pt_active(pt) ((pt)->pending_intr_nr || timer_is_active(&(pt)->timer))
 
 /*
  * Create/destroy a periodic (or one-shot!) timer.
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v4 12/12] x86/vpt: introduce a per-vPT lock
  2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (10 preceding siblings ...)
  2021-04-20 14:07 ` [PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
@ 2021-04-20 14:07 ` Roger Pau Monne
  11 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monne @ 2021-04-20 14:07 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Introduce a per virtual timer lock that replaces the existing per-vCPU
and per-domain vPT locks. Since virtual timers are no longer assigned
or migrated between vCPUs the locking can be simplified to a
in-structure spinlock that protects all the fields.

This requires introducing a helper to initialize the spinlock, and
that could be used to initialize other virtual timer fields in the
future.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v1:
 - New in his version.
---
 xen/arch/x86/emul-i8254.c      |  1 +
 xen/arch/x86/hvm/hpet.c        |  8 +++++-
 xen/arch/x86/hvm/hvm.c         |  4 ---
 xen/arch/x86/hvm/rtc.c         |  1 +
 xen/arch/x86/hvm/vlapic.c      |  1 +
 xen/arch/x86/hvm/vpt.c         | 52 ++++++++++++++--------------------
 xen/include/asm-x86/hvm/vcpu.h |  3 --
 xen/include/asm-x86/hvm/vpt.h  | 15 ++--------
 8 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/xen/arch/x86/emul-i8254.c b/xen/arch/x86/emul-i8254.c
index 73be4188ad4..a47138cbab7 100644
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -484,6 +484,7 @@ void pit_init(struct domain *d, unsigned long cpu_khz)
     {
         register_portio_handler(d, PIT_BASE, 4, handle_pit_io);
         register_portio_handler(d, 0x61, 1, handle_speaker_io);
+        init_periodic_timer(&pit->pt0);
     }
 
     pit_reset(d);
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index ca94e8b4538..20593c3862d 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -739,12 +739,18 @@ static void hpet_set(HPETState *h)
 
 void hpet_init(struct domain *d)
 {
+    HPETState *h = domain_vhpet(d);
+    unsigned int i;
+
     if ( !has_vhpet(d) )
         return;
 
-    hpet_set(domain_vhpet(d));
+    hpet_set(h);
     register_mmio_handler(d, &hpet_mmio_ops);
     d->arch.hvm.params[HVM_PARAM_HPET_ENABLED] = 1;
+
+    for ( i = 0; i < HPET_TIMER_NUM; i++ )
+        init_periodic_timer(&h->pt[i]);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index ec4ab1f5199..0498c3b02e2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -666,8 +666,6 @@ int hvm_domain_initialise(struct domain *d)
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
-    rwlock_init(&d->arch.hvm.pl_time->pt_migrate);
-
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
     {
@@ -1557,8 +1555,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     hvm_asid_flush_vcpu(v);
 
-    spin_lock_init(&v->arch.hvm.tm_lock);
-
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
         goto fail1;
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index b66ca6f64f1..8c8b4ed4018 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -821,6 +821,7 @@ void rtc_init(struct domain *d)
     init_timer(&s->update_timer, rtc_update_timer, s, smp_processor_id());
     init_timer(&s->update_timer2, rtc_update_timer2, s, smp_processor_id());
     init_timer(&s->alarm_timer, rtc_alarm_cb, s, smp_processor_id());
+    init_periodic_timer(&s->pt);
 
     register_portio_handler(d, RTC_PORT(0), 2, handle_rtc_io);
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 2af24989dd5..5c60bf66584 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1654,6 +1654,7 @@ int vlapic_init(struct vcpu *v)
         return 0;
     }
 
+    init_periodic_timer(&vlapic->pt);
     vlapic->pt.source = PTSRC_lapic;
 
     vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner);
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 6a8d216c7b5..e4ecb98d3a4 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -125,27 +125,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-/*
- * Functions which want to modify a particular periodic_time object
- * use pt_{un}lock locking helpers.
- *
- * These users lock whichever vCPU the periodic_time is attached to,
- * but since the vCPU could be modified without holding any lock, they
- * need to take an additional lock that protects against pt->vcpu
- * changing.
- */
-static void pt_lock(struct periodic_time *pt)
-{
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-    spin_lock(&pt->vcpu->arch.hvm.tm_lock);
-}
-
-static void pt_unlock(struct periodic_time *pt)
-{
-    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
-    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-}
-
 static void pt_process_missed_ticks(struct periodic_time *pt)
 {
     s_time_t missed_ticks, now = NOW();
@@ -220,7 +199,7 @@ static void pt_timer_fn(void *data)
     void *cb_priv;
     unsigned int irq;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     v = pt->vcpu;
     irq = pt->irq;
@@ -235,7 +214,7 @@ static void pt_timer_fn(void *data)
         cb_priv = pt->priv;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -247,7 +226,7 @@ static void eoi_callback(struct periodic_time *pt)
     time_cb *cb = NULL;
     void *cb_priv = NULL;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     irq_eoi(pt);
     if ( pt->pending_intr_nr && inject_interrupt(pt) )
@@ -258,7 +237,7 @@ static void eoi_callback(struct periodic_time *pt)
         v = pt->vcpu;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -324,6 +303,11 @@ static bool inject_interrupt(struct periodic_time *pt)
     return true;
 }
 
+void init_periodic_timer(struct periodic_time *pt)
+{
+    spin_lock_init(&pt->lock);
+}
+
 void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level)
@@ -340,7 +324,7 @@ void create_periodic_time(
 
     destroy_periodic_time(pt);
 
-    write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
+    spin_lock(&pt->lock);
 
     pt->pending_intr_nr = 0;
 
@@ -403,18 +387,21 @@ void create_periodic_time(
     init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
     set_timer(&pt->timer, pt->scheduled);
 
-    write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
+    spin_unlock(&pt->lock);
 }
 
 void destroy_periodic_time(struct periodic_time *pt)
 {
     unsigned int gsi;
 
+    spin_lock(&pt->lock);
     /* Was this structure previously initialised by create_periodic_time()? */
     if ( pt->vcpu == NULL )
+    {
+        spin_unlock(&pt->lock);
         return;
+    }
 
-    pt_lock(pt);
     pt->pending_intr_nr = 0;
 
     gsi = pt->irq;
@@ -427,7 +414,7 @@ void destroy_periodic_time(struct periodic_time *pt)
         hvm_gsi_unregister_callback(pt->vcpu->domain, gsi, &pt->eoi_cb);
         break;
     }
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     /*
      * pt_timer_fn() can run until this kill_timer() returns. We must do this
@@ -442,10 +429,13 @@ static void pt_resume(struct periodic_time *pt)
     time_cb *cb = NULL;
     void *cb_priv;
 
+    spin_lock(&pt->lock);
     if ( pt->vcpu == NULL )
+    {
+        spin_unlock(&pt->lock);
         return;
+    }
 
-    pt_lock(pt);
     if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
         pt->pending_intr_nr--;
@@ -453,7 +443,7 @@ static void pt_resume(struct periodic_time *pt)
         cb_priv = pt->priv;
         v = pt->vcpu;
     }
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 9a756964fb0..fe3d0e10426 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -148,9 +148,6 @@ struct hvm_vcpu {
     s64                 cache_tsc_offset;
     u64                 guest_time;
 
-    /* Lock for virtual platform timers. */
-    spinlock_t          tm_lock;
-
     bool                flag_dr_dirty;
     bool                debug_state_latch;
     bool                single_step;
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 323afa605e7..5628cff8f7a 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -48,6 +48,7 @@ struct periodic_time {
     time_cb *cb;
     void *priv;                 /* point back to platform time source */
     struct hvm_gsi_eoi_callback eoi_cb; /* EOI callback registration data */
+    spinlock_t lock;
 };
 
 
@@ -126,19 +127,6 @@ struct pl_time {    /* platform time */
     struct RTCState  vrtc;
     struct HPETState vhpet;
     struct PMTState  vpmt;
-     /*
-      * Functions which want to modify the vcpu field of the vpt need
-      * to hold the global lock (pt_migrate) in write mode together
-      * with the per-vcpu locks of the lists being modified. Functions
-      * that want to lock a periodic_timer that's possibly on a
-      * different vCPU list need to take the lock in read mode first in
-      * order to prevent the vcpu field of periodic_timer from
-      * changing.
-      *
-      * Note that two vcpu locks cannot be held at the same time to
-      * avoid a deadlock.
-      */
-    rwlock_t pt_migrate;
     /* guest_time = Xen sys time + stime_offset */
     int64_t stime_offset;
     /* Ensures monotonicity in appropriate timer modes. */
@@ -173,6 +161,7 @@ void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level);
 void destroy_periodic_time(struct periodic_time *pt);
+void init_periodic_timer(struct periodic_time *pt);
 
 int pv_pit_handler(int port, int data, int write);
 void pit_reset(struct domain *d);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-04-20 14:07 ` [PATCH v4 01/12] x86/rtc: drop code related to strict mode Roger Pau Monne
@ 2021-04-29 14:53   ` Jan Beulich
  2021-05-03  9:28     ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2021-04-29 14:53 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -46,15 +46,6 @@
>  #define epoch_year     1900
>  #define get_year(x)    (x + epoch_year)
>  
> -enum rtc_mode {
> -   rtc_mode_no_ack,
> -   rtc_mode_strict
> -};
> -
> -/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
> -#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
> -#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)

Leaving aside my concerns about this removal, I think some form of
reference to hvmloader and its respective behavior should remain
here, presumably in form of a (replacement) comment.

> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
>      {
>          if ( pt->pending_intr_nr )
>          {
> -            /* RTC code takes care of disabling the timer itself. */
> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
> +            if ( pt_irq_masked(pt) &&
>                   /* Level interrupts should be asserted even if masked. */
>                   !pt->level )
>              {

I'm struggling to relate this to any other part of the patch. In
particular I can't find the case where a periodic timer would be
registered with RTC_IRQ and a NULL private pointer. The only use
I can find is with a non-NULL pointer, which would mean the "else"
path is always taken at present for the RTC case (which you now
change).

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism
  2021-04-20 14:07 ` [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
@ 2021-04-29 15:48   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-04-29 15:48 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> Add a new vlapic_set_irq_callback helper in order to inject a vector
> and set a callback to be executed when the guest performs the end of
> interrupt acknowledgment.
> 
> Such functionality will be used to migrate the current ad hoc handling
> done in vlapic_handle_EOI for the vectors that require some logic to
> be executed when the end of interrupt is performed.
> 
> The setter of the callback will be in charge for setting the callback
> again on guest restore, as callbacks are not saved as part of the
> vlapic state. That is the reason why vlapic_set_callback is not a
> static function.
> 
> No current users are migrated to use this new functionality yet, so no
> functional change expected as a result.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-20 14:07 ` [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
@ 2021-04-29 15:51   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-04-29 15:51 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> Switch the emulated IO-APIC code to use the local APIC EOI callback
> mechanism. This allows to remove the last hardcoded callback from
> vlapic_handle_EOI. Removing the hardcoded vIO-APIC callback also
> allows to getting rid of setting the EOI exit bitmap based on the
> triggering mode, as now all users that require an EOI action use the
> newly introduced callback mechanism.
> 
> Move and rename the vioapic_update_EOI now that it can be made static.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-04-29 14:53   ` Jan Beulich
@ 2021-05-03  9:28     ` Roger Pau Monné
  2021-05-03 12:26       ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2021-05-03  9:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
> On 20.04.2021 16:07, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/rtc.c
> > +++ b/xen/arch/x86/hvm/rtc.c
> > @@ -46,15 +46,6 @@
> >  #define epoch_year     1900
> >  #define get_year(x)    (x + epoch_year)
> >  
> > -enum rtc_mode {
> > -   rtc_mode_no_ack,
> > -   rtc_mode_strict
> > -};
> > -
> > -/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
> > -#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
> > -#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
> 
> Leaving aside my concerns about this removal, I think some form of
> reference to hvmloader and its respective behavior should remain
> here, presumably in form of a (replacement) comment.

What about adding a comment in rtc_pf_callback:

/*
 * The current RTC implementation will inject an interrupt regardless
 * of whether REG_C has been read since the last interrupt was
 * injected. This is why the ACPI WAET 'RTC good' flag must be
 * unconditionally set by hvmloader.
 */

> > @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
> >      {
> >          if ( pt->pending_intr_nr )
> >          {
> > -            /* RTC code takes care of disabling the timer itself. */
> > -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
> > +            if ( pt_irq_masked(pt) &&
> >                   /* Level interrupts should be asserted even if masked. */
> >                   !pt->level )
> >              {
> 
> I'm struggling to relate this to any other part of the patch. In
> particular I can't find the case where a periodic timer would be
> registered with RTC_IRQ and a NULL private pointer. The only use
> I can find is with a non-NULL pointer, which would mean the "else"
> path is always taken at present for the RTC case (which you now
> change).

Right, the else case was always taken because as the comment noted RTC
would take care of disabling itself (by calling destroy_periodic_time
from the callback when using strict_mode). When no_ack mode was
implemented this wasn't taken into account AFAICT, and thus the RTC
was never removed from the list even when masked.

I think with no_ack mode the RTC shouldn't have this specific handling
in pt_update_irq, as it should behave like any other virtual timer.
I could try to split this as a separate bugfix, but then I would have
to teach pt_update_irq to differentiate between strict_mode and no_ack
mode.

Would you be fine if the following is added to the commit message
instead:

"Note that the special handling of the RTC timer done in pt_update_irq
is wrong for the no_ack mode, as the RTC timer callback won't disable
the timer anymore when it detects the guest is not reading REG_C. As
such remove the code as part of the removal of strict_mode, and don't
special case the RTC timer anymore in pt_update_irq."

Thanks, Roger.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-05-03  9:28     ` Roger Pau Monné
@ 2021-05-03 12:26       ` Jan Beulich
  2021-05-03 14:47         ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2021-05-03 12:26 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 03.05.2021 11:28, Roger Pau Monné wrote:
> On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
>> On 20.04.2021 16:07, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/rtc.c
>>> +++ b/xen/arch/x86/hvm/rtc.c
>>> @@ -46,15 +46,6 @@
>>>  #define epoch_year     1900
>>>  #define get_year(x)    (x + epoch_year)
>>>  
>>> -enum rtc_mode {
>>> -   rtc_mode_no_ack,
>>> -   rtc_mode_strict
>>> -};
>>> -
>>> -/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
>>> -#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
>>> -#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
>>
>> Leaving aside my concerns about this removal, I think some form of
>> reference to hvmloader and its respective behavior should remain
>> here, presumably in form of a (replacement) comment.
> 
> What about adding a comment in rtc_pf_callback:
> 
> /*
>  * The current RTC implementation will inject an interrupt regardless
>  * of whether REG_C has been read since the last interrupt was
>  * injected. This is why the ACPI WAET 'RTC good' flag must be
>  * unconditionally set by hvmloader.
>  */

For one I'm unconvinced this is "must"; I think it is "may". We're
producing excess interrupts for an unaware guest, aiui. Presumably most
guests can tolerate this, but - second - it may be unnecessary overhead.
Which in turn may be why nobody has complained so far, as this sort of
overhead my be hard to notice. I also suspect the RTC may not be used
very often for generating a periodic interrupt. (I've also not seen the
flag named "RTC good" - the ACPI constant is ACPI_WAET_RTC_NO_ACK, for
example.)

>>> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
>>>      {
>>>          if ( pt->pending_intr_nr )
>>>          {
>>> -            /* RTC code takes care of disabling the timer itself. */
>>> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
>>> +            if ( pt_irq_masked(pt) &&
>>>                   /* Level interrupts should be asserted even if masked. */
>>>                   !pt->level )
>>>              {
>>
>> I'm struggling to relate this to any other part of the patch. In
>> particular I can't find the case where a periodic timer would be
>> registered with RTC_IRQ and a NULL private pointer. The only use
>> I can find is with a non-NULL pointer, which would mean the "else"
>> path is always taken at present for the RTC case (which you now
>> change).
> 
> Right, the else case was always taken because as the comment noted RTC
> would take care of disabling itself (by calling destroy_periodic_time
> from the callback when using strict_mode). When no_ack mode was
> implemented this wasn't taken into account AFAICT, and thus the RTC
> was never removed from the list even when masked.
> 
> I think with no_ack mode the RTC shouldn't have this specific handling
> in pt_update_irq, as it should behave like any other virtual timer.
> I could try to split this as a separate bugfix, but then I would have
> to teach pt_update_irq to differentiate between strict_mode and no_ack
> mode.

A fair part of my confusion was about "&& !pt->priv". I've looked back
at 9607327abbd3 ("x86/HVM: properly handle RTC periodic timer even when
!RTC_PIE"), where this was added. It was, afaict, to cover for
hpet_set_timer() passing NULL with RTC_IRQ. Which makes me suspect that
be07023be115 ("x86/vhpet: add support for level triggered interrupts")
may have subtly broken things.

> Would you be fine if the following is added to the commit message
> instead:
> 
> "Note that the special handling of the RTC timer done in pt_update_irq
> is wrong for the no_ack mode, as the RTC timer callback won't disable
> the timer anymore when it detects the guest is not reading REG_C. As
> such remove the code as part of the removal of strict_mode, and don't
> special case the RTC timer anymore in pt_update_irq."

Not sure yet - as per above I'm still not convinced this part of the
change is correct.

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-05-03 12:26       ` Jan Beulich
@ 2021-05-03 14:47         ` Roger Pau Monné
  2021-05-03 14:58           ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2021-05-03 14:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, May 03, 2021 at 02:26:51PM +0200, Jan Beulich wrote:
> On 03.05.2021 11:28, Roger Pau Monné wrote:
> > On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
> >> On 20.04.2021 16:07, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/rtc.c
> >>> +++ b/xen/arch/x86/hvm/rtc.c
> >>> @@ -46,15 +46,6 @@
> >>>  #define epoch_year     1900
> >>>  #define get_year(x)    (x + epoch_year)
> >>>  
> >>> -enum rtc_mode {
> >>> -   rtc_mode_no_ack,
> >>> -   rtc_mode_strict
> >>> -};
> >>> -
> >>> -/* This must be in sync with how hvmloader sets the ACPI WAET flags. */
> >>> -#define mode_is(d, m) ((void)(d), rtc_mode_##m == rtc_mode_no_ack)
> >>> -#define rtc_mode_is(s, m) mode_is(vrtc_domain(s), m)
> >>
> >> Leaving aside my concerns about this removal, I think some form of
> >> reference to hvmloader and its respective behavior should remain
> >> here, presumably in form of a (replacement) comment.
> > 
> > What about adding a comment in rtc_pf_callback:
> > 
> > /*
> >  * The current RTC implementation will inject an interrupt regardless
> >  * of whether REG_C has been read since the last interrupt was
> >  * injected. This is why the ACPI WAET 'RTC good' flag must be
> >  * unconditionally set by hvmloader.
> >  */
> 
> For one I'm unconvinced this is "must"; I think it is "may". We're
> producing excess interrupts for an unaware guest, aiui. Presumably most
> guests can tolerate this, but - second - it may be unnecessary overhead.
> Which in turn may be why nobody has complained so far, as this sort of
> overhead my be hard to notice. I also suspect the RTC may not be used
> very often for generating a periodic interrupt.

I agree that there might be some overhead here, but asking for the
guest to read REG_C in order to receive further interrupts also seems
like quite a lot of overhead because all the interception involved.
IMO it's best to unconditionally offer the no_ack mode (like Xen has
been doing).

Also strict_mode wasn't really behaving according to the spec either,
as it would injected up to 10 interrupts without the user have read
REG_C.

> (I've also not seen the
> flag named "RTC good" - the ACPI constant is ACPI_WAET_RTC_NO_ACK, for
> example.)

I'm reading the WAET spec as published my Microsoft:

http://msdn.microsoft.com/en-us/windows/hardware/gg487524.aspx

Where the flag is listed as 'RTC good'. Maybe that's outdated now?
Seems to be the official source for the specification from
https://uefi.org/acpi.

> >>> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
> >>>      {
> >>>          if ( pt->pending_intr_nr )
> >>>          {
> >>> -            /* RTC code takes care of disabling the timer itself. */
> >>> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
> >>> +            if ( pt_irq_masked(pt) &&
> >>>                   /* Level interrupts should be asserted even if masked. */
> >>>                   !pt->level )
> >>>              {
> >>
> >> I'm struggling to relate this to any other part of the patch. In
> >> particular I can't find the case where a periodic timer would be
> >> registered with RTC_IRQ and a NULL private pointer. The only use
> >> I can find is with a non-NULL pointer, which would mean the "else"
> >> path is always taken at present for the RTC case (which you now
> >> change).
> > 
> > Right, the else case was always taken because as the comment noted RTC
> > would take care of disabling itself (by calling destroy_periodic_time
> > from the callback when using strict_mode). When no_ack mode was
> > implemented this wasn't taken into account AFAICT, and thus the RTC
> > was never removed from the list even when masked.
> > 
> > I think with no_ack mode the RTC shouldn't have this specific handling
> > in pt_update_irq, as it should behave like any other virtual timer.
> > I could try to split this as a separate bugfix, but then I would have
> > to teach pt_update_irq to differentiate between strict_mode and no_ack
> > mode.
> 
> A fair part of my confusion was about "&& !pt->priv".

I think you meant "|| !pt->priv"?

> I've looked back
> at 9607327abbd3 ("x86/HVM: properly handle RTC periodic timer even when
> !RTC_PIE"), where this was added. It was, afaict, to cover for
> hpet_set_timer() passing NULL with RTC_IRQ.

That's tricky, as hpet_set_timer hardcodes 8 instead of using RTC_IRQ
which makes it really easy to miss.

> Which makes me suspect that
> be07023be115 ("x86/vhpet: add support for level triggered interrupts")
> may have subtly broken things.

Right - as that would have made the RTC irq when generated from the
HPET no longer be suspended when masked (as pt->priv would no longer
be NULL). Could be fixed with:

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index ca94e8b4538..f2cbd12f400 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -318,7 +318,8 @@ static void hpet_set_timer(HPETState *h, unsigned int tn,
                          hpet_tick_to_ns(h, diff),
                          oneshot ? 0 : hpet_tick_to_ns(h, h->hpet.period[tn]),
                          irq, timer_level(h, tn) ? hpet_timer_fired : NULL,
-                         (void *)(unsigned long)tn, timer_level(h, tn));
+                         timer_level(h, tn) ? (void *)(unsigned long)tn : NULL,
+                         timer_level(h, tn));
 }
 
 static inline uint64_t hpet_fixup_reg(

Passing again NULL as the callback private data for edge triggered
interrupts.

> > Would you be fine if the following is added to the commit message
> > instead:
> > 
> > "Note that the special handling of the RTC timer done in pt_update_irq
> > is wrong for the no_ack mode, as the RTC timer callback won't disable
> > the timer anymore when it detects the guest is not reading REG_C. As
> > such remove the code as part of the removal of strict_mode, and don't
> > special case the RTC timer anymore in pt_update_irq."
> 
> Not sure yet - as per above I'm still not convinced this part of the
> change is correct.

I believe part of this handling is kind of bogus - for example I'm
unsure Xen should account masked interrupt injections as missed ticks.
A guest might decide to mask it's interrupt source for whatever
reason, and then it shouldn't receive a flurry of interrupts when
unmasked. Ie: missed ticks should only be accounted for interrupts
that should have been delivered but the guest wasn't scheduled. I
think such model would also simplify some of the logic that we
currently have.

In fact I have a patch on top of this current series which I haven't
posted yet that does implement this new mode of not accounting masked
interrupts as missed ticks to the delivered later.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-05-03 14:47         ` Roger Pau Monné
@ 2021-05-03 14:58           ` Jan Beulich
  2021-05-03 15:28             ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2021-05-03 14:58 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 03.05.2021 16:47, Roger Pau Monné wrote:
> On Mon, May 03, 2021 at 02:26:51PM +0200, Jan Beulich wrote:
>> On 03.05.2021 11:28, Roger Pau Monné wrote:
>>> On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
>>>> On 20.04.2021 16:07, Roger Pau Monne wrote:
>> (I've also not seen the
>> flag named "RTC good" - the ACPI constant is ACPI_WAET_RTC_NO_ACK, for
>> example.)
> 
> I'm reading the WAET spec as published my Microsoft:
> 
> http://msdn.microsoft.com/en-us/windows/hardware/gg487524.aspx
> 
> Where the flag is listed as 'RTC good'. Maybe that's outdated now?
> Seems to be the official source for the specification from
> https://uefi.org/acpi.

Well, I guess the wording wasn't used for the constant's name because
the RTC isn't "bad" otherwise?

>>>>> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
>>>>>      {
>>>>>          if ( pt->pending_intr_nr )
>>>>>          {
>>>>> -            /* RTC code takes care of disabling the timer itself. */
>>>>> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
>>>>> +            if ( pt_irq_masked(pt) &&
>>>>>                   /* Level interrupts should be asserted even if masked. */
>>>>>                   !pt->level )
>>>>>              {
>>>>
>>>> I'm struggling to relate this to any other part of the patch. In
>>>> particular I can't find the case where a periodic timer would be
>>>> registered with RTC_IRQ and a NULL private pointer. The only use
>>>> I can find is with a non-NULL pointer, which would mean the "else"
>>>> path is always taken at present for the RTC case (which you now
>>>> change).
>>>
>>> Right, the else case was always taken because as the comment noted RTC
>>> would take care of disabling itself (by calling destroy_periodic_time
>>> from the callback when using strict_mode). When no_ack mode was
>>> implemented this wasn't taken into account AFAICT, and thus the RTC
>>> was never removed from the list even when masked.
>>>
>>> I think with no_ack mode the RTC shouldn't have this specific handling
>>> in pt_update_irq, as it should behave like any other virtual timer.
>>> I could try to split this as a separate bugfix, but then I would have
>>> to teach pt_update_irq to differentiate between strict_mode and no_ack
>>> mode.
>>
>> A fair part of my confusion was about "&& !pt->priv".
> 
> I think you meant "|| !pt->priv"?

Oops, indeed.

>> I've looked back
>> at 9607327abbd3 ("x86/HVM: properly handle RTC periodic timer even when
>> !RTC_PIE"), where this was added. It was, afaict, to cover for
>> hpet_set_timer() passing NULL with RTC_IRQ.
> 
> That's tricky, as hpet_set_timer hardcodes 8 instead of using RTC_IRQ
> which makes it really easy to miss.
> 
>> Which makes me suspect that
>> be07023be115 ("x86/vhpet: add support for level triggered interrupts")
>> may have subtly broken things.
> 
> Right - as that would have made the RTC irq when generated from the
> HPET no longer be suspended when masked (as pt->priv would no longer
> be NULL). Could be fixed with:
> 
> diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
> index ca94e8b4538..f2cbd12f400 100644
> --- a/xen/arch/x86/hvm/hpet.c
> +++ b/xen/arch/x86/hvm/hpet.c
> @@ -318,7 +318,8 @@ static void hpet_set_timer(HPETState *h, unsigned int tn,
>                           hpet_tick_to_ns(h, diff),
>                           oneshot ? 0 : hpet_tick_to_ns(h, h->hpet.period[tn]),
>                           irq, timer_level(h, tn) ? hpet_timer_fired : NULL,
> -                         (void *)(unsigned long)tn, timer_level(h, tn));
> +                         timer_level(h, tn) ? (void *)(unsigned long)tn : NULL,
> +                         timer_level(h, tn));
>  }
>  
>  static inline uint64_t hpet_fixup_reg(
> 
> Passing again NULL as the callback private data for edge triggered
> interrupts.

Right, plus perhaps at the same time replacing the hardcoded 8.

>>> Would you be fine if the following is added to the commit message
>>> instead:
>>>
>>> "Note that the special handling of the RTC timer done in pt_update_irq
>>> is wrong for the no_ack mode, as the RTC timer callback won't disable
>>> the timer anymore when it detects the guest is not reading REG_C. As
>>> such remove the code as part of the removal of strict_mode, and don't
>>> special case the RTC timer anymore in pt_update_irq."
>>
>> Not sure yet - as per above I'm still not convinced this part of the
>> change is correct.
> 
> I believe part of this handling is kind of bogus - for example I'm
> unsure Xen should account masked interrupt injections as missed ticks.
> A guest might decide to mask it's interrupt source for whatever
> reason, and then it shouldn't receive a flurry of interrupts when
> unmasked. Ie: missed ticks should only be accounted for interrupts
> that should have been delivered but the guest wasn't scheduled. I
> think such model would also simplify some of the logic that we
> currently have.
> 
> In fact I have a patch on top of this current series which I haven't
> posted yet that does implement this new mode of not accounting masked
> interrupts as missed ticks to the delivered later.

This may be problematic: Iirc one of the goals of this mode is to cover
for the case where a guest simply doesn't get around to unmasking the
IRQ until the next one occurs. Yes, it feels bogus, but I'm not sure it
can be done away with. I also can't seem to be able to think of a
heuristic by which the two scenarios could be told apart halfway
reliably.

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-05-03 14:58           ` Jan Beulich
@ 2021-05-03 15:28             ` Roger Pau Monné
  2021-05-03 15:59               ` Jan Beulich
  0 siblings, 1 reply; 28+ messages in thread
From: Roger Pau Monné @ 2021-05-03 15:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, May 03, 2021 at 04:58:07PM +0200, Jan Beulich wrote:
> On 03.05.2021 16:47, Roger Pau Monné wrote:
> > On Mon, May 03, 2021 at 02:26:51PM +0200, Jan Beulich wrote:
> >> On 03.05.2021 11:28, Roger Pau Monné wrote:
> >>> On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
> >>>> On 20.04.2021 16:07, Roger Pau Monne wrote:
> >> (I've also not seen the
> >> flag named "RTC good" - the ACPI constant is ACPI_WAET_RTC_NO_ACK, for
> >> example.)
> > 
> > I'm reading the WAET spec as published my Microsoft:
> > 
> > http://msdn.microsoft.com/en-us/windows/hardware/gg487524.aspx
> > 
> > Where the flag is listed as 'RTC good'. Maybe that's outdated now?
> > Seems to be the official source for the specification from
> > https://uefi.org/acpi.
> 
> Well, I guess the wording wasn't used for the constant's name because
> the RTC isn't "bad" otherwise?

I guess so, that's the name given my Microsoft anyway. The
descriptions speaks of an 'enhanced RTC', so you could differentiate
between plain RTC and enhanced one :).

> >>>>> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
> >>>>>      {
> >>>>>          if ( pt->pending_intr_nr )
> >>>>>          {
> >>>>> -            /* RTC code takes care of disabling the timer itself. */
> >>>>> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
> >>>>> +            if ( pt_irq_masked(pt) &&
> >>>>>                   /* Level interrupts should be asserted even if masked. */
> >>>>>                   !pt->level )
> >>>>>              {
> >>>>
> >>>> I'm struggling to relate this to any other part of the patch. In
> >>>> particular I can't find the case where a periodic timer would be
> >>>> registered with RTC_IRQ and a NULL private pointer. The only use
> >>>> I can find is with a non-NULL pointer, which would mean the "else"
> >>>> path is always taken at present for the RTC case (which you now
> >>>> change).
> >>>
> >>> Right, the else case was always taken because as the comment noted RTC
> >>> would take care of disabling itself (by calling destroy_periodic_time
> >>> from the callback when using strict_mode). When no_ack mode was
> >>> implemented this wasn't taken into account AFAICT, and thus the RTC
> >>> was never removed from the list even when masked.
> >>>
> >>> I think with no_ack mode the RTC shouldn't have this specific handling
> >>> in pt_update_irq, as it should behave like any other virtual timer.
> >>> I could try to split this as a separate bugfix, but then I would have
> >>> to teach pt_update_irq to differentiate between strict_mode and no_ack
> >>> mode.
> >>
> >> A fair part of my confusion was about "&& !pt->priv".
> > 
> > I think you meant "|| !pt->priv"?
> 
> Oops, indeed.
> 
> >> I've looked back
> >> at 9607327abbd3 ("x86/HVM: properly handle RTC periodic timer even when
> >> !RTC_PIE"), where this was added. It was, afaict, to cover for
> >> hpet_set_timer() passing NULL with RTC_IRQ.
> > 
> > That's tricky, as hpet_set_timer hardcodes 8 instead of using RTC_IRQ
> > which makes it really easy to miss.
> > 
> >> Which makes me suspect that
> >> be07023be115 ("x86/vhpet: add support for level triggered interrupts")
> >> may have subtly broken things.
> > 
> > Right - as that would have made the RTC irq when generated from the
> > HPET no longer be suspended when masked (as pt->priv would no longer
> > be NULL). Could be fixed with:
> > 
> > diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
> > index ca94e8b4538..f2cbd12f400 100644
> > --- a/xen/arch/x86/hvm/hpet.c
> > +++ b/xen/arch/x86/hvm/hpet.c
> > @@ -318,7 +318,8 @@ static void hpet_set_timer(HPETState *h, unsigned int tn,
> >                           hpet_tick_to_ns(h, diff),
> >                           oneshot ? 0 : hpet_tick_to_ns(h, h->hpet.period[tn]),
> >                           irq, timer_level(h, tn) ? hpet_timer_fired : NULL,
> > -                         (void *)(unsigned long)tn, timer_level(h, tn));
> > +                         timer_level(h, tn) ? (void *)(unsigned long)tn : NULL,
> > +                         timer_level(h, tn));
> >  }
> >  
> >  static inline uint64_t hpet_fixup_reg(
> > 
> > Passing again NULL as the callback private data for edge triggered
> > interrupts.
> 
> Right, plus perhaps at the same time replacing the hardcoded 8.

Right, but if you agree to take this patch and remove strict_mode then
the emulated RTC won't disable itself anymore, and hence needs to be
handled as any other virtual timer?

I will submit the HPET patch so it can be backported to stable
releases anyway, just wanted to check whether you would agree to
remove the strict_mode, and whether you then agree that the special
handling of the RTC done in pt_update_irq is no longer needed.

> >>> Would you be fine if the following is added to the commit message
> >>> instead:
> >>>
> >>> "Note that the special handling of the RTC timer done in pt_update_irq
> >>> is wrong for the no_ack mode, as the RTC timer callback won't disable
> >>> the timer anymore when it detects the guest is not reading REG_C. As
> >>> such remove the code as part of the removal of strict_mode, and don't
> >>> special case the RTC timer anymore in pt_update_irq."
> >>
> >> Not sure yet - as per above I'm still not convinced this part of the
> >> change is correct.
> > 
> > I believe part of this handling is kind of bogus - for example I'm
> > unsure Xen should account masked interrupt injections as missed ticks.
> > A guest might decide to mask it's interrupt source for whatever
> > reason, and then it shouldn't receive a flurry of interrupts when
> > unmasked. Ie: missed ticks should only be accounted for interrupts
> > that should have been delivered but the guest wasn't scheduled. I
> > think such model would also simplify some of the logic that we
> > currently have.
> > 
> > In fact I have a patch on top of this current series which I haven't
> > posted yet that does implement this new mode of not accounting masked
> > interrupts as missed ticks to the delivered later.
> 
> This may be problematic: Iirc one of the goals of this mode is to cover
> for the case where a guest simply doesn't get around to unmasking the
> IRQ until the next one occurs. Yes, it feels bogus, but I'm not sure it
> can be done away with.

Well, an OS shouldn't really mask the interrupt source without being
capable of handling missed interrupts. As even when running native a
SMM could steal time from the OS and thus miss timer ticks?

> I also can't seem to be able to think of a
> heuristic by which the two scenarios could be told apart halfway
> reliably.

I've tested with Windows 7 limited to 2% capacity and seems to be able
to keep track of time correctly when masked timer interrupts are not
accounted as missed ticks. Note than doing the same with
no_missed_ticks_pending leads to time skews (so limiting the CPU
capacity to 2% does indeed force timer ticks to accumulate).

Anyway, we can discuss later, once this initial batch of patches is
in.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-20 14:07 ` [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
@ 2021-05-03 15:50   ` Jan Beulich
  2021-05-04 10:27     ` Roger Pau Monné
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Beulich @ 2021-05-03 15:50 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> Such callbacks will be executed once a EOI is performed by the guest,
> regardless of whether the interrupts are injected from the vIO-APIC or
> the vPIC, as ISA IRQs are translated to GSIs and then the
> corresponding callback is executed at EOI.
> 
> The vIO-APIC infrastructure for handling EOIs is build on top of the
> existing vlapic EOI callback functionality, while the vPIC one is
> handled when writing to the vPIC EOI register.
> 
> Note that such callbacks need to be registered and de-registered, and
> that a single GSI can have multiple callbacks associated. That's
> because GSIs can be level triggered and shared, as that's the case
> with legacy PCI interrupts shared between several devices.
> 
> Strictly speaking this is a non-functional change, since there are no
> users of this new interface introduced by this change.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

In principle, as everything looks functionally correct to me,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Nevertheless, besides a few remarks further down, I have to admit I'm
concerned of the direct-to-indirect calls conversion (not just here,
but also covering earlier patches), which (considering we're talking
of EOI) I expect may occur quite frequently for at least some guests.
There aren't that many different callback functions which get
registered, are there? Hence I wonder whether enumerating them and
picking the right one via, say, an enum wouldn't be more efficient
and still allow elimination of (in the case here) unconditional calls
to hvm_dpci_eoi() for every EOI.

> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -595,6 +595,81 @@ int hvm_local_events_need_delivery(struct vcpu *v)
>      return !hvm_interrupt_blocked(v, intack);
>  }
>  
> +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
> +                              struct hvm_gsi_eoi_callback *cb)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +
> +    if ( gsi >= hvm_irq->nr_gsis )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return -EINVAL;
> +    }
> +
> +    write_lock(&hvm_irq->gsi_callbacks_lock);
> +    list_add(&cb->list, &hvm_irq->gsi_callbacks[gsi]);
> +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> +
> +    return 0;
> +}
> +
> +int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> +                                struct hvm_gsi_eoi_callback *cb)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +    const struct list_head *tmp;
> +    bool found = false;
> +
> +    if ( gsi >= hvm_irq->nr_gsis )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return -EINVAL;
> +    }
> +
> +    write_lock(&hvm_irq->gsi_callbacks_lock);
> +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
> +        if ( tmp == &cb->list )
> +        {
> +            list_del(&cb->list);

Minor remark: Would passing "tmp" here lead to better generated
code?

> @@ -419,13 +421,25 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
>              if ( is_iommu_enabled(d) )
>              {
>                  spin_unlock(&d->arch.hvm.irq_lock);
> -                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
> +                hvm_dpci_eoi(d, gsi);
>                  spin_lock(&d->arch.hvm.irq_lock);
>              }
>  
> +            /*
> +             * Callbacks don't expect to be executed with any lock held, so
> +             * drop the lock that protects the vIO-APIC fields from changing.
> +             *
> +             * Note that the redirection entry itself cannot go away, so upon
> +             * retaking the lock we only need to avoid making assumptions on
> +             * redirection entry field values (ie: recheck the IRR field).
> +             */
> +            spin_unlock(&d->arch.hvm.irq_lock);
> +            hvm_gsi_execute_callbacks(d, gsi);
> +            spin_lock(&d->arch.hvm.irq_lock);

While this may be transient in the series, as said before I'm not
happy about this double unlock/relock sequence. I didn't really
understand what would be wrong with

            spin_unlock(&d->arch.hvm.irq_lock);
            if ( is_iommu_enabled(d) )
                hvm_dpci_eoi(d, gsi);
            hvm_gsi_execute_callbacks(d, gsi);
            spin_lock(&d->arch.hvm.irq_lock);

This in particular wouldn't grow but even shrink the later patch
dropping the call to hvm_dpci_eoi().

> --- a/xen/arch/x86/hvm/vpic.c
> +++ b/xen/arch/x86/hvm/vpic.c
> @@ -235,6 +235,8 @@ static void vpic_ioport_write(
>                  unsigned int pin = __scanbit(pending, 8);
>  
>                  ASSERT(pin < 8);
> +                hvm_gsi_execute_callbacks(current->domain,
> +                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
>                  hvm_dpci_eoi(current->domain,
>                               hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
>                  __clear_bit(pin, &pending);
> @@ -285,6 +287,8 @@ static void vpic_ioport_write(
>                  /* Release lock and EOI the physical interrupt (if any). */
>                  vpic_update_int_output(vpic);
>                  vpic_unlock(vpic);
> +                hvm_gsi_execute_callbacks(current->domain,
> +                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
>                  hvm_dpci_eoi(current->domain,
>                               hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
>                  return; /* bail immediately */

Another presumably minor remark: In the IO-APIC case you insert after
the call to hvm_dpci_eoi(). I wonder if consistency wouldn't help
avoid questions of archeologists in a couple of years time.

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 01/12] x86/rtc: drop code related to strict mode
  2021-05-03 15:28             ` Roger Pau Monné
@ 2021-05-03 15:59               ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-05-03 15:59 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 03.05.2021 17:28, Roger Pau Monné wrote:
> On Mon, May 03, 2021 at 04:58:07PM +0200, Jan Beulich wrote:
>> On 03.05.2021 16:47, Roger Pau Monné wrote:
>>> On Mon, May 03, 2021 at 02:26:51PM +0200, Jan Beulich wrote:
>>>> On 03.05.2021 11:28, Roger Pau Monné wrote:
>>>>> On Thu, Apr 29, 2021 at 04:53:07PM +0200, Jan Beulich wrote:
>>>>>> On 20.04.2021 16:07, Roger Pau Monne wrote:
>>>>>>> @@ -337,8 +336,7 @@ int pt_update_irq(struct vcpu *v)
>>>>>>>      {
>>>>>>>          if ( pt->pending_intr_nr )
>>>>>>>          {
>>>>>>> -            /* RTC code takes care of disabling the timer itself. */
>>>>>>> -            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
>>>>>>> +            if ( pt_irq_masked(pt) &&
>>>>>>>                   /* Level interrupts should be asserted even if masked. */
>>>>>>>                   !pt->level )
>>>>>>>              {
>>>>>>
>>>>>> I'm struggling to relate this to any other part of the patch. In
>>>>>> particular I can't find the case where a periodic timer would be
>>>>>> registered with RTC_IRQ and a NULL private pointer. The only use
>>>>>> I can find is with a non-NULL pointer, which would mean the "else"
>>>>>> path is always taken at present for the RTC case (which you now
>>>>>> change).
>>>>>
>>>>> Right, the else case was always taken because as the comment noted RTC
>>>>> would take care of disabling itself (by calling destroy_periodic_time
>>>>> from the callback when using strict_mode). When no_ack mode was
>>>>> implemented this wasn't taken into account AFAICT, and thus the RTC
>>>>> was never removed from the list even when masked.
>>>>>
>>>>> I think with no_ack mode the RTC shouldn't have this specific handling
>>>>> in pt_update_irq, as it should behave like any other virtual timer.
>>>>> I could try to split this as a separate bugfix, but then I would have
>>>>> to teach pt_update_irq to differentiate between strict_mode and no_ack
>>>>> mode.
>>>>
>>>> A fair part of my confusion was about "&& !pt->priv".
>>>
>>> I think you meant "|| !pt->priv"?
>>
>> Oops, indeed.
>>
>>>> I've looked back
>>>> at 9607327abbd3 ("x86/HVM: properly handle RTC periodic timer even when
>>>> !RTC_PIE"), where this was added. It was, afaict, to cover for
>>>> hpet_set_timer() passing NULL with RTC_IRQ.
>>>
>>> That's tricky, as hpet_set_timer hardcodes 8 instead of using RTC_IRQ
>>> which makes it really easy to miss.
>>>
>>>> Which makes me suspect that
>>>> be07023be115 ("x86/vhpet: add support for level triggered interrupts")
>>>> may have subtly broken things.
>>>
>>> Right - as that would have made the RTC irq when generated from the
>>> HPET no longer be suspended when masked (as pt->priv would no longer
>>> be NULL). Could be fixed with:
>>>
>>> diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
>>> index ca94e8b4538..f2cbd12f400 100644
>>> --- a/xen/arch/x86/hvm/hpet.c
>>> +++ b/xen/arch/x86/hvm/hpet.c
>>> @@ -318,7 +318,8 @@ static void hpet_set_timer(HPETState *h, unsigned int tn,
>>>                           hpet_tick_to_ns(h, diff),
>>>                           oneshot ? 0 : hpet_tick_to_ns(h, h->hpet.period[tn]),
>>>                           irq, timer_level(h, tn) ? hpet_timer_fired : NULL,
>>> -                         (void *)(unsigned long)tn, timer_level(h, tn));
>>> +                         timer_level(h, tn) ? (void *)(unsigned long)tn : NULL,
>>> +                         timer_level(h, tn));
>>>  }
>>>  
>>>  static inline uint64_t hpet_fixup_reg(
>>>
>>> Passing again NULL as the callback private data for edge triggered
>>> interrupts.
>>
>> Right, plus perhaps at the same time replacing the hardcoded 8.
> 
> Right, but if you agree to take this patch and remove strict_mode then
> the emulated RTC won't disable itself anymore, and hence needs to be
> handled as any other virtual timer?

I'm still trying to become convinced, both of the removal of the mode
in general and the particular part of the change I've been struggling
with.

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback
  2021-04-20 14:07 ` [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
@ 2021-05-04  9:28   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-05-04  9:28 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> @@ -476,6 +476,7 @@ int pt_irq_create_bind(
>      {
>          struct dev_intx_gsi_link *digl = NULL;
>          struct hvm_girq_dpci_mapping *girq = NULL;
> +        struct hvm_gsi_eoi_callback *cb = NULL;

I wonder if this wouldn't benefit from a brief "hwdom only" comment.

> @@ -502,11 +503,22 @@ int pt_irq_create_bind(
>              girq->bus = digl->bus = pt_irq_bind->u.pci.bus;
>              girq->device = digl->device = pt_irq_bind->u.pci.device;
>              girq->intx = digl->intx = pt_irq_bind->u.pci.intx;
> -            list_add_tail(&digl->list, &pirq_dpci->digl_list);
> +            girq->cb.callback = dpci_eoi;
>  
>              guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
>              link = hvm_pci_intx_link(digl->device, digl->intx);
>  
> +            rc = hvm_gsi_register_callback(d, guest_gsi, &girq->cb);
> +            if ( rc )
> +            {
> +                spin_unlock(&d->event_lock);
> +                xfree(girq);
> +                xfree(digl);
> +                return rc;
> +            }
> +
> +            list_add_tail(&digl->list, &pirq_dpci->digl_list);
> +
>              hvm_irq_dpci->link_cnt[link]++;

Could we keep calculation and use of "link" together, please, so the
compiler can avoid spilling the value to the stack or allocating a
callee-saved register for it?

> @@ -514,17 +526,43 @@ int pt_irq_create_bind(
>          }
>          else
>          {
> +            /*
> +             * NB: the callback structure allocated below will never be freed
> +             * once setup because it's used by the hardware domain and will
> +             * never be unregistered.
> +             */
> +            cb = xzalloc(struct hvm_gsi_eoi_callback);
> +
>              ASSERT(is_hardware_domain(d));
>  
> +            if ( !cb )
> +            {
> +                spin_unlock(&d->event_lock);
> +                return -ENOMEM;
> +            }

I'm inclined to ask that the ASSERT() remain first in this "else" block.
In fact, you could ...

>              /* MSI_TRANSLATE is not supported for the hardware domain. */
>              if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
>                   pirq >= hvm_domain_irq(d)->nr_gsis )
>              {
>                  spin_unlock(&d->event_lock);
> -
> +                xfree(cb);

... avoid this extra cleanup by ...

>                  return -EINVAL;
>              }

... putting the allocation here.

>              guest_gsi = pirq;
> +
> +            cb->callback = dpci_eoi;
> +            /*
> +             * IRQ binds created for the hardware domain are never destroyed,
> +             * so it's fine to not keep a reference to cb here.
> +             */
> +            rc = hvm_gsi_register_callback(d, guest_gsi, cb);

In reply to a v3 comment of mine you said "I should replace IRQ with
GSI in the comment above to make it clearer." And while the question
of the comment being (and going to remain) true in the first place
was discussed, I would have hoped for the commit message to say a
word on this. If this ever changed, chances are the place here would
go unnoticed and unchanged, leading to a memory leak.

> @@ -596,12 +634,17 @@ int pt_irq_create_bind(
>                      list_del(&digl->list);
>                      link = hvm_pci_intx_link(digl->device, digl->intx);
>                      hvm_irq_dpci->link_cnt[link]--;
> +                    hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
>                  }
> +                else
> +                    hvm_gsi_unregister_callback(d, guest_gsi, cb);
> +
>                  pirq_dpci->flags = 0;
>                  pirq_cleanup_check(info, d);
>                  spin_unlock(&d->event_lock);
>                  xfree(girq);
>                  xfree(digl);
> +                xfree(cb);

May I suggest that you move the xfree() into the "else" you add, and
perhaps even make it conditional upon the un-register being successful?

> @@ -708,6 +752,11 @@ int pt_irq_destroy_bind(
>                   girq->machine_gsi == machine_gsi )
>              {
>                  list_del(&girq->list);
> +                rc = hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
> +                if ( rc )
> +                    printk(XENLOG_G_WARNING
> +                           "%pd: unable to remove callback for GSI %u: %d\n",
> +                           d, guest_gsi, rc);
>                  xfree(girq);

If the un-registration really failed (here as well as further up),
is it safe to free girq?

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-05-03 15:50   ` Jan Beulich
@ 2021-05-04 10:27     ` Roger Pau Monné
  0 siblings, 0 replies; 28+ messages in thread
From: Roger Pau Monné @ 2021-05-04 10:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Mon, May 03, 2021 at 05:50:39PM +0200, Jan Beulich wrote:
> On 20.04.2021 16:07, Roger Pau Monne wrote:
> > Such callbacks will be executed once a EOI is performed by the guest,
> > regardless of whether the interrupts are injected from the vIO-APIC or
> > the vPIC, as ISA IRQs are translated to GSIs and then the
> > corresponding callback is executed at EOI.
> > 
> > The vIO-APIC infrastructure for handling EOIs is build on top of the
> > existing vlapic EOI callback functionality, while the vPIC one is
> > handled when writing to the vPIC EOI register.
> > 
> > Note that such callbacks need to be registered and de-registered, and
> > that a single GSI can have multiple callbacks associated. That's
> > because GSIs can be level triggered and shared, as that's the case
> > with legacy PCI interrupts shared between several devices.
> > 
> > Strictly speaking this is a non-functional change, since there are no
> > users of this new interface introduced by this change.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> In principle, as everything looks functionally correct to me,
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Nevertheless, besides a few remarks further down, I have to admit I'm
> concerned of the direct-to-indirect calls conversion (not just here,
> but also covering earlier patches), which (considering we're talking
> of EOI) I expect may occur quite frequently for at least some guests.

I would expect the vmexit cost for each EOI would shadow any gain
between using direct vs indirect calls.

> There aren't that many different callback functions which get
> registered, are there? Hence I wonder whether enumerating them and
> picking the right one via, say, an enum wouldn't be more efficient
> and still allow elimination of (in the case here) unconditional calls
> to hvm_dpci_eoi() for every EOI.

So for the vlapic (vector) callbacks we have the current consumers:
 - MSI passthrough.
 - vPT.
 - IO-APIC.

For GSI callbacks we have:
 - GSI passthrough.
 - vPT.

I could see about implementing this.

This is also kind of blocked on the RTC stuff, since vPT cannot be
migrated to this new model unless we remove strict_mode or changfe the
logic here to allow GSI callbacks to de-register themselves.

> 
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > @@ -595,6 +595,81 @@ int hvm_local_events_need_delivery(struct vcpu *v)
> >      return !hvm_interrupt_blocked(v, intack);
> >  }
> >  
> > +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
> > +                              struct hvm_gsi_eoi_callback *cb)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +
> > +    if ( gsi >= hvm_irq->nr_gsis )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return -EINVAL;
> > +    }
> > +
> > +    write_lock(&hvm_irq->gsi_callbacks_lock);
> > +    list_add(&cb->list, &hvm_irq->gsi_callbacks[gsi]);
> > +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> > +
> > +    return 0;
> > +}
> > +
> > +int hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> > +                                struct hvm_gsi_eoi_callback *cb)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +    const struct list_head *tmp;
> > +    bool found = false;
> > +
> > +    if ( gsi >= hvm_irq->nr_gsis )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return -EINVAL;
> > +    }
> > +
> > +    write_lock(&hvm_irq->gsi_callbacks_lock);
> > +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
> > +        if ( tmp == &cb->list )
> > +        {
> > +            list_del(&cb->list);
> 
> Minor remark: Would passing "tmp" here lead to better generated
> code?

Maybe? I don't mind doing so.

> > @@ -419,13 +421,25 @@ static void eoi_callback(struct vcpu *v, unsigned int vector, void *data)
> >              if ( is_iommu_enabled(d) )
> >              {
> >                  spin_unlock(&d->arch.hvm.irq_lock);
> > -                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
> > +                hvm_dpci_eoi(d, gsi);
> >                  spin_lock(&d->arch.hvm.irq_lock);
> >              }
> >  
> > +            /*
> > +             * Callbacks don't expect to be executed with any lock held, so
> > +             * drop the lock that protects the vIO-APIC fields from changing.
> > +             *
> > +             * Note that the redirection entry itself cannot go away, so upon
> > +             * retaking the lock we only need to avoid making assumptions on
> > +             * redirection entry field values (ie: recheck the IRR field).
> > +             */
> > +            spin_unlock(&d->arch.hvm.irq_lock);
> > +            hvm_gsi_execute_callbacks(d, gsi);
> > +            spin_lock(&d->arch.hvm.irq_lock);
> 
> While this may be transient in the series, as said before I'm not
> happy about this double unlock/relock sequence. I didn't really
> understand what would be wrong with
> 
>             spin_unlock(&d->arch.hvm.irq_lock);
>             if ( is_iommu_enabled(d) )
>                 hvm_dpci_eoi(d, gsi);
>             hvm_gsi_execute_callbacks(d, gsi);
>             spin_lock(&d->arch.hvm.irq_lock);
> 
> This in particular wouldn't grow but even shrink the later patch
> dropping the call to hvm_dpci_eoi().

Sure.

> > --- a/xen/arch/x86/hvm/vpic.c
> > +++ b/xen/arch/x86/hvm/vpic.c
> > @@ -235,6 +235,8 @@ static void vpic_ioport_write(
> >                  unsigned int pin = __scanbit(pending, 8);
> >  
> >                  ASSERT(pin < 8);
> > +                hvm_gsi_execute_callbacks(current->domain,
> > +                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
> >                  hvm_dpci_eoi(current->domain,
> >                               hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
> >                  __clear_bit(pin, &pending);
> > @@ -285,6 +287,8 @@ static void vpic_ioport_write(
> >                  /* Release lock and EOI the physical interrupt (if any). */
> >                  vpic_update_int_output(vpic);
> >                  vpic_unlock(vpic);
> > +                hvm_gsi_execute_callbacks(current->domain,
> > +                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
> >                  hvm_dpci_eoi(current->domain,
> >                               hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
> >                  return; /* bail immediately */
> 
> Another presumably minor remark: In the IO-APIC case you insert after
> the call to hvm_dpci_eoi(). I wonder if consistency wouldn't help
> avoid questions of archeologists in a couple of years time.

Hm, sorry, I remember trying to place them in the same order, but
likely messed up the order during some rebase.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 08/12] x86/vpt: switch interrupt injection model
  2021-04-20 14:07 ` [PATCH v4 08/12] x86/vpt: switch interrupt injection model Roger Pau Monne
@ 2021-05-04 11:00   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-05-04 11:00 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> @@ -295,188 +248,153 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
>              list_del(&pt->list);
>          pt->on_list = false;
>          pt->pending_intr_nr = 0;
> +
> +        return;
>      }
> -    else if ( mode_is(v->domain, one_missed_tick_pending) ||
> -              mode_is(v->domain, no_missed_ticks_pending) )
> -    {
> -        pt->last_plt_gtime = hvm_get_guest_time(v);
> -        pt_process_missed_ticks(pt);
> -        pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
> -        set_timer(&pt->timer, pt->scheduled);
> -    }
> -    else
> +
> +    pt_process_missed_ticks(pt);
> +    /* 'collapse' missed ticks according to the selected mode. */
> +    switch ( pt->vcpu->domain->arch.hvm.params[HVM_PARAM_TIMER_MODE] )
>      {
> -        pt->last_plt_gtime += pt->period;
> -        if ( --pt->pending_intr_nr == 0 )
> -        {
> -            pt_process_missed_ticks(pt);
> -            if ( pt->pending_intr_nr == 0 )
> -                set_timer(&pt->timer, pt->scheduled);
> -        }
> +    case HVMPTM_one_missed_tick_pending:
> +        pt->pending_intr_nr = min(pt->pending_intr_nr, 1u);
> +        break;
> +
> +    case HVMPTM_no_missed_ticks_pending:
> +        pt->pending_intr_nr = 0;
> +        break;
>      }
>  
> -    if ( mode_is(v->domain, delay_for_missed_ticks) &&
> -         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
> -        hvm_set_guest_time(v, pt->last_plt_gtime);
> +    if ( !pt->pending_intr_nr )
> +        set_timer(&pt->timer, pt->scheduled);
>  }
>  
> -int pt_update_irq(struct vcpu *v)
> +static void pt_timer_fn(void *data)
>  {
> -    struct list_head *head = &v->arch.hvm.tm_list;
> -    struct periodic_time *pt, *temp, *earliest_pt;
> -    uint64_t max_lag;
> -    int irq, pt_vector = -1;
> -    bool level;
> +    struct periodic_time *pt = data;
> +    struct vcpu *v;
> +    time_cb *cb = NULL;
> +    void *cb_priv;
> +    unsigned int irq;
>  
> -    pt_vcpu_lock(v);
> +    pt_lock(pt);
>  
> -    earliest_pt = NULL;
> -    max_lag = -1ULL;
> -    list_for_each_entry_safe ( pt, temp, head, list )
> +    v = pt->vcpu;
> +    irq = pt->irq;
> +
> +    if ( inject_interrupt(pt) )
>      {
> -        if ( pt->pending_intr_nr )
> -        {
> -            if ( pt_irq_masked(pt) &&
> -                 /* Level interrupts should be asserted even if masked. */
> -                 !pt->level )
> -            {
> -                /* suspend timer emulation */
> -                list_del(&pt->list);
> -                pt->on_list = 0;
> -            }
> -            else
> -            {
> -                if ( (pt->last_plt_gtime + pt->period) < max_lag )
> -                {
> -                    max_lag = pt->last_plt_gtime + pt->period;
> -                    earliest_pt = pt;
> -                }
> -            }
> -        }
> +        pt->scheduled += pt->period;
> +        pt->do_not_freeze = 0;

Nit: "false" please.

> +        cb = pt->cb;
> +        cb_priv = pt->priv;
>      }
> -
> -    if ( earliest_pt == NULL )
> +    else
>      {
> -        pt_vcpu_unlock(v);
> -        return -1;
> +        /* Masked. */
> +        if ( pt->on_list )
> +            list_del(&pt->list);
> +        pt->on_list = false;
> +        pt->pending_intr_nr++;
>      }

inject_interrupt() returns whether it was able to deliver the interrupt.
This in particular fails if the interrupt was masked and is edge
triggered. This, unexpectedly to me, reports success if a level triggered
interrupt was already pending. But in either event, the missed ticks
accounting is, as per my understanding of the comment in hvm/params.h,
supposed to be dealing with missing delivery due to preemption only. An
interrupt being masked / already pending may not be in this state due to
the guest having got preempted, though. A guest keeping a timer interrupt
masked for an extended period of time should not get a flood of
interrupts later on, no matter what HVM_PARAM_TIMER_MODE is set to.

However, I'm not going to exclude that little bit of doc is wrong, or
implementation and doc aren't in agreement already before your change.

> -    earliest_pt->irq_issued = 1;

This looks to be the only place where the field gets set to non-zero.
If the field is unused after this change, it wants deleting. I notice
patch 11 does so, but it may be worthwhile pointing out
- in the description here, that field removal will happen later,
- in the later patch, that this field was already unused (and doesn't
  become dead by the other removal done there).

> -    irq = earliest_pt->irq;
> -    level = earliest_pt->level;
> +    pt_unlock(pt);
>  
> -    pt_vcpu_unlock(v);
> +    if ( cb )
> +        cb(v, cb_priv);
> +}
>  
> -    switch ( earliest_pt->source )
> -    {
> -    case PTSRC_lapic:
> -        /*
> -         * If periodic timer interrupt is handled by lapic, its vector in
> -         * IRR is returned and used to set eoi_exit_bitmap for virtual
> -         * interrupt delivery case. Otherwise return -1 to do nothing.
> -         */
> -        vlapic_set_irq(vcpu_vlapic(v), irq, 0);
> -        pt_vector = irq;
> -        break;
> +static void eoi_callback(struct periodic_time *pt)
> +{
> +    struct vcpu *v = NULL;
> +    time_cb *cb = NULL;
> +    void *cb_priv = NULL;
>  
> -    case PTSRC_isa:
> -        hvm_isa_irq_deassert(v->domain, irq);
> -        if ( platform_legacy_irq(irq) && vlapic_accept_pic_intr(v) &&
> -             v->domain->arch.hvm.vpic[irq >> 3].int_output )
> -            hvm_isa_irq_assert(v->domain, irq, NULL);
> -        else
> +    pt_lock(pt);
> +
> +    irq_eoi(pt);
> +    if ( pt->pending_intr_nr )
> +    {
> +        if ( inject_interrupt(pt) )
>          {
> -            pt_vector = hvm_isa_irq_assert(v->domain, irq, vioapic_get_vector);
> -            /*
> -             * hvm_isa_irq_assert may not set the corresponding bit in vIRR
> -             * when mask field of IOAPIC RTE is set. Check it again.
> -             */
> -            if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
> -                pt_vector = -1;
> +            pt->pending_intr_nr--;
> +            cb = pt->cb;
> +            cb_priv = pt->priv;
> +            v = pt->vcpu;
>          }
> -        break;
> -
> -    case PTSRC_ioapic:
> -        pt_vector = hvm_ioapic_assert(v->domain, irq, level);
> -        if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
> +        else
>          {
> -            pt_vector = -1;
> -            if ( level )
> -            {
> -                /*
> -                 * Level interrupts are always asserted because the pin assert
> -                 * count is incremented regardless of whether the pin is masked
> -                 * or the vector latched in IRR, so also execute the callback
> -                 * associated with the timer.
> -                 */
> -                time_cb *cb = NULL;
> -                void *cb_priv = NULL;
> -
> -                pt_vcpu_lock(v);
> -                /* Make sure the timer is still on the list. */
> -                list_for_each_entry ( pt, &v->arch.hvm.tm_list, list )
> -                    if ( pt == earliest_pt )
> -                    {
> -                        pt_irq_fired(v, pt);
> -                        cb = pt->cb;
> -                        cb_priv = pt->priv;
> -                        break;
> -                    }
> -                pt_vcpu_unlock(v);
> -
> -                if ( cb != NULL )
> -                    cb(v, cb_priv);
> -            }
> +            /* Masked. */
> +            if ( pt->on_list )
> +                list_del(&pt->list);
> +            pt->on_list = false;
>          }
> -        break;
>      }
>  
> -    return pt_vector;
> +    pt_unlock(pt);
> +
> +    if ( cb )
> +        cb(v, cb_priv);
>  }
>  
> -static struct periodic_time *is_pt_irq(
> -    struct vcpu *v, struct hvm_intack intack)
> +static void vlapic_eoi_callback(struct vcpu *unused, unsigned int unused2,
> +                                void *data)
>  {
> -    struct list_head *head = &v->arch.hvm.tm_list;
> -    struct periodic_time *pt;
> -
> -    list_for_each_entry ( pt, head, list )
> -    {
> -        if ( pt->pending_intr_nr && pt->irq_issued &&
> -             (intack.vector == pt_irq_vector(pt, intack.source)) )
> -            return pt;
> -    }
> +    eoi_callback(data);
> +}
>  
> -    return NULL;
> +static void vioapic_eoi_callback(struct domain *unused, unsigned int unused2,
> +                                 void *data)
> +{
> +    eoi_callback(data);
>  }
>  
> -void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
> +static bool inject_interrupt(struct periodic_time *pt)
>  {
> -    struct periodic_time *pt;
> -    time_cb *cb;
> -    void *cb_priv;
> +    struct vcpu *v = pt->vcpu;
> +    struct domain *d = v->domain;
> +    unsigned int irq = pt->irq;
>  
> -    if ( intack.source == hvm_intsrc_vector )
> -        return;
> +    /* Level interrupts should be asserted even if masked. */
> +    if ( pt_irq_masked(pt) && !pt->level )
> +        return false;
>  
> -    pt_vcpu_lock(v);
> -
> -    pt = is_pt_irq(v, intack);
> -    if ( pt == NULL )
> +    switch ( pt->source )
>      {
> -        pt_vcpu_unlock(v);
> -        return;
> +    case PTSRC_lapic:
> +        vlapic_set_irq_callback(vcpu_vlapic(v), pt->irq, 0, vlapic_eoi_callback,
> +                                pt);
> +        break;
> +
> +    case PTSRC_isa:
> +        hvm_isa_irq_deassert(d, irq);
> +        hvm_isa_irq_assert(d, irq, NULL);
> +        break;
> +
> +    case PTSRC_ioapic:
> +        hvm_ioapic_assert(d, irq, pt->level);
> +        break;
>      }

Why do ISA IRQs get de-asserted first, but IO-APIC ones don't? I
notice e.g. hvm_set_callback_irq_level() and hvm_set_pci_link_route()
have similar apparent asymmetries, so I guess I'm missing something.
In particular I can't spot - even prior to this change - where
hvm_irq->gsi_assert_count[gsi] would get decremented for a level
triggered IRQ, when hvm_ioapic_deassert() gets called only from
hvm/hpet.c:hpet_write(). I guess the main point is that that's the
only case of a level triggered timer interrupt for us?

> @@ -641,20 +590,29 @@ void pt_adjust_global_vcpu_target(struct vcpu *v)
>      write_unlock(&pl_time->vhpet.lock);
>  }
>  
> -
>  static void pt_resume(struct periodic_time *pt)
>  {
> +    struct vcpu *v;
> +    time_cb *cb = NULL;
> +    void *cb_priv;
> +
>      if ( pt->vcpu == NULL )
>          return;
>  
>      pt_lock(pt);
> -    if ( pt->pending_intr_nr && !pt->on_list )
> +    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
>      {
> +        pt->pending_intr_nr--;
> +        cb = pt->cb;
> +        cb_priv = pt->priv;
> +        v = pt->vcpu;
>          pt->on_list = 1;
>          list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
> -        vcpu_kick(pt->vcpu);

Just for my own understanding: The replacement of this is what happens
down the call tree from inject_interrupt()?

Jan


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert
  2021-04-20 14:07 ` [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert Roger Pau Monne
@ 2021-05-04 11:42   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-05-04 11:42 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> There are no callers anymore passing a get_vector function pointer to
> hvm_isa_irq_assert, so drop the parameter.
> 
> No functional change expected.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert
  2021-04-20 14:07 ` [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert Roger Pau Monne
@ 2021-05-04 11:42   ` Jan Beulich
  0 siblings, 0 replies; 28+ messages in thread
From: Jan Beulich @ 2021-05-04 11:42 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 20.04.2021 16:07, Roger Pau Monne wrote:
> There's no caller anymore that cares about the injected vector, so
> drop the returned vector from the function.
> 
> No functional change indented.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-05-04 11:42 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-20 14:07 [PATCH v4 00/12] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
2021-04-20 14:07 ` [PATCH v4 01/12] x86/rtc: drop code related to strict mode Roger Pau Monne
2021-04-29 14:53   ` Jan Beulich
2021-05-03  9:28     ` Roger Pau Monné
2021-05-03 12:26       ` Jan Beulich
2021-05-03 14:47         ` Roger Pau Monné
2021-05-03 14:58           ` Jan Beulich
2021-05-03 15:28             ` Roger Pau Monné
2021-05-03 15:59               ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 02/12] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
2021-04-29 15:48   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 03/12] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
2021-04-20 14:07 ` [PATCH v4 04/12] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
2021-04-29 15:51   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 05/12] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
2021-05-03 15:50   ` Jan Beulich
2021-05-04 10:27     ` Roger Pau Monné
2021-04-20 14:07 ` [PATCH v4 06/12] x86/dpci: move code Roger Pau Monne
2021-04-20 14:07 ` [PATCH v4 07/12] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
2021-05-04  9:28   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 08/12] x86/vpt: switch interrupt injection model Roger Pau Monne
2021-05-04 11:00   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 09/12] x86/irq: remove unused parameter from hvm_isa_irq_assert Roger Pau Monne
2021-05-04 11:42   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 10/12] x86/irq: drop return value from hvm_ioapic_assert Roger Pau Monne
2021-05-04 11:42   ` Jan Beulich
2021-04-20 14:07 ` [PATCH v4 11/12] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
2021-04-20 14:07 ` [PATCH v4 12/12] x86/vpt: introduce a per-vPT lock Roger Pau Monne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).