xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT
@ 2021-03-31 10:32 Roger Pau Monne
  2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
                   ` (10 more replies)
  0 siblings, 11 replies; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu,
	Paul Durrant, Jun Nakajima, Kevin Tian

Hello,

The following series introduces EOI callbacks and switches a number of
subsystems to use them. The first one is vmsi and dpci, which are quite
straight forward to switch since they used to open code hooks in the EOI
handlers.

Finally HVM virtual timers are also switched to a different model where
EOI callbacks are used instead of checking at every return to guest
whether a timer interrupt is being injected. Note that such change also
fixes a bug in virtual periodic timers that prevented injecting to a vCPU
different than the one where the timer is assigned (and that would
currently match the i8259 target).

Those changes are paired together so that practical applications of
having EOI callbacks can be seen in action.

Note that further cleanup can be done, but I think the series is already
big enough and provides a clear benefit.

A couple of notes from the testing performed:
 - PVH dom0.
 - Windows guest, limited to 2% capacity to test delay for missed ticks
   mode - time is consistent in the guest.
 - Windows guest migration.
 - PCI passthrough to a guest.

Thanks, Roger.

Roger Pau Monne (11):
  x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  x86/hvm: drop domain parameter from vioapic/vpic EOI callbacks
  x86/vlapic: introduce an EOI callback mechanism
  x86/vmsi: use the newly introduced EOI callbacks
  x86/vioapic: switch to use the EOI callback mechanism
  x86/hvm: allowing registering EOI callbacks for GSIs
  x86/dpci: move code
  x86/dpci: switch to use a GSI EOI callback
  x86/vpt: switch interrupt injection model
  x86/vpt: remove vPT timers per-vCPU lists
  x86/vpt: introduce a per-vPT lock

 xen/arch/x86/domain.c             |   4 +-
 xen/arch/x86/emul-i8254.c         |   1 +
 xen/arch/x86/hvm/hpet.c           |   8 +-
 xen/arch/x86/hvm/hvm.c            |  23 +-
 xen/arch/x86/hvm/irq.c            |  63 ++++
 xen/arch/x86/hvm/rtc.c            |   1 +
 xen/arch/x86/hvm/svm/intr.c       |   3 -
 xen/arch/x86/hvm/vioapic.c        | 149 ++++++----
 xen/arch/x86/hvm/vlapic.c         |  74 ++++-
 xen/arch/x86/hvm/vmsi.c           |  35 ++-
 xen/arch/x86/hvm/vmx/intr.c       |  59 ----
 xen/arch/x86/hvm/vpic.c           |   9 +-
 xen/arch/x86/hvm/vpt.c            | 476 +++++++++---------------------
 xen/drivers/passthrough/x86/hvm.c | 219 ++++++++------
 xen/include/asm-x86/hvm/io.h      |   3 +-
 xen/include/asm-x86/hvm/irq.h     |  21 ++
 xen/include/asm-x86/hvm/vcpu.h    |   4 -
 xen/include/asm-x86/hvm/vioapic.h |   2 +-
 xen/include/asm-x86/hvm/vlapic.h  |  18 +-
 xen/include/asm-x86/hvm/vpt.h     |  26 +-
 20 files changed, 601 insertions(+), 597 deletions(-)

-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-03-31 16:02   ` Jan Beulich
  2021-04-01 11:06   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic " Roger Pau Monne
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu,
	Paul Durrant, Paul Durrant

EOIs are always executed in guest vCPU context, so there's no reason to
pass a vCPU parameter around as can be fetched from current.

While there make the vector parameter of both callbacks unsigned int.

No functional change intended.

Suggested-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c        | 5 +++--
 xen/arch/x86/hvm/vlapic.c         | 7 ++-----
 xen/drivers/passthrough/x86/hvm.c | 4 +++-
 xen/include/asm-x86/hvm/io.h      | 2 +-
 xen/include/asm-x86/hvm/vioapic.h | 2 +-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 87370dd4172..91e5f892787 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -372,7 +372,7 @@ static int vioapic_write(
 
 #if VIOAPIC_VERSION_ID >= 0x20
     case VIOAPIC_REG_EOI:
-        vioapic_update_EOI(v->domain, val);
+        vioapic_update_EOI(val);
         break;
 #endif
 
@@ -514,8 +514,9 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
     }
 }
 
-void vioapic_update_EOI(struct domain *d, u8 vector)
+void vioapic_update_EOI(unsigned int vector)
 {
+    struct domain *d = current->domain;
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     union vioapic_redir_entry *ent;
     unsigned int i;
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5e21fb4937d..98e4ba67d79 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -459,13 +459,10 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
-    struct vcpu *v = vlapic_vcpu(vlapic);
-    struct domain *d = v->domain;
-
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
-        vioapic_update_EOI(d, vector);
+        vioapic_update_EOI(vector);
 
-    hvm_dpci_msi_eoi(d, vector);
+    hvm_dpci_msi_eoi(vector);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 351daafdc9b..2f6c81b1e2c 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -796,8 +796,10 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
     return 0;
 }
 
-void hvm_dpci_msi_eoi(struct domain *d, int vector)
+void hvm_dpci_msi_eoi(unsigned int vector)
 {
+    struct domain *d = current->domain;
+
     if ( !is_iommu_enabled(d) ||
          (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
        return;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 54e0161b492..8b8392ec59e 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,7 +142,7 @@ struct hvm_hw_stdvga {
 void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
-extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+extern void hvm_dpci_msi_eoi(unsigned int vector);
 
 /* Decode a PCI port IO access into a bus/slot/func/reg. */
 unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
diff --git a/xen/include/asm-x86/hvm/vioapic.h b/xen/include/asm-x86/hvm/vioapic.h
index 36b64d20d60..882548c13b7 100644
--- a/xen/include/asm-x86/hvm/vioapic.h
+++ b/xen/include/asm-x86/hvm/vioapic.h
@@ -63,7 +63,7 @@ int vioapic_init(struct domain *d);
 void vioapic_deinit(struct domain *d);
 void vioapic_reset(struct domain *d);
 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq);
-void vioapic_update_EOI(struct domain *d, u8 vector);
+void vioapic_update_EOI(unsigned int vector);
 
 int vioapic_get_mask(const struct domain *d, unsigned int gsi);
 int vioapic_get_vector(const struct domain *d, unsigned int gsi);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic EOI callbacks
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
  2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-03-31 16:04   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, Paul Durrant

EOIs are always executed in guest vCPU context, so there's no reason to
pass a domain parameter around as can be fetched from current->domain.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c        | 4 ++--
 xen/arch/x86/hvm/vpic.c           | 6 ++----
 xen/drivers/passthrough/x86/hvm.c | 3 ++-
 xen/include/asm-x86/hvm/io.h      | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 91e5f892787..dcc2de76489 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -284,7 +284,7 @@ static void vioapic_write_redirent(
              */
             ASSERT(prev_level);
             ASSERT(!top_word);
-            hvm_dpci_eoi(d, gsi);
+            hvm_dpci_eoi(gsi);
     }
 
     if ( is_hardware_domain(d) && unmasked )
@@ -541,7 +541,7 @@ void vioapic_update_EOI(unsigned int vector)
             if ( is_iommu_enabled(d) )
             {
                 spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(d, vioapic->base_gsi + pin);
+                hvm_dpci_eoi(vioapic->base_gsi + pin);
                 spin_lock(&d->arch.hvm.irq_lock);
             }
 
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index f465b7f9979..a69aecad912 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -235,8 +235,7 @@ static void vpic_ioport_write(
                 unsigned int pin = __scanbit(pending, 8);
 
                 ASSERT(pin < 8);
-                hvm_dpci_eoi(current->domain,
-                             hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
+                hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 __clear_bit(pin, &pending);
             }
             return;
@@ -285,8 +284,7 @@ static void vpic_ioport_write(
                 /* Release lock and EOI the physical interrupt (if any). */
                 vpic_update_int_output(vpic);
                 vpic_unlock(vpic);
-                hvm_dpci_eoi(current->domain,
-                             hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
+                hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 return; /* bail immediately */
             case 6: /* Set Priority                */
                 vpic->priority_add = (val + 1) & 7;
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 2f6c81b1e2c..a9256e7ef51 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -911,8 +911,9 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
     hvm_pirq_eoi(pirq);
 }
 
-void hvm_dpci_eoi(struct domain *d, unsigned int guest_gsi)
+void hvm_dpci_eoi(unsigned int guest_gsi)
 {
+    struct domain *d = current->domain;
     const struct hvm_irq_dpci *hvm_irq_dpci;
     const struct hvm_girq_dpci_mapping *girq;
 
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 8b8392ec59e..4f294232fb7 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -101,7 +101,7 @@ bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq);
+void hvm_dpci_eoi(unsigned int guest_irq);
 void msix_write_completion(struct vcpu *);
 
 #ifdef CONFIG_HVM
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
  2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
  2021-03-31 10:32 ` [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic " Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-03-31 11:47   ` Andrew Cooper
  2021-04-07 14:55   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Add a new vlapic_set_irq_callback helper in order to inject a vector
and set a callback to be executed when the guest performs the end of
interrupt acknowledgment.

Such functionality will be used to migrate the current ad hoc handling
done in vlapic_handle_EOI for the vectors that require some logic to
be executed when the end of interrupt is performed.

The setter of the callback will be in charge for setting the callback
again on resume. That is the reason why vlapic_set_callback is not a
static function.

No current users are migrated to use this new functionality yet, so no
functional change expected as a result.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Fix commit message typo.
 - Expand commit message.
 - Also print a warning if the callback data is overridden.
 - Properly free memory in case of error in vlapic_init.

Changes since v1:
 - Make vlapic_set_irq an inline function on the header.
 - Clear the callback hook in vlapic_handle_EOI.
 - Introduce a helper to set the callback without injecting a vector.
 - Remove unneeded parentheses.
 - Reduce callback table by 16.
 - Use %pv to print domain/vcpu ID.
---
 xen/arch/x86/hvm/vlapic.c        | 64 +++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vlapic.h | 18 ++++++++-
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 98e4ba67d79..851a1f5bd6c 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -144,7 +144,35 @@ bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec)
     return vlapic_test_vector(vec, &vlapic->regs->data[APIC_IRR]);
 }
 
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data)
+{
+    unsigned long flags;
+    unsigned int index = vec - 16;
+
+    if ( !callback || vec < 16 || vec >= X86_NR_VECTORS )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    if ( vlapic->callbacks[index].callback &&
+         (vlapic->callbacks[index].callback != callback ||
+          vlapic->callbacks[index].data != data) )
+        printk(XENLOG_G_WARNING
+               "%pv overriding vector %#x callback %ps (%p) data %p "
+               "with %ps (%p) data %p\n",
+               vlapic_vcpu(vlapic), vec, vlapic->callbacks[index].callback,
+               vlapic->callbacks[index].callback, vlapic->callbacks[index].data,
+               callback, callback, data);
+    vlapic->callbacks[index].callback = callback;
+    vlapic->callbacks[index].data = data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+}
+
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data)
 {
     struct vcpu *target = vlapic_vcpu(vlapic);
 
@@ -159,8 +187,12 @@ void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig)
     else
         vlapic_clear_vector(vec, &vlapic->regs->data[APIC_TMR]);
 
+    if ( callback )
+        vlapic_set_callback(vlapic, vec, callback, data);
+
     if ( hvm_funcs.update_eoi_exit_bitmap )
-        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec, trig);
+        alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
+                          trig || callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -459,10 +491,24 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
+    vlapic_eoi_callback_t *callback;
+    void *data;
+    unsigned long flags;
+    unsigned int index = vector - 16;
+
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(vector);
 
     hvm_dpci_msi_eoi(vector);
+
+    spin_lock_irqsave(&vlapic->callback_lock, flags);
+    callback = vlapic->callbacks[index].callback;
+    vlapic->callbacks[index].callback = NULL;
+    data = vlapic->callbacks[index].data;
+    spin_unlock_irqrestore(&vlapic->callback_lock, flags);
+
+    if ( callback )
+        callback(vector, data);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
@@ -1620,9 +1666,22 @@ int vlapic_init(struct vcpu *v)
 
     clear_page(vlapic->regs);
 
+    vlapic->callbacks = xmalloc_array(typeof(*vlapic->callbacks),
+                                      X86_NR_VECTORS - 16);
+    if ( !vlapic->callbacks )
+    {
+        dprintk(XENLOG_ERR, "%pv: alloc vlapic callbacks error\n", v);
+        unmap_domain_page_global(vlapic->regs);
+        free_domheap_page(vlapic->regs_page);
+        return -ENOMEM;
+    }
+    memset(vlapic->callbacks, 0,
+           sizeof(*vlapic->callbacks) * (X86_NR_VECTORS - 16));
+
     vlapic_reset(vlapic);
 
     spin_lock_init(&vlapic->esr_lock);
+    spin_lock_init(&vlapic->callback_lock);
 
     tasklet_init(&vlapic->init_sipi.tasklet, vlapic_init_sipi_action, v);
 
@@ -1644,6 +1703,7 @@ void vlapic_destroy(struct vcpu *v)
     destroy_periodic_time(&vlapic->pt);
     unmap_domain_page_global(vlapic->regs);
     free_domheap_page(vlapic->regs_page);
+    XFREE(vlapic->callbacks);
 }
 
 /*
diff --git a/xen/include/asm-x86/hvm/vlapic.h b/xen/include/asm-x86/hvm/vlapic.h
index 8f908928c35..c380127a719 100644
--- a/xen/include/asm-x86/hvm/vlapic.h
+++ b/xen/include/asm-x86/hvm/vlapic.h
@@ -73,6 +73,8 @@
 #define vlapic_clear_vector(vec, bitmap)                                \
     clear_bit(VEC_POS(vec), (uint32_t *)((bitmap) + REG_POS(vec)))
 
+typedef void vlapic_eoi_callback_t(unsigned int vector, void *data);
+
 struct vlapic {
     struct hvm_hw_lapic      hw;
     struct hvm_hw_lapic_regs *regs;
@@ -89,6 +91,11 @@ struct vlapic {
         uint32_t             icr, dest;
         struct tasklet       tasklet;
     } init_sipi;
+    struct {
+        vlapic_eoi_callback_t *callback;
+        void                 *data;
+    } *callbacks;
+    spinlock_t               callback_lock;
 };
 
 /* vlapic's frequence is 100 MHz */
@@ -111,7 +118,16 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val);
 bool_t is_vlapic_lvtpc_enabled(struct vlapic *vlapic);
 
 bool vlapic_test_irq(const struct vlapic *vlapic, uint8_t vec);
-void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec, uint8_t trig);
+void vlapic_set_callback(struct vlapic *vlapic, unsigned int vec,
+                         vlapic_eoi_callback_t *callback, void *data);
+void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
+                             vlapic_eoi_callback_t *callback, void *data);
+
+static inline void vlapic_set_irq(struct vlapic *vlapic, uint8_t vec,
+                                  uint8_t trig)
+{
+    vlapic_set_irq_callback(vlapic, vec, trig, NULL, NULL);
+}
 
 int vlapic_has_pending_irq(struct vcpu *v);
 int vlapic_ack_pending_irq(struct vcpu *v, int vector, bool_t force_ack);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (2 preceding siblings ...)
  2021-03-31 10:32 ` [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-04-07 14:59   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, Paul Durrant

Remove the unconditional call to hvm_dpci_msi_eoi in vlapic_handle_EOI
and instead use the newly introduced EOI callback mechanism in order
to register a callback for MSI vectors injected from passed through
devices.

This avoids having multiple callback functions open-coded in
vlapic_handle_EOI, as there is now a generic framework for registering
such callbacks. It also avoids doing an unconditional call to
hvm_dpci_msi_eoi for each EOI processed by the local APIC.

Note that now the callback is only registered (and thus executed) when
there's an MSI interrupt originating from a PCI passthrough device
being injected into the guest, so the check in hvm_dpci_msi_eoi can be
removed as it's already done by hvm_dirq_assist which is the only
caller of vmsi_deliver_pirq.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Expand commit message.
 - Pass the domain as the callback data.
 - Remove the check in hvm_dpci_msi_eoi
---
 xen/arch/x86/hvm/vlapic.c         |  2 --
 xen/arch/x86/hvm/vmsi.c           | 35 ++++++++++++++++++-------------
 xen/drivers/passthrough/x86/hvm.c |  8 ++-----
 xen/include/asm-x86/hvm/io.h      |  2 +-
 4 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 851a1f5bd6c..10b216345a7 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -499,8 +499,6 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
         vioapic_update_EOI(vector);
 
-    hvm_dpci_msi_eoi(vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 13e2a190b43..3da0a2261fd 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -44,11 +44,9 @@
 #include <asm/event.h>
 #include <asm/io_apic.h>
 
-static void vmsi_inj_irq(
-    struct vlapic *target,
-    uint8_t vector,
-    uint8_t trig_mode,
-    uint8_t delivery_mode)
+static void vmsi_inj_irq(struct vlapic *target, uint8_t vector,
+                         uint8_t trig_mode, uint8_t delivery_mode,
+                         vlapic_eoi_callback_t *callback, void *data)
 {
     HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "vmsi_inj_irq: vec %02x trig %d dm %d\n",
                 vector, trig_mode, delivery_mode);
@@ -57,17 +55,17 @@ static void vmsi_inj_irq(
     {
     case dest_Fixed:
     case dest_LowestPrio:
-        vlapic_set_irq(target, vector, trig_mode);
+        vlapic_set_irq_callback(target, vector, trig_mode, callback, data);
         break;
     default:
         BUG();
     }
 }
 
-int vmsi_deliver(
-    struct domain *d, int vector,
-    uint8_t dest, uint8_t dest_mode,
-    uint8_t delivery_mode, uint8_t trig_mode)
+static int vmsi_deliver_callback(struct domain *d, int vector, uint8_t dest,
+                                 uint8_t dest_mode, uint8_t delivery_mode,
+                                 uint8_t trig_mode,
+                                 vlapic_eoi_callback_t *callback, void *data)
 {
     struct vlapic *target;
     struct vcpu *v;
@@ -78,7 +76,8 @@ int vmsi_deliver(
         target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
+            vmsi_inj_irq(target, vector, trig_mode, delivery_mode, callback,
+                         data);
             break;
         }
         HVM_DBG_LOG(DBG_LEVEL_VLAPIC, "null MSI round robin: vector=%02x\n",
@@ -89,8 +88,8 @@ int vmsi_deliver(
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL,
                                    0, dest, dest_mode) )
-                vmsi_inj_irq(vcpu_vlapic(v), vector,
-                             trig_mode, delivery_mode);
+                vmsi_inj_irq(vcpu_vlapic(v), vector, trig_mode, delivery_mode,
+                             callback, data);
         break;
 
     default:
@@ -103,6 +102,13 @@ int vmsi_deliver(
     return 0;
 }
 
+int vmsi_deliver(struct domain *d, int vector, uint8_t dest, uint8_t dest_mode,
+                 uint8_t delivery_mode, uint8_t trig_mode)
+{
+    return vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode,
+                                 trig_mode, NULL, NULL);
+}
+
 void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 {
     uint32_t flags = pirq_dpci->gmsi.gflags;
@@ -119,7 +125,8 @@ void vmsi_deliver_pirq(struct domain *d, const struct hvm_pirq_dpci *pirq_dpci)
 
     ASSERT(pirq_dpci->flags & HVM_IRQ_DPCI_GUEST_MSI);
 
-    vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
+    vmsi_deliver_callback(d, vector, dest, dest_mode, delivery_mode, trig_mode,
+                          hvm_dpci_msi_eoi, d);
 }
 
 /* Return value, -1 : multi-dests, non-negative value: dest_vcpu_id */
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index a9256e7ef51..2331af896d4 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -796,13 +796,9 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
     return 0;
 }
 
-void hvm_dpci_msi_eoi(unsigned int vector)
+void hvm_dpci_msi_eoi(unsigned int vector, void *data)
 {
-    struct domain *d = current->domain;
-
-    if ( !is_iommu_enabled(d) ||
-         (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
-       return;
+    struct domain *d = data;
 
     spin_lock(&d->event_lock);
     pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 4f294232fb7..9ac3e4f48f6 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,7 +142,7 @@ struct hvm_hw_stdvga {
 void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
-extern void hvm_dpci_msi_eoi(unsigned int vector);
+void hvm_dpci_msi_eoi(unsigned int vector, void *data);
 
 /* Decode a PCI port IO access into a bus/slot/func/reg. */
 unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (3 preceding siblings ...)
  2021-03-31 10:32 ` [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-04-07 15:19   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Switch the emulated IO-APIC code to use the local APIC EOI callback
mechanism. This allows to remove the last hardcoded callback from
vlapic_handle_EOI. Removing the hardcoded vIO-APIC callback also
allows to getting rid of setting the EOI exit bitmap based on the
triggering mode, as now all users that require an EOI action use the
newly introduced callback mechanism.

Move and rename the vioapic_update_EOI now that it can be made static.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Explicitly convert the last alternative_vcall parameter to a
   boolean in vlapic_set_irq_callback.

Changes since v1:
 - Remove the triggering check in the update_eoi_exit_bitmap call.
 - Register the vlapic callbacks when loading the vIO-APIC state.
 - Reduce scope of ent.
---
 xen/arch/x86/hvm/vioapic.c | 131 ++++++++++++++++++++++++-------------
 xen/arch/x86/hvm/vlapic.c  |  11 ++--
 2 files changed, 92 insertions(+), 50 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index dcc2de76489..d29b6bfdb7d 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -394,6 +394,50 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = {
     .write = vioapic_write
 };
 
+static void eoi_callback(unsigned int vector, void *data)
+{
+    struct domain *d = current->domain;
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    unsigned int i;
+
+    ASSERT(has_vioapic(d));
+
+    spin_lock(&d->arch.hvm.irq_lock);
+
+    for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
+    {
+        struct hvm_vioapic *vioapic = domain_vioapic(d, i);
+        unsigned int pin;
+
+        for ( pin = 0; pin < vioapic->nr_pins; pin++ )
+        {
+            union vioapic_redir_entry *ent = &vioapic->redirtbl[pin];
+
+            if ( ent->fields.vector != vector )
+                continue;
+
+            ent->fields.remote_irr = 0;
+
+            if ( is_iommu_enabled(d) )
+            {
+                spin_unlock(&d->arch.hvm.irq_lock);
+                hvm_dpci_eoi(vioapic->base_gsi + pin);
+                spin_lock(&d->arch.hvm.irq_lock);
+            }
+
+            if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
+                 !ent->fields.mask && !ent->fields.remote_irr &&
+                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
+            {
+                ent->fields.remote_irr = 1;
+                vioapic_deliver(vioapic, pin);
+            }
+        }
+    }
+
+    spin_unlock(&d->arch.hvm.irq_lock);
+}
+
 static void ioapic_inj_irq(
     struct hvm_vioapic *vioapic,
     struct vlapic *target,
@@ -407,7 +451,8 @@ static void ioapic_inj_irq(
     ASSERT((delivery_mode == dest_Fixed) ||
            (delivery_mode == dest_LowestPrio));
 
-    vlapic_set_irq(target, vector, trig_mode);
+    vlapic_set_irq_callback(target, vector, trig_mode,
+                            trig_mode ? eoi_callback : NULL, NULL);
 }
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
@@ -514,50 +559,6 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
     }
 }
 
-void vioapic_update_EOI(unsigned int vector)
-{
-    struct domain *d = current->domain;
-    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
-    union vioapic_redir_entry *ent;
-    unsigned int i;
-
-    ASSERT(has_vioapic(d));
-
-    spin_lock(&d->arch.hvm.irq_lock);
-
-    for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
-    {
-        struct hvm_vioapic *vioapic = domain_vioapic(d, i);
-        unsigned int pin;
-
-        for ( pin = 0; pin < vioapic->nr_pins; pin++ )
-        {
-            ent = &vioapic->redirtbl[pin];
-            if ( ent->fields.vector != vector )
-                continue;
-
-            ent->fields.remote_irr = 0;
-
-            if ( is_iommu_enabled(d) )
-            {
-                spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(vioapic->base_gsi + pin);
-                spin_lock(&d->arch.hvm.irq_lock);
-            }
-
-            if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
-                 !ent->fields.mask && !ent->fields.remote_irr &&
-                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
-            {
-                ent->fields.remote_irr = 1;
-                vioapic_deliver(vioapic, pin);
-            }
-        }
-    }
-
-    spin_unlock(&d->arch.hvm.irq_lock);
-}
-
 int vioapic_get_mask(const struct domain *d, unsigned int gsi)
 {
     unsigned int pin = 0; /* See gsi_vioapic */
@@ -611,6 +612,8 @@ static int ioapic_save(struct vcpu *v, hvm_domain_context_t *h)
 static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
 {
     struct hvm_vioapic *s;
+    unsigned int i;
+    int rc;
 
     if ( !has_vioapic(d) )
         return -ENODEV;
@@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
          d->arch.hvm.nr_vioapics != 1 )
         return -EOPNOTSUPP;
 
-    return hvm_load_entry(IOAPIC, h, &s->domU);
+    rc = hvm_load_entry(IOAPIC, h, &s->domU);
+    if ( rc )
+        return rc;
+
+    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
+    {
+        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
+        unsigned int vector = ent->fields.vector;
+        unsigned int delivery_mode = ent->fields.delivery_mode;
+        struct vcpu *v;
+
+        /*
+         * Add a callback for each possible vector injected by a redirection
+         * entry.
+         */
+        if ( vector < 16 || !ent->fields.remote_irr ||
+             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
+            continue;
+
+        for_each_vcpu ( d, v )
+        {
+            struct vlapic *vlapic = vcpu_vlapic(v);
+
+            /*
+             * NB: if the vlapic registers were restored before the vio-apic
+             * ones we could test whether the vector is set in the vlapic IRR
+             * or ISR registers before unconditionally setting the callback.
+             * This is harmless as eoi_callback is capable of dealing with
+             * spurious callbacks.
+             */
+            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
+                                   ent->fields.dest_mode) )
+                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
+        }
+    }
+
+    return 0;
 }
 
 HVM_REGISTER_SAVE_RESTORE(IOAPIC, ioapic_save, ioapic_load, 1, HVMSR_PER_DOM);
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 10b216345a7..63fa3780767 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -192,7 +192,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
 
     if ( hvm_funcs.update_eoi_exit_bitmap )
         alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
-                          trig || callback);
+                          /*
+                           * NB: need to explicitly convert to boolean to avoid
+                           * truncation wrongly result in false begin reported
+                           * for example when the pointer sits on a page
+                           * boundary.
+                           */
+                          !!callback);
 
     if ( hvm_funcs.deliver_posted_intr )
         alternative_vcall(hvm_funcs.deliver_posted_intr, target, vec);
@@ -496,9 +502,6 @@ void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
     unsigned long flags;
     unsigned int index = vector - 16;
 
-    if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
-        vioapic_update_EOI(vector);
-
     spin_lock_irqsave(&vlapic->callback_lock, flags);
     callback = vlapic->callbacks[index].callback;
     vlapic->callbacks[index].callback = NULL;
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (4 preceding siblings ...)
  2021-03-31 10:32 ` [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-04-07 15:51   ` Jan Beulich
  2021-03-31 10:32 ` [PATCH v3 07/11] x86/dpci: move code Roger Pau Monne
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Such callbacks will be executed once a EOI is performed by the guest,
regardless of whether the interrupts are injected from the vIO-APIC or
the vPIC, as ISA IRQs are translated to GSIs and then the
corresponding callback is executed at EOI.

The vIO-APIC infrastructure for handling EOIs is build on top of the
existing vlapic EOI callback functionality, while the vPIC one is
handled when writing to the vPIC EOI register.

Note that such callbacks need to be registered and de-registered, and
that a single GSI can have multiple callbacks associated. That's
because GSIs can be level triggered and shared, as that's the case
with legacy PCI interrupts shared between several devices.

Strictly speaking this is a non-functional change, since there are no
users of this new interface introduced by this change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Latch hvm_domain_irq in some functions.
 - Make domain parameter of hvm_gsi_has_callbacks const.
 - Add comment about dropping the lock around the
   hvm_gsi_execute_callbacks call.
 - Drop change to ioapic_load.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/hvm.c        | 15 ++++++++-
 xen/arch/x86/hvm/irq.c        | 63 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vioapic.c    | 29 ++++++++++++----
 xen/arch/x86/hvm/vpic.c       |  5 +++
 xen/include/asm-x86/hvm/irq.h | 20 +++++++++++
 5 files changed, 125 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e7bcffebc49..0279014e66e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -608,7 +608,7 @@ static int hvm_print_line(
 
 int hvm_domain_initialise(struct domain *d)
 {
-    unsigned int nr_gsis;
+    unsigned int nr_gsis, i;
     int rc;
 
     if ( !hvm_enabled )
@@ -655,6 +655,14 @@ int hvm_domain_initialise(struct domain *d)
     BUILD_BUG_ON(NR_HVM_DOMU_IRQS < NR_ISAIRQS);
     ASSERT(hvm_domain_irq(d)->nr_gsis >= NR_ISAIRQS);
 
+    /* Initialize the EOI callback list. */
+    hvm_domain_irq(d)->gsi_callbacks = xmalloc_array(struct list_head, nr_gsis);
+    if ( !hvm_domain_irq(d)->gsi_callbacks )
+        goto fail1;
+    rwlock_init(&hvm_domain_irq(d)->gsi_callbacks_lock);
+    for ( i = 0; i < nr_gsis; i++ )
+        INIT_LIST_HEAD(&hvm_domain_irq(d)->gsi_callbacks[i]);
+
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
@@ -714,6 +722,8 @@ int hvm_domain_initialise(struct domain *d)
  fail1:
     if ( is_hardware_domain(d) )
         xfree(d->arch.hvm.io_bitmap);
+    if ( hvm_domain_irq(d) )
+        XFREE(hvm_domain_irq(d)->gsi_callbacks);
     XFREE(d->arch.hvm.params);
     XFREE(d->arch.hvm.irq);
  fail0:
@@ -776,6 +786,9 @@ void hvm_domain_destroy(struct domain *d)
     vioapic_deinit(d);
 
     XFREE(d->arch.hvm.pl_time);
+
+    if ( hvm_domain_irq(d) )
+        XFREE(hvm_domain_irq(d)->gsi_callbacks);
     XFREE(d->arch.hvm.irq);
 
     list_for_each_safe ( ioport_list, tmp, &d->arch.hvm.g2m_ioport_list )
diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
index 38ac5fb6c7c..b9fa8409b9e 100644
--- a/xen/arch/x86/hvm/irq.c
+++ b/xen/arch/x86/hvm/irq.c
@@ -595,6 +595,69 @@ int hvm_local_events_need_delivery(struct vcpu *v)
     return !hvm_interrupt_blocked(v, intack);
 }
 
+int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
+                              struct hvm_gsi_eoi_callback *cb)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+
+    if ( gsi >= hvm_irq->nr_gsis )
+    {
+        ASSERT_UNREACHABLE();
+        return -EINVAL;
+    }
+
+    write_lock(&hvm_irq->gsi_callbacks_lock);
+    list_add(&cb->list, &hvm_irq->gsi_callbacks[gsi]);
+    write_unlock(&hvm_irq->gsi_callbacks_lock);
+
+    return 0;
+}
+
+void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
+                                 struct hvm_gsi_eoi_callback *cb)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    const struct list_head *tmp;
+
+    if ( gsi >= hvm_irq->nr_gsis )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    write_lock(&hvm_irq->gsi_callbacks_lock);
+    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
+        if ( tmp == &cb->list )
+        {
+            list_del(&cb->list);
+            break;
+        }
+    write_unlock(&hvm_irq->gsi_callbacks_lock);
+}
+
+void hvm_gsi_execute_callbacks(unsigned int gsi)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
+    struct hvm_gsi_eoi_callback *cb;
+
+    read_lock(&hvm_irq->gsi_callbacks_lock);
+    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
+        cb->callback(gsi, cb->data);
+    read_unlock(&hvm_irq->gsi_callbacks_lock);
+}
+
+bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi)
+{
+    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
+    bool has_callbacks;
+
+    read_lock(&hvm_irq->gsi_callbacks_lock);
+    has_callbacks = !list_empty(&hvm_irq->gsi_callbacks[gsi]);
+    read_unlock(&hvm_irq->gsi_callbacks_lock);
+
+    return has_callbacks;
+}
+
 static void irq_dump(struct domain *d)
 {
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index d29b6bfdb7d..099c29466ba 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -285,6 +285,7 @@ static void vioapic_write_redirent(
             ASSERT(prev_level);
             ASSERT(!top_word);
             hvm_dpci_eoi(gsi);
+            hvm_gsi_execute_callbacks(gsi);
     }
 
     if ( is_hardware_domain(d) && unmasked )
@@ -412,6 +413,7 @@ static void eoi_callback(unsigned int vector, void *data)
         for ( pin = 0; pin < vioapic->nr_pins; pin++ )
         {
             union vioapic_redir_entry *ent = &vioapic->redirtbl[pin];
+            unsigned int gsi = vioapic->base_gsi + pin;
 
             if ( ent->fields.vector != vector )
                 continue;
@@ -421,13 +423,25 @@ static void eoi_callback(unsigned int vector, void *data)
             if ( is_iommu_enabled(d) )
             {
                 spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(vioapic->base_gsi + pin);
+                hvm_dpci_eoi(gsi);
                 spin_lock(&d->arch.hvm.irq_lock);
             }
 
+            /*
+             * Callbacks don't expect to be executed with any lock held, so
+             * drop the lock that protects the vIO-APIC fields from changing.
+             *
+             * Note that the redirection entry itself cannot go away, so upon
+             * retaking the lock we only need to avoid making assumptions on
+             * redirection entry field values (ie: recheck the IRR field).
+             */
+            spin_unlock(&d->arch.hvm.irq_lock);
+            hvm_gsi_execute_callbacks(gsi);
+            spin_lock(&d->arch.hvm.irq_lock);
+
             if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
                  !ent->fields.mask && !ent->fields.remote_irr &&
-                 hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
+                 hvm_irq->gsi_assert_count[gsi] )
             {
                 ent->fields.remote_irr = 1;
                 vioapic_deliver(vioapic, pin);
@@ -443,7 +457,8 @@ static void ioapic_inj_irq(
     struct vlapic *target,
     uint8_t vector,
     uint8_t trig_mode,
-    uint8_t delivery_mode)
+    uint8_t delivery_mode,
+    bool callback)
 {
     HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %d trig %d deliv %d",
                 vector, trig_mode, delivery_mode);
@@ -452,7 +467,7 @@ static void ioapic_inj_irq(
            (delivery_mode == dest_LowestPrio));
 
     vlapic_set_irq_callback(target, vector, trig_mode,
-                            trig_mode ? eoi_callback : NULL, NULL);
+                            callback ? eoi_callback : NULL, NULL);
 }
 
 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
@@ -466,6 +481,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
     struct vlapic *target;
     struct vcpu *v;
     unsigned int irq = vioapic->base_gsi + pin;
+    bool callback = trig_mode || hvm_gsi_has_callbacks(d, irq);
 
     ASSERT(spin_is_locked(&d->arch.hvm.irq_lock));
 
@@ -492,7 +508,8 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
             target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
         if ( target != NULL )
         {
-            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode);
+            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode,
+                           callback);
         }
         else
         {
@@ -507,7 +524,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
         for_each_vcpu ( d, v )
             if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
                 ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
-                               delivery_mode);
+                               delivery_mode, callback);
         break;
 
     case dest_NMI:
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index a69aecad912..ca484c31b6a 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -235,6 +235,8 @@ static void vpic_ioport_write(
                 unsigned int pin = __scanbit(pending, 8);
 
                 ASSERT(pin < 8);
+                hvm_gsi_execute_callbacks(
+                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 __clear_bit(pin, &pending);
             }
@@ -284,8 +286,11 @@ static void vpic_ioport_write(
                 /* Release lock and EOI the physical interrupt (if any). */
                 vpic_update_int_output(vpic);
                 vpic_unlock(vpic);
+                hvm_gsi_execute_callbacks(
+                        hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 return; /* bail immediately */
+
             case 6: /* Set Priority                */
                 vpic->priority_add = (val + 1) & 7;
                 break;
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 07b1ab99cd1..0828c01dd18 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -21,6 +21,7 @@
 #ifndef __ASM_X86_HVM_IRQ_H__
 #define __ASM_X86_HVM_IRQ_H__
 
+#include <xen/rwlock.h>
 #include <xen/timer.h>
 
 #include <asm/hvm/hvm.h>
@@ -79,6 +80,10 @@ struct hvm_irq {
 
     struct hvm_irq_dpci *dpci;
 
+    /* List of callbacks for GSI EOI events. Protected by irq_lock. */
+    struct list_head  *gsi_callbacks;
+    rwlock_t gsi_callbacks_lock;
+
     /*
      * Number of wires asserting each GSI.
      *
@@ -138,6 +143,13 @@ struct hvm_gmsi_info {
     bool posted; /* directly deliver to guest via VT-d PI? */
 };
 
+typedef void hvm_gsi_eoi_callback_t(unsigned int gsi, void *data);
+struct hvm_gsi_eoi_callback {
+    hvm_gsi_eoi_callback_t *callback;
+    void *data;
+    struct list_head list;
+};
+
 struct hvm_girq_dpci_mapping {
     struct list_head list;
     uint8_t bus;
@@ -225,4 +237,12 @@ void hvm_set_callback_via(struct domain *d, uint64_t via);
 struct pirq;
 bool hvm_domain_use_pirq(const struct domain *, const struct pirq *);
 
+int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
+                              struct hvm_gsi_eoi_callback *cb);
+void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
+                                 struct hvm_gsi_eoi_callback *cb);
+/* data is an opaque blob to pass to the callback if it has no private data. */
+void hvm_gsi_execute_callbacks(unsigned int gsi);
+bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi);
+
 #endif /* __ASM_X86_HVM_IRQ_H__ */
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 07/11] x86/dpci: move code
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (5 preceding siblings ...)
  2021-03-31 10:32 ` [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
@ 2021-03-31 10:32 ` Roger Pau Monne
  2021-03-31 10:33 ` [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:32 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Paul Durrant

This is code movement in order to simply further changes.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v2:
 - Drop one of the leading underscores from __hvm_dpci_eoi.

Changes since v1:
 - New in this version.
---
 xen/drivers/passthrough/x86/hvm.c | 164 +++++++++++++++---------------
 1 file changed, 82 insertions(+), 82 deletions(-)

diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 2331af896d4..ecc7d66e600 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -205,6 +205,88 @@ static struct vcpu *vector_hashing_dest(const struct domain *d,
     return dest;
 }
 
+static void hvm_pirq_eoi(struct pirq *pirq)
+{
+    struct hvm_pirq_dpci *pirq_dpci;
+
+    if ( !pirq )
+    {
+        ASSERT_UNREACHABLE();
+        return;
+    }
+
+    pirq_dpci = pirq_dpci(pirq);
+
+    /*
+     * No need to get vector lock for timer
+     * since interrupt is still not EOIed
+     */
+    if ( --pirq_dpci->pending ||
+         /* When the interrupt source is MSI no Ack should be performed. */
+         (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) )
+        return;
+
+    pirq_guest_eoi(pirq);
+}
+
+static void _hvm_dpci_eoi(struct domain *d,
+                          const struct hvm_girq_dpci_mapping *girq)
+{
+    struct pirq *pirq = pirq_info(d, girq->machine_gsi);
+
+    if ( !hvm_domain_use_pirq(d, pirq) )
+        hvm_pci_intx_deassert(d, girq->device, girq->intx);
+
+    hvm_pirq_eoi(pirq);
+}
+
+static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
+{
+    struct pirq *pirq = pirq_info(d, gsi);
+
+    /* Check if GSI is actually mapped. */
+    if ( !pirq_dpci(pirq) )
+        return;
+
+    hvm_gsi_deassert(d, gsi);
+    hvm_pirq_eoi(pirq);
+}
+
+void hvm_dpci_eoi(unsigned int guest_gsi)
+{
+    struct domain *d = current->domain;
+    const struct hvm_irq_dpci *hvm_irq_dpci;
+    const struct hvm_girq_dpci_mapping *girq;
+
+    if ( !is_iommu_enabled(d) )
+        return;
+
+    if ( is_hardware_domain(d) )
+    {
+        spin_lock(&d->event_lock);
+        hvm_gsi_eoi(d, guest_gsi);
+        goto unlock;
+    }
+
+    if ( guest_gsi < NR_ISAIRQS )
+    {
+        hvm_dpci_isairq_eoi(d, guest_gsi);
+        return;
+    }
+
+    spin_lock(&d->event_lock);
+    hvm_irq_dpci = domain_get_irq_dpci(d);
+
+    if ( !hvm_irq_dpci )
+        goto unlock;
+
+    list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
+        _hvm_dpci_eoi(d, girq);
+
+unlock:
+    spin_unlock(&d->event_lock);
+}
+
 int pt_irq_create_bind(
     struct domain *d, const struct xen_domctl_bind_pt_irq *pt_irq_bind)
 {
@@ -860,88 +942,6 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci)
     spin_unlock(&d->event_lock);
 }
 
-static void hvm_pirq_eoi(struct pirq *pirq)
-{
-    struct hvm_pirq_dpci *pirq_dpci;
-
-    if ( !pirq )
-    {
-        ASSERT_UNREACHABLE();
-        return;
-    }
-
-    pirq_dpci = pirq_dpci(pirq);
-
-    /*
-     * No need to get vector lock for timer
-     * since interrupt is still not EOIed
-     */
-    if ( --pirq_dpci->pending ||
-         /* When the interrupt source is MSI no Ack should be performed. */
-         (pirq_dpci->flags & HVM_IRQ_DPCI_TRANSLATE) )
-        return;
-
-    pirq_guest_eoi(pirq);
-}
-
-static void __hvm_dpci_eoi(struct domain *d,
-                           const struct hvm_girq_dpci_mapping *girq)
-{
-    struct pirq *pirq = pirq_info(d, girq->machine_gsi);
-
-    if ( !hvm_domain_use_pirq(d, pirq) )
-        hvm_pci_intx_deassert(d, girq->device, girq->intx);
-
-    hvm_pirq_eoi(pirq);
-}
-
-static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
-{
-    struct pirq *pirq = pirq_info(d, gsi);
-
-    /* Check if GSI is actually mapped. */
-    if ( !pirq_dpci(pirq) )
-        return;
-
-    hvm_gsi_deassert(d, gsi);
-    hvm_pirq_eoi(pirq);
-}
-
-void hvm_dpci_eoi(unsigned int guest_gsi)
-{
-    struct domain *d = current->domain;
-    const struct hvm_irq_dpci *hvm_irq_dpci;
-    const struct hvm_girq_dpci_mapping *girq;
-
-    if ( !is_iommu_enabled(d) )
-        return;
-
-    if ( is_hardware_domain(d) )
-    {
-        spin_lock(&d->event_lock);
-        hvm_gsi_eoi(d, guest_gsi);
-        goto unlock;
-    }
-
-    if ( guest_gsi < NR_ISAIRQS )
-    {
-        hvm_dpci_isairq_eoi(d, guest_gsi);
-        return;
-    }
-
-    spin_lock(&d->event_lock);
-    hvm_irq_dpci = domain_get_irq_dpci(d);
-
-    if ( !hvm_irq_dpci )
-        goto unlock;
-
-    list_for_each_entry ( girq, &hvm_irq_dpci->girq[guest_gsi], list )
-        __hvm_dpci_eoi(d, girq);
-
-unlock:
-    spin_unlock(&d->event_lock);
-}
-
 static int pci_clean_dpci_irq(struct domain *d,
                               struct hvm_pirq_dpci *pirq_dpci, void *arg)
 {
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (6 preceding siblings ...)
  2021-03-31 10:32 ` [PATCH v3 07/11] x86/dpci: move code Roger Pau Monne
@ 2021-03-31 10:33 ` Roger Pau Monne
  2021-04-08 14:49   ` Jan Beulich
  2021-03-31 10:33 ` [PATCH v3 09/11] x86/vpt: switch interrupt injection model Roger Pau Monne
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu, Paul Durrant

Switch the dpci GSI EOI callback hooks to use the newly introduced
generic callback functionality, and remove the custom dpci calls found
on the vPIC and vIO-APIC implementations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Avoid leaking the allocated callback on error paths of
   pt_irq_create_bind.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c        |  8 -----
 xen/arch/x86/hvm/vpic.c           |  2 --
 xen/drivers/passthrough/x86/hvm.c | 54 ++++++++++++++++++++++++++++---
 xen/include/asm-x86/hvm/io.h      |  1 -
 xen/include/asm-x86/hvm/irq.h     |  1 +
 5 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 099c29466ba..4cdb95ce835 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -284,7 +284,6 @@ static void vioapic_write_redirent(
              */
             ASSERT(prev_level);
             ASSERT(!top_word);
-            hvm_dpci_eoi(gsi);
             hvm_gsi_execute_callbacks(gsi);
     }
 
@@ -420,13 +419,6 @@ static void eoi_callback(unsigned int vector, void *data)
 
             ent->fields.remote_irr = 0;
 
-            if ( is_iommu_enabled(d) )
-            {
-                spin_unlock(&d->arch.hvm.irq_lock);
-                hvm_dpci_eoi(gsi);
-                spin_lock(&d->arch.hvm.irq_lock);
-            }
-
             /*
              * Callbacks don't expect to be executed with any lock held, so
              * drop the lock that protects the vIO-APIC fields from changing.
diff --git a/xen/arch/x86/hvm/vpic.c b/xen/arch/x86/hvm/vpic.c
index ca484c31b6a..e0f3f6276dc 100644
--- a/xen/arch/x86/hvm/vpic.c
+++ b/xen/arch/x86/hvm/vpic.c
@@ -237,7 +237,6 @@ static void vpic_ioport_write(
                 ASSERT(pin < 8);
                 hvm_gsi_execute_callbacks(
                         hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
-                hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 __clear_bit(pin, &pending);
             }
             return;
@@ -288,7 +287,6 @@ static void vpic_ioport_write(
                 vpic_unlock(vpic);
                 hvm_gsi_execute_callbacks(
                         hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
-                hvm_dpci_eoi(hvm_isa_irq_to_gsi((addr >> 7) ? (pin | 8) : pin));
                 return; /* bail immediately */
 
             case 6: /* Set Priority                */
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index ecc7d66e600..4ae678d69b4 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -252,9 +252,9 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int gsi)
     hvm_pirq_eoi(pirq);
 }
 
-void hvm_dpci_eoi(unsigned int guest_gsi)
+static void dpci_eoi(unsigned int guest_gsi, void *data)
 {
-    struct domain *d = current->domain;
+    struct domain *d = data;
     const struct hvm_irq_dpci *hvm_irq_dpci;
     const struct hvm_girq_dpci_mapping *girq;
 
@@ -477,6 +477,7 @@ int pt_irq_create_bind(
     {
         struct dev_intx_gsi_link *digl = NULL;
         struct hvm_girq_dpci_mapping *girq = NULL;
+        struct hvm_gsi_eoi_callback *cb = NULL;
         unsigned int guest_gsi;
 
         /*
@@ -503,11 +504,23 @@ int pt_irq_create_bind(
             girq->bus = digl->bus = pt_irq_bind->u.pci.bus;
             girq->device = digl->device = pt_irq_bind->u.pci.device;
             girq->intx = digl->intx = pt_irq_bind->u.pci.intx;
-            list_add_tail(&digl->list, &pirq_dpci->digl_list);
+            girq->cb.callback = dpci_eoi;
+            girq->cb.data = d;
 
             guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
             link = hvm_pci_intx_link(digl->device, digl->intx);
 
+            rc = hvm_gsi_register_callback(d, guest_gsi, &girq->cb);
+            if ( rc )
+            {
+                spin_unlock(&d->event_lock);
+                xfree(girq);
+                xfree(digl);
+                return rc;
+            }
+
+            list_add_tail(&digl->list, &pirq_dpci->digl_list);
+
             hvm_irq_dpci->link_cnt[link]++;
 
             girq->machine_gsi = pirq;
@@ -515,17 +528,44 @@ int pt_irq_create_bind(
         }
         else
         {
+            /*
+             * NB: the callback structure allocated below will never be freed
+             * once setup because it's used by the hardware domain and will
+             * never be unregistered.
+             */
+            cb = xmalloc(struct hvm_gsi_eoi_callback);
+
             ASSERT(is_hardware_domain(d));
 
+            if ( !cb )
+            {
+                spin_unlock(&d->event_lock);
+                return -ENOMEM;
+            }
+
             /* MSI_TRANSLATE is not supported for the hardware domain. */
             if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
                  pirq >= hvm_domain_irq(d)->nr_gsis )
             {
                 spin_unlock(&d->event_lock);
-
+                xfree(cb);
                 return -EINVAL;
             }
             guest_gsi = pirq;
+
+            cb->callback = dpci_eoi;
+            cb->data = d;
+            /*
+             * IRQ binds created for the hardware domain are never destroyed,
+             * so it's fine to not keep a reference to cb here.
+             */
+            rc = hvm_gsi_register_callback(d, guest_gsi, cb);
+            if ( rc )
+            {
+                spin_unlock(&d->event_lock);
+                xfree(cb);
+                return rc;
+            }
         }
 
         /* Bind the same mirq once in the same domain */
@@ -597,12 +637,17 @@ int pt_irq_create_bind(
                     list_del(&digl->list);
                     link = hvm_pci_intx_link(digl->device, digl->intx);
                     hvm_irq_dpci->link_cnt[link]--;
+                    hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
                 }
+                else
+                    hvm_gsi_unregister_callback(d, guest_gsi, cb);
+
                 pirq_dpci->flags = 0;
                 pirq_cleanup_check(info, d);
                 spin_unlock(&d->event_lock);
                 xfree(girq);
                 xfree(digl);
+                xfree(cb);
                 return rc;
             }
         }
@@ -709,6 +754,7 @@ int pt_irq_destroy_bind(
                  girq->machine_gsi == machine_gsi )
             {
                 list_del(&girq->list);
+                hvm_gsi_unregister_callback(d, guest_gsi, &girq->cb);
                 xfree(girq);
                 girq = NULL;
                 break;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 9ac3e4f48f6..a05bdbe5555 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -101,7 +101,6 @@ bool handle_mmio_with_translation(unsigned long gla, unsigned long gpfn,
                                   struct npfec);
 bool handle_pio(uint16_t port, unsigned int size, int dir);
 void hvm_interrupt_post(struct vcpu *v, int vector, int type);
-void hvm_dpci_eoi(unsigned int guest_irq);
 void msix_write_completion(struct vcpu *);
 
 #ifdef CONFIG_HVM
diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
index 0828c01dd18..f49c4c3b6e5 100644
--- a/xen/include/asm-x86/hvm/irq.h
+++ b/xen/include/asm-x86/hvm/irq.h
@@ -156,6 +156,7 @@ struct hvm_girq_dpci_mapping {
     uint8_t device;
     uint8_t intx;
     uint8_t machine_gsi;
+    struct hvm_gsi_eoi_callback cb;
 };
 
 #define NR_ISAIRQS  16
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 09/11] x86/vpt: switch interrupt injection model
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (7 preceding siblings ...)
  2021-03-31 10:33 ` [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
@ 2021-03-31 10:33 ` Roger Pau Monne
  2021-04-14 10:28   ` Jan Beulich
  2021-03-31 10:33 ` [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
  2021-03-31 10:33 ` [PATCH v3 11/11] x86/vpt: introduce a per-vPT lock Roger Pau Monne
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:33 UTC (permalink / raw)
  To: xen-devel
  Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu,
	Jun Nakajima, Kevin Tian

Currently vPT relies on timers being assigned to a vCPU and performing
checks on every return to HVM guest in order to check if an interrupt
from a vPT timer assigned to the vCPU is currently being injected.

This model doesn't work properly since the interrupt destination vCPU
of a vPT timer can be different from the vCPU where the timer is
currently assigned, in which case the timer would get stuck because it
never sees the interrupt as being injected.

Knowing when a vPT interrupt is injected is relevant for the guest
timer modes where missed vPT interrupts are not discarded and instead
are accumulated and injected when possible.

This change aims to modify the logic described above, so that vPT
doesn't need to check on every return to HVM guest if a vPT interrupt
is being injected. In order to achieve this the vPT code is modified
to make use of the new EOI callbacks, so that virtual timers can
detect when a interrupt has been serviced by the guest by waiting for
the EOI callback to execute.

This model also simplifies some of the logic, as when executing the
timer EOI callback Xen can try to inject another interrupt if the
timer has interrupts pending for delivery.

Note that timers are still bound to a vCPU for the time being, this
relation however doesn't limit the interrupt destination anymore, and
will be removed by further patches.

This model has been tested with Windows 7 guests without showing any
timer delay, even when the guest was limited to have very little CPU
capacity and pending virtual timer interrupts accumulate.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Avoid and explicit != NULL check.
 - Use a switch in inject_interrupt to evaluate the timer mode.
 - Print the pt->source field on error in create_periodic_time.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/svm/intr.c   |   3 -
 xen/arch/x86/hvm/vmx/intr.c   |  59 ------
 xen/arch/x86/hvm/vpt.c        | 334 ++++++++++++++--------------------
 xen/include/asm-x86/hvm/vpt.h |   5 +-
 4 files changed, 143 insertions(+), 258 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/intr.c b/xen/arch/x86/hvm/svm/intr.c
index 7f815d23078..2ee2332253b 100644
--- a/xen/arch/x86/hvm/svm/intr.c
+++ b/xen/arch/x86/hvm/svm/intr.c
@@ -146,8 +146,6 @@ void svm_intr_assist(void)
         return;
 
     /* Crank the handle on interrupt state. */
-    pt_update_irq(v);
-
     do {
         intack = hvm_vcpu_has_pending_irq(v);
         if ( likely(intack.source == hvm_intsrc_none) )
@@ -219,7 +217,6 @@ void svm_intr_assist(void)
     {
         HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0);
         svm_inject_extint(v, intack.vector);
-        pt_intr_post(v, intack);
     }
 
     /* Is there another IRQ to queue up behind this one? */
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 80bfbb47878..3fcc7073db2 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -203,7 +203,6 @@ static int nvmx_intr_intercept(struct vcpu *v, struct hvm_intack intack)
             {
                 /* for now, duplicate the ack path in vmx_intr_assist */
                 hvm_vcpu_ack_pending_irq(v, intack);
-                pt_intr_post(v, intack);
 
                 intack = hvm_vcpu_has_pending_irq(v);
                 if ( unlikely(intack.source != hvm_intsrc_none) )
@@ -242,7 +241,6 @@ void vmx_intr_assist(void)
     struct vcpu *v = current;
     unsigned int tpr_threshold = 0;
     enum hvm_intblk intblk;
-    int pt_vector;
 
     /* Block event injection when single step with MTF. */
     if ( unlikely(v->arch.hvm.single_step) )
@@ -263,8 +261,6 @@ void vmx_intr_assist(void)
 #endif
 
     /* Crank the handle on interrupt state. */
-    pt_vector = pt_update_irq(v);
-
     do {
         unsigned long intr_info;
 
@@ -337,58 +333,6 @@ void vmx_intr_assist(void)
     {
         unsigned long status;
 
-       /*
-        * intack.vector is the highest priority vector. So we set eoi_exit_bitmap
-        * for intack.vector - give a chance to post periodic time interrupts when
-        * periodic time interrupts become the highest one
-        */
-        if ( pt_vector != -1 )
-        {
-#ifndef NDEBUG
-            /*
-             * We assert that intack.vector is the highest priority vector for
-             * only an interrupt from vlapic can reach this point and the
-             * highest vector is chosen in hvm_vcpu_has_pending_irq().
-             * But, in fact, the assertion failed sometimes. It is suspected
-             * that PIR is not synced to vIRR which makes pt_vector is left in
-             * PIR. In order to verify this suspicion, dump some information
-             * when the assertion fails.
-             */
-            if ( unlikely(intack.vector < pt_vector) )
-            {
-                const struct vlapic *vlapic;
-                const struct pi_desc *pi_desc;
-                const uint32_t *word;
-                unsigned int i;
-
-                printk(XENLOG_ERR "%pv: intack: %u:%02x pt: %02x\n",
-                       current, intack.source, intack.vector, pt_vector);
-
-                vlapic = vcpu_vlapic(v);
-                if ( vlapic && vlapic->regs )
-                {
-                    word = (const void *)&vlapic->regs->data[APIC_IRR];
-                    printk(XENLOG_ERR "vIRR:");
-                    for ( i = X86_NR_VECTORS / 32; i-- ; )
-                        printk(" %08x", word[i*4]);
-                    printk("\n");
-                }
-
-                pi_desc = &v->arch.hvm.vmx.pi_desc;
-                if ( pi_desc )
-                {
-                    word = (const void *)&pi_desc->pir;
-                    printk(XENLOG_ERR " PIR:");
-                    for ( i = X86_NR_VECTORS / 32; i-- ; )
-                        printk(" %08x", word[i]);
-                    printk("\n");
-                }
-            }
-#endif
-            ASSERT(intack.vector >= pt_vector);
-            vmx_set_eoi_exit_bitmap(v, intack.vector);
-        }
-
         /* we need update the RVI field */
         __vmread(GUEST_INTR_STATUS, &status);
         status &= ~VMX_GUEST_INTR_STATUS_SUBFIELD_BITMASK;
@@ -397,14 +341,11 @@ void vmx_intr_assist(void)
         __vmwrite(GUEST_INTR_STATUS, status);
 
         vmx_sync_exit_bitmap(v);
-
-        pt_intr_post(v, intack);
     }
     else
     {
         HVMTRACE_2D(INJ_VIRQ, intack.vector, /*fake=*/ 0);
         vmx_inject_extint(intack.vector, intack.source);
-        pt_intr_post(v, intack);
     }
 
     /* Is there another IRQ to queue up behind this one? */
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4c2afe2e915..f951cd95bcf 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -27,6 +27,8 @@
 #define mode_is(d, name) \
     ((d)->arch.hvm.params[HVM_PARAM_TIMER_MODE] == HVMPTM_##name)
 
+static bool inject_interrupt(struct periodic_time *pt);
+
 void hvm_init_guest_time(struct domain *d)
 {
     struct pl_time *pl = d->arch.hvm.pl_time;
@@ -76,35 +78,6 @@ void hvm_set_guest_time(struct vcpu *v, u64 guest_time)
     }
 }
 
-static int pt_irq_vector(struct periodic_time *pt, enum hvm_intsrc src)
-{
-    struct vcpu *v = pt->vcpu;
-    unsigned int gsi, isa_irq;
-    int vector;
-
-    if ( pt->source == PTSRC_lapic )
-        return pt->irq;
-
-    isa_irq = pt->irq;
-
-    if ( src == hvm_intsrc_pic )
-        return (v->domain->arch.hvm.vpic[isa_irq >> 3].irq_base
-                + (isa_irq & 7));
-
-    ASSERT(src == hvm_intsrc_lapic);
-    gsi = pt->source == PTSRC_isa ? hvm_isa_irq_to_gsi(isa_irq) : pt->irq;
-    vector = vioapic_get_vector(v->domain, gsi);
-    if ( vector < 0 )
-    {
-        dprintk(XENLOG_WARNING, "d%u: invalid GSI (%u) for platform timer\n",
-                v->domain->domain_id, gsi);
-        domain_crash(v->domain);
-        return -1;
-    }
-
-    return vector;
-}
-
 static int pt_irq_masked(struct periodic_time *pt)
 {
     struct vcpu *v = pt->vcpu;
@@ -247,34 +220,14 @@ void pt_restore_timer(struct vcpu *v)
     pt_vcpu_lock(v);
 
     list_for_each_entry ( pt, head, list )
-    {
         if ( pt->pending_intr_nr == 0 )
-        {
-            pt_process_missed_ticks(pt);
             set_timer(&pt->timer, pt->scheduled);
-        }
-    }
 
     pt_thaw_time(v);
 
     pt_vcpu_unlock(v);
 }
 
-static void pt_timer_fn(void *data)
-{
-    struct periodic_time *pt = data;
-
-    pt_lock(pt);
-
-    pt->pending_intr_nr++;
-    pt->scheduled += pt->period;
-    pt->do_not_freeze = 0;
-
-    vcpu_kick(pt->vcpu);
-
-    pt_unlock(pt);
-}
-
 static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
 {
     pt->irq_issued = false;
@@ -285,189 +238,144 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
             list_del(&pt->list);
         pt->on_list = false;
         pt->pending_intr_nr = 0;
+
+        return;
     }
-    else if ( mode_is(v->domain, one_missed_tick_pending) ||
-              mode_is(v->domain, no_missed_ticks_pending) )
+
+    if ( mode_is(v->domain, one_missed_tick_pending) ||
+         mode_is(v->domain, no_missed_ticks_pending) )
     {
-        pt->last_plt_gtime = hvm_get_guest_time(v);
         pt_process_missed_ticks(pt);
         pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
+    }
+    else if ( !pt->pending_intr_nr )
+        pt_process_missed_ticks(pt);
+
+    if ( !pt->pending_intr_nr )
         set_timer(&pt->timer, pt->scheduled);
+}
+
+static void pt_timer_fn(void *data)
+{
+    struct periodic_time *pt = data;
+    struct vcpu *v;
+    time_cb *cb = NULL;
+    void *cb_priv;
+    unsigned int irq;
+
+    pt_lock(pt);
+
+    v = pt->vcpu;
+    irq = pt->irq;
+
+    if ( inject_interrupt(pt) )
+    {
+        pt->scheduled += pt->period;
+        pt->do_not_freeze = 0;
+        cb = pt->cb;
+        cb_priv = pt->priv;
     }
     else
     {
-        pt->last_plt_gtime += pt->period;
-        if ( --pt->pending_intr_nr == 0 )
-        {
-            pt_process_missed_ticks(pt);
-            if ( pt->pending_intr_nr == 0 )
-                set_timer(&pt->timer, pt->scheduled);
-        }
+        /* Masked. */
+        if ( pt->on_list )
+            list_del(&pt->list);
+        pt->on_list = false;
+        pt->pending_intr_nr++;
     }
 
-    if ( mode_is(v->domain, delay_for_missed_ticks) &&
-         (hvm_get_guest_time(v) < pt->last_plt_gtime) )
-        hvm_set_guest_time(v, pt->last_plt_gtime);
+    pt_unlock(pt);
+
+    if ( cb )
+        cb(v, cb_priv);
 }
 
-int pt_update_irq(struct vcpu *v)
+/*
+ * The same callback is shared between LAPIC and PIC/IO-APIC based timers, as
+ * we ignore the first parameter that's different between them.
+ */
+static void eoi_callback(unsigned int unused, void *data)
 {
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt, *temp, *earliest_pt;
-    uint64_t max_lag;
-    int irq, pt_vector = -1;
-    bool level;
+    struct periodic_time *pt = data;
+    struct vcpu *v;
+    time_cb *cb = NULL;
+    void *cb_priv;
 
-    pt_vcpu_lock(v);
+    pt_lock(pt);
 
-    earliest_pt = NULL;
-    max_lag = -1ULL;
-    list_for_each_entry_safe ( pt, temp, head, list )
+    pt_irq_fired(pt->vcpu, pt);
+    if ( pt->pending_intr_nr )
     {
-        if ( pt->pending_intr_nr )
+        if ( inject_interrupt(pt) )
+        {
+            pt->pending_intr_nr--;
+            cb = pt->cb;
+            cb_priv = pt->priv;
+            v = pt->vcpu;
+        }
+        else
         {
-            /* RTC code takes care of disabling the timer itself. */
-            if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
-                 /* Level interrupts should be asserted even if masked. */
-                 !pt->level )
-            {
-                /* suspend timer emulation */
+            /* Masked. */
+            if ( pt->on_list )
                 list_del(&pt->list);
-                pt->on_list = 0;
-            }
-            else
-            {
-                if ( (pt->last_plt_gtime + pt->period) < max_lag )
-                {
-                    max_lag = pt->last_plt_gtime + pt->period;
-                    earliest_pt = pt;
-                }
-            }
+            pt->on_list = false;
         }
     }
 
-    if ( earliest_pt == NULL )
-    {
-        pt_vcpu_unlock(v);
-        return -1;
-    }
+    pt_unlock(pt);
 
-    earliest_pt->irq_issued = 1;
-    irq = earliest_pt->irq;
-    level = earliest_pt->level;
+    if ( cb )
+        cb(v, cb_priv);
+}
 
-    pt_vcpu_unlock(v);
+static bool inject_interrupt(struct periodic_time *pt)
+{
+    struct vcpu *v = pt->vcpu;
+    struct domain *d = v->domain;
+    unsigned int irq = pt->irq;
 
-    switch ( earliest_pt->source )
+    if ( pt_irq_masked(pt) )
+        return false;
+
+    switch ( pt->source )
     {
     case PTSRC_lapic:
-        /*
-         * If periodic timer interrupt is handled by lapic, its vector in
-         * IRR is returned and used to set eoi_exit_bitmap for virtual
-         * interrupt delivery case. Otherwise return -1 to do nothing.
-         */
-        vlapic_set_irq(vcpu_vlapic(v), irq, 0);
-        pt_vector = irq;
+        vlapic_set_irq_callback(vcpu_vlapic(v), pt->irq, 0, eoi_callback, pt);
         break;
 
     case PTSRC_isa:
-        hvm_isa_irq_deassert(v->domain, irq);
+        hvm_isa_irq_deassert(d, irq);
         if ( platform_legacy_irq(irq) && vlapic_accept_pic_intr(v) &&
-             v->domain->arch.hvm.vpic[irq >> 3].int_output )
-            hvm_isa_irq_assert(v->domain, irq, NULL);
+             d->arch.hvm.vpic[irq >> 3].int_output )
+            hvm_isa_irq_assert(d, irq, NULL);
         else
-        {
-            pt_vector = hvm_isa_irq_assert(v->domain, irq, vioapic_get_vector);
-            /*
-             * hvm_isa_irq_assert may not set the corresponding bit in vIRR
-             * when mask field of IOAPIC RTE is set. Check it again.
-             */
-            if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
-                pt_vector = -1;
-        }
+            hvm_isa_irq_assert(d, irq, vioapic_get_vector);
         break;
 
     case PTSRC_ioapic:
-        pt_vector = hvm_ioapic_assert(v->domain, irq, level);
-        if ( pt_vector < 0 || !vlapic_test_irq(vcpu_vlapic(v), pt_vector) )
-        {
-            pt_vector = -1;
-            if ( level )
-            {
-                /*
-                 * Level interrupts are always asserted because the pin assert
-                 * count is incremented regardless of whether the pin is masked
-                 * or the vector latched in IRR, so also execute the callback
-                 * associated with the timer.
-                 */
-                time_cb *cb = NULL;
-                void *cb_priv = NULL;
-
-                pt_vcpu_lock(v);
-                /* Make sure the timer is still on the list. */
-                list_for_each_entry ( pt, &v->arch.hvm.tm_list, list )
-                    if ( pt == earliest_pt )
-                    {
-                        pt_irq_fired(v, pt);
-                        cb = pt->cb;
-                        cb_priv = pt->priv;
-                        break;
-                    }
-                pt_vcpu_unlock(v);
-
-                if ( cb != NULL )
-                    cb(v, cb_priv);
-            }
-        }
+        hvm_ioapic_assert(d, irq, pt->level);
         break;
     }
 
-    return pt_vector;
-}
-
-static struct periodic_time *is_pt_irq(
-    struct vcpu *v, struct hvm_intack intack)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    list_for_each_entry ( pt, head, list )
+    switch ( d->arch.hvm.params[HVM_PARAM_TIMER_MODE] )
     {
-        if ( pt->pending_intr_nr && pt->irq_issued &&
-             (intack.vector == pt_irq_vector(pt, intack.source)) )
-            return pt;
-    }
-
-    return NULL;
-}
-
-void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
-{
-    struct periodic_time *pt;
-    time_cb *cb;
-    void *cb_priv;
-
-    if ( intack.source == hvm_intsrc_vector )
-        return;
+    case HVMPTM_one_missed_tick_pending:
+    case HVMPTM_no_missed_ticks_pending:
+        pt->last_plt_gtime = hvm_get_guest_time(v);
+        break;
 
-    pt_vcpu_lock(v);
+    case HVMPTM_delay_for_missed_ticks:
+        pt->last_plt_gtime += pt->period;
+        if ( hvm_get_guest_time(v) < pt->last_plt_gtime )
+            hvm_set_guest_time(v, pt->last_plt_gtime);
+        break;
 
-    pt = is_pt_irq(v, intack);
-    if ( pt == NULL )
-    {
-        pt_vcpu_unlock(v);
-        return;
+    default:
+        pt->last_plt_gtime += pt->period;
+        break;
     }
 
-    pt_irq_fired(v, pt);
-
-    cb = pt->cb;
-    cb_priv = pt->priv;
-
-    pt_vcpu_unlock(v);
-
-    if ( cb != NULL )
-        cb(v, cb_priv);
+    return true;
 }
 
 void pt_migrate(struct vcpu *v)
@@ -543,6 +451,24 @@ void create_periodic_time(
     pt->cb = cb;
     pt->priv = data;
 
+    switch ( pt->source )
+    {
+        int rc;
+
+    case PTSRC_isa:
+        irq = hvm_isa_irq_to_gsi(irq);
+        /* fallthrough */
+    case PTSRC_ioapic:
+        pt->eoi_cb.callback = eoi_callback;
+        pt->eoi_cb.data = pt;
+        rc = hvm_gsi_register_callback(v->domain, irq, &pt->eoi_cb);
+        if ( rc )
+            gdprintk(XENLOG_WARNING,
+                     "unable to register callback for timer GSI %u source %u: %d\n",
+                     irq, pt->source, rc);
+        break;
+    }
+
     pt->on_list = 1;
     list_add(&pt->list, &v->arch.hvm.tm_list);
 
@@ -554,6 +480,8 @@ void create_periodic_time(
 
 void destroy_periodic_time(struct periodic_time *pt)
 {
+    unsigned int gsi;
+
     /* Was this structure previously initialised by create_periodic_time()? */
     if ( pt->vcpu == NULL )
         return;
@@ -563,6 +491,17 @@ void destroy_periodic_time(struct periodic_time *pt)
         list_del(&pt->list);
     pt->on_list = 0;
     pt->pending_intr_nr = 0;
+
+    gsi = pt->irq;
+    switch ( pt->source )
+    {
+    case PTSRC_isa:
+        gsi = hvm_isa_irq_to_gsi(pt->irq);
+        /* fallthrough */
+    case PTSRC_ioapic:
+        hvm_gsi_unregister_callback(pt->vcpu->domain, gsi, &pt->eoi_cb);
+        break;
+    }
     pt_unlock(pt);
 
     /*
@@ -617,20 +556,29 @@ void pt_adjust_global_vcpu_target(struct vcpu *v)
     write_unlock(&pl_time->vhpet.lock);
 }
 
-
 static void pt_resume(struct periodic_time *pt)
 {
+    struct vcpu *v;
+    time_cb *cb = NULL;
+    void *cb_priv;
+
     if ( pt->vcpu == NULL )
         return;
 
     pt_lock(pt);
-    if ( pt->pending_intr_nr && !pt->on_list )
+    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
     {
+        pt->pending_intr_nr--;
+        cb = pt->cb;
+        cb_priv = pt->priv;
+        v = pt->vcpu;
         pt->on_list = 1;
         list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
-        vcpu_kick(pt->vcpu);
     }
     pt_unlock(pt);
+
+    if ( cb )
+        cb(v, cb_priv);
 }
 
 void pt_may_unmask_irq(struct domain *d, struct periodic_time *vlapic_pt)
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 39d26cbda49..9440fe4ac7d 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -23,6 +23,7 @@
 #include <xen/list.h>
 #include <xen/rwlock.h>
 #include <asm/hvm/hvm.h>
+#include <asm/hvm/irq.h>
 
 /*
  * Abstract layer of periodic time, one short time.
@@ -50,6 +51,7 @@ struct periodic_time {
     struct timer timer;         /* ac_timer */
     time_cb *cb;
     void *priv;                 /* point back to platform time source */
+    struct hvm_gsi_eoi_callback eoi_cb; /* EOI callback registration data */
 };
 
 
@@ -145,9 +147,6 @@ struct pl_time {    /* platform time */
 
 void pt_save_timer(struct vcpu *v);
 void pt_restore_timer(struct vcpu *v);
-int pt_update_irq(struct vcpu *v);
-struct hvm_intack;
-void pt_intr_post(struct vcpu *v, struct hvm_intack intack);
 void pt_migrate(struct vcpu *v);
 
 void pt_adjust_global_vcpu_target(struct vcpu *v);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (8 preceding siblings ...)
  2021-03-31 10:33 ` [PATCH v3 09/11] x86/vpt: switch interrupt injection model Roger Pau Monne
@ 2021-03-31 10:33 ` Roger Pau Monne
  2021-04-14 10:38   ` Jan Beulich
  2021-03-31 10:33 ` [PATCH v3 11/11] x86/vpt: introduce a per-vPT lock Roger Pau Monne
  10 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

No longer add vPT timers to lists on specific vCPUs, since there's no
need anymore to check if timer interrupts have been injected on return
to HVM guest.

Such change allows to get rid of virtual timers vCPU migration, and
also cleanup some of the virtual timers fields that are no longer
required.

The model is also slightly different now in that timers are not
stopped when a vCPU is de-scheduled. Such timers will continue
running, and when triggered the function will try to inject the
corresponding interrupt to the guest (which might be different than
the currently running one). Note that the timer triggering when the
guest is no longer running can only happen once, as the timer callback
will not reset the interrupt to fire again. Such resetting if required
will be done by the EOI callback.

Since virtual timers are no longer added to per-VCPU lists when active
a new 'masked' field is added to the structure, to signal whether a
timer has it's interrupt source currently masked.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Remove pt_{save/restore}_timer and instead use
   pt_{freeze/thaw}_time.
 - Remove the introduction of the 'masked' field, it's not needed.
 - Rework pt_active to use timer_is_active.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/domain.c          |   4 +-
 xen/arch/x86/hvm/hvm.c         |   4 +-
 xen/arch/x86/hvm/vlapic.c      |   1 -
 xen/arch/x86/hvm/vpt.c         | 174 ++++-----------------------------
 xen/include/asm-x86/hvm/vcpu.h |   3 +-
 xen/include/asm-x86/hvm/vpt.h  |  12 +--
 6 files changed, 27 insertions(+), 171 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b2127298800..a711ff2814a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2014,8 +2014,8 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
     vpmu_switch_from(prev);
     np2m_schedule(NP2M_SCHEDLE_OUT);
 
-    if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
-        pt_save_timer(prev);
+    if ( is_hvm_domain(prevd) )
+        pt_freeze_time(prev);
 
     local_irq_disable();
 
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 0279014e66e..3a72da67ef2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -489,7 +489,6 @@ void hvm_set_info_guest(struct vcpu *v)
 void hvm_migrate_timers(struct vcpu *v)
 {
     rtc_migrate_timers(v);
-    pt_migrate(v);
 }
 
 void hvm_migrate_pirq(struct hvm_pirq_dpci *pirq_dpci, const struct vcpu *v)
@@ -544,7 +543,7 @@ void hvm_do_resume(struct vcpu *v)
 {
     check_wakeup_from_wait();
 
-    pt_restore_timer(v);
+    pt_thaw_time(v);
 
     if ( !vcpu_ioreq_handle_completion(v) )
         return;
@@ -1558,7 +1557,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
     hvm_asid_flush_vcpu(v);
 
     spin_lock_init(&v->arch.hvm.tm_lock);
-    INIT_LIST_HEAD(&v->arch.hvm.tm_list);
 
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 63fa3780767..8091b6d8925 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1339,7 +1339,6 @@ void vlapic_adjust_i8259_target(struct domain *d)
     if ( d->arch.hvm.i8259_target == v )
         return;
     d->arch.hvm.i8259_target = v;
-    pt_adjust_global_vcpu_target(v);
 }
 
 int vlapic_has_pending_irq(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index f951cd95bcf..84d49c1b25c 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -126,18 +126,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-static void pt_vcpu_lock(struct vcpu *v)
-{
-    read_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
-    spin_lock(&v->arch.hvm.tm_lock);
-}
-
-static void pt_vcpu_unlock(struct vcpu *v)
-{
-    spin_unlock(&v->arch.hvm.tm_lock);
-    read_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
-}
-
 static void pt_lock(struct periodic_time *pt)
 {
     /*
@@ -151,7 +139,8 @@ static void pt_lock(struct periodic_time *pt)
 
 static void pt_unlock(struct periodic_time *pt)
 {
-    pt_vcpu_unlock(pt->vcpu);
+    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
+    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
 }
 
 static void pt_process_missed_ticks(struct periodic_time *pt)
@@ -166,14 +155,12 @@ static void pt_process_missed_ticks(struct periodic_time *pt)
         return;
 
     missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
-    if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
-        pt->do_not_freeze = !pt->pending_intr_nr;
-    else
+    if ( !mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
         pt->pending_intr_nr += missed_ticks;
     pt->scheduled += missed_ticks * pt->period;
 }
 
-static void pt_freeze_time(struct vcpu *v)
+void pt_freeze_time(struct vcpu *v)
 {
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
@@ -181,7 +168,7 @@ static void pt_freeze_time(struct vcpu *v)
     v->arch.hvm.guest_time = hvm_get_guest_time(v);
 }
 
-static void pt_thaw_time(struct vcpu *v)
+void pt_thaw_time(struct vcpu *v)
 {
     if ( !mode_is(v->domain, delay_for_missed_ticks) )
         return;
@@ -193,52 +180,11 @@ static void pt_thaw_time(struct vcpu *v)
     v->arch.hvm.guest_time = 0;
 }
 
-void pt_save_timer(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    if ( v->pause_flags & VPF_blocked )
-        return;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        if ( !pt->do_not_freeze )
-            stop_timer(&pt->timer);
-
-    pt_freeze_time(v);
-
-    pt_vcpu_unlock(v);
-}
-
-void pt_restore_timer(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        if ( pt->pending_intr_nr == 0 )
-            set_timer(&pt->timer, pt->scheduled);
-
-    pt_thaw_time(v);
-
-    pt_vcpu_unlock(v);
-}
-
 static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
 {
-    pt->irq_issued = false;
-
     if ( pt->one_shot )
     {
-        if ( pt->on_list )
-            list_del(&pt->list);
-        pt->on_list = false;
         pt->pending_intr_nr = 0;
-
         return;
     }
 
@@ -252,7 +198,11 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
         pt_process_missed_ticks(pt);
 
     if ( !pt->pending_intr_nr )
+    {
+        /* Make sure timer follows vCPU. */
+        migrate_timer(&pt->timer, current->processor);
         set_timer(&pt->timer, pt->scheduled);
+    }
 }
 
 static void pt_timer_fn(void *data)
@@ -268,21 +218,15 @@ static void pt_timer_fn(void *data)
     v = pt->vcpu;
     irq = pt->irq;
 
-    if ( inject_interrupt(pt) )
+    pt->scheduled += pt->period;
+
+    if ( !inject_interrupt(pt) )
+        pt->pending_intr_nr++;
+    else
     {
-        pt->scheduled += pt->period;
-        pt->do_not_freeze = 0;
         cb = pt->cb;
         cb_priv = pt->priv;
     }
-    else
-    {
-        /* Masked. */
-        if ( pt->on_list )
-            list_del(&pt->list);
-        pt->on_list = false;
-        pt->pending_intr_nr++;
-    }
 
     pt_unlock(pt);
 
@@ -304,22 +248,12 @@ static void eoi_callback(unsigned int unused, void *data)
     pt_lock(pt);
 
     pt_irq_fired(pt->vcpu, pt);
-    if ( pt->pending_intr_nr )
+    if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
-        if ( inject_interrupt(pt) )
-        {
-            pt->pending_intr_nr--;
-            cb = pt->cb;
-            cb_priv = pt->priv;
-            v = pt->vcpu;
-        }
-        else
-        {
-            /* Masked. */
-            if ( pt->on_list )
-                list_del(&pt->list);
-            pt->on_list = false;
-        }
+        pt->pending_intr_nr--;
+        cb = pt->cb;
+        cb_priv = pt->priv;
+        v = pt->vcpu;
     }
 
     pt_unlock(pt);
@@ -378,19 +312,6 @@ static bool inject_interrupt(struct periodic_time *pt)
     return true;
 }
 
-void pt_migrate(struct vcpu *v)
-{
-    struct list_head *head = &v->arch.hvm.tm_list;
-    struct periodic_time *pt;
-
-    pt_vcpu_lock(v);
-
-    list_for_each_entry ( pt, head, list )
-        migrate_timer(&pt->timer, v->processor);
-
-    pt_vcpu_unlock(v);
-}
-
 void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level)
@@ -410,8 +331,6 @@ void create_periodic_time(
     write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
 
     pt->pending_intr_nr = 0;
-    pt->do_not_freeze = 0;
-    pt->irq_issued = 0;
 
     /* Periodic timer must be at least 0.1ms. */
     if ( (period < 100000) && period )
@@ -469,9 +388,6 @@ void create_periodic_time(
         break;
     }
 
-    pt->on_list = 1;
-    list_add(&pt->list, &v->arch.hvm.tm_list);
-
     init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
     set_timer(&pt->timer, pt->scheduled);
 
@@ -487,9 +403,6 @@ void destroy_periodic_time(struct periodic_time *pt)
         return;
 
     pt_lock(pt);
-    if ( pt->on_list )
-        list_del(&pt->list);
-    pt->on_list = 0;
     pt->pending_intr_nr = 0;
 
     gsi = pt->irq;
@@ -511,51 +424,6 @@ void destroy_periodic_time(struct periodic_time *pt)
     kill_timer(&pt->timer);
 }
 
-static void pt_adjust_vcpu(struct periodic_time *pt, struct vcpu *v)
-{
-    ASSERT(pt->source == PTSRC_isa || pt->source == PTSRC_ioapic);
-
-    if ( pt->vcpu == NULL )
-        return;
-
-    write_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-    pt->vcpu = v;
-    if ( pt->on_list )
-    {
-        list_del(&pt->list);
-        list_add(&pt->list, &v->arch.hvm.tm_list);
-        migrate_timer(&pt->timer, v->processor);
-    }
-    write_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-}
-
-void pt_adjust_global_vcpu_target(struct vcpu *v)
-{
-    struct PITState *vpit;
-    struct pl_time *pl_time;
-    int i;
-
-    if ( !v || !has_vpit(v->domain) )
-        return;
-
-    vpit = &v->domain->arch.vpit;
-
-    spin_lock(&vpit->lock);
-    pt_adjust_vcpu(&vpit->pt0, v);
-    spin_unlock(&vpit->lock);
-
-    pl_time = v->domain->arch.hvm.pl_time;
-
-    spin_lock(&pl_time->vrtc.lock);
-    pt_adjust_vcpu(&pl_time->vrtc.pt, v);
-    spin_unlock(&pl_time->vrtc.lock);
-
-    write_lock(&pl_time->vhpet.lock);
-    for ( i = 0; i < HPET_TIMER_NUM; i++ )
-        pt_adjust_vcpu(&pl_time->vhpet.pt[i], v);
-    write_unlock(&pl_time->vhpet.lock);
-}
-
 static void pt_resume(struct periodic_time *pt)
 {
     struct vcpu *v;
@@ -566,14 +434,12 @@ static void pt_resume(struct periodic_time *pt)
         return;
 
     pt_lock(pt);
-    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
+    if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
         pt->pending_intr_nr--;
         cb = pt->cb;
         cb_priv = pt->priv;
         v = pt->vcpu;
-        pt->on_list = 1;
-        list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
     }
     pt_unlock(pt);
 
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 8adf4555c2a..9a756964fb0 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -148,9 +148,8 @@ struct hvm_vcpu {
     s64                 cache_tsc_offset;
     u64                 guest_time;
 
-    /* Lock and list for virtual platform timers. */
+    /* Lock for virtual platform timers. */
     spinlock_t          tm_lock;
-    struct list_head    tm_list;
 
     bool                flag_dr_dirty;
     bool                debug_state_latch;
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index 9440fe4ac7d..af04efa5e01 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -31,11 +31,7 @@
 typedef void time_cb(struct vcpu *v, void *opaque);
 
 struct periodic_time {
-    struct list_head list;
-    bool on_list;
     bool one_shot;
-    bool do_not_freeze;
-    bool irq_issued;
     bool warned_timeout_too_short;
     bool level;
 #define PTSRC_isa    1 /* ISA time source */
@@ -145,11 +141,9 @@ struct pl_time {    /* platform time */
     struct domain *domain;
 };
 
-void pt_save_timer(struct vcpu *v);
-void pt_restore_timer(struct vcpu *v);
-void pt_migrate(struct vcpu *v);
+void pt_freeze_time(struct vcpu *v);
+void pt_thaw_time(struct vcpu *v);
 
-void pt_adjust_global_vcpu_target(struct vcpu *v);
 #define pt_global_vcpu_target(d) \
     (is_hvm_domain(d) && (d)->arch.hvm.i8259_target ? \
      (d)->arch.hvm.i8259_target : \
@@ -158,7 +152,7 @@ void pt_adjust_global_vcpu_target(struct vcpu *v);
 void pt_may_unmask_irq(struct domain *d, struct periodic_time *vlapic_pt);
 
 /* Is given periodic timer active? */
-#define pt_active(pt) ((pt)->on_list || (pt)->pending_intr_nr)
+#define pt_active(pt) ((pt)->pending_intr_nr || timer_is_active(&(pt)->timer))
 
 /*
  * Create/destroy a periodic (or one-shot!) timer.
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v3 11/11] x86/vpt: introduce a per-vPT lock
  2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
                   ` (9 preceding siblings ...)
  2021-03-31 10:33 ` [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
@ 2021-03-31 10:33 ` Roger Pau Monne
  10 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monne @ 2021-03-31 10:33 UTC (permalink / raw)
  To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich, Andrew Cooper, Wei Liu

Introduce a per virtual timer lock that replaces the existing per-vCPU
and per-domain vPT locks. Since virtual timers are no longer assigned
or migrated between vCPUs the locking can be simplified to a
in-structure spinlock that protects all the fields.

This requires introducing a helper to initialize the spinlock, and
that could be used to initialize other virtual timer fields in the
future.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v1:
 - New in his version.
---
 xen/arch/x86/emul-i8254.c      |  1 +
 xen/arch/x86/hvm/hpet.c        |  8 +++++-
 xen/arch/x86/hvm/hvm.c         |  4 ---
 xen/arch/x86/hvm/rtc.c         |  1 +
 xen/arch/x86/hvm/vlapic.c      |  1 +
 xen/arch/x86/hvm/vpt.c         | 48 +++++++++++++++-------------------
 xen/include/asm-x86/hvm/vcpu.h |  3 ---
 xen/include/asm-x86/hvm/vpt.h  |  9 ++-----
 8 files changed, 33 insertions(+), 42 deletions(-)

diff --git a/xen/arch/x86/emul-i8254.c b/xen/arch/x86/emul-i8254.c
index 73be4188ad4..a47138cbab7 100644
--- a/xen/arch/x86/emul-i8254.c
+++ b/xen/arch/x86/emul-i8254.c
@@ -484,6 +484,7 @@ void pit_init(struct domain *d, unsigned long cpu_khz)
     {
         register_portio_handler(d, PIT_BASE, 4, handle_pit_io);
         register_portio_handler(d, 0x61, 1, handle_speaker_io);
+        init_periodic_timer(&pit->pt0);
     }
 
     pit_reset(d);
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index ca94e8b4538..20593c3862d 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -739,12 +739,18 @@ static void hpet_set(HPETState *h)
 
 void hpet_init(struct domain *d)
 {
+    HPETState *h = domain_vhpet(d);
+    unsigned int i;
+
     if ( !has_vhpet(d) )
         return;
 
-    hpet_set(domain_vhpet(d));
+    hpet_set(h);
     register_mmio_handler(d, &hpet_mmio_ops);
     d->arch.hvm.params[HVM_PARAM_HPET_ENABLED] = 1;
+
+    for ( i = 0; i < HPET_TIMER_NUM; i++ )
+        init_periodic_timer(&h->pt[i]);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3a72da67ef2..1c014fc26c3 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -665,8 +665,6 @@ int hvm_domain_initialise(struct domain *d)
     /* need link to containing domain */
     d->arch.hvm.pl_time->domain = d;
 
-    rwlock_init(&d->arch.hvm.pl_time->pt_migrate);
-
     /* Set the default IO Bitmap. */
     if ( is_hardware_domain(d) )
     {
@@ -1556,8 +1554,6 @@ int hvm_vcpu_initialise(struct vcpu *v)
 
     hvm_asid_flush_vcpu(v);
 
-    spin_lock_init(&v->arch.hvm.tm_lock);
-
     rc = hvm_vcpu_cacheattr_init(v); /* teardown: vcpu_cacheattr_destroy */
     if ( rc != 0 )
         goto fail1;
diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
index 3150f5f1479..2d540b16acd 100644
--- a/xen/arch/x86/hvm/rtc.c
+++ b/xen/arch/x86/hvm/rtc.c
@@ -846,6 +846,7 @@ void rtc_init(struct domain *d)
     init_timer(&s->update_timer, rtc_update_timer, s, smp_processor_id());
     init_timer(&s->update_timer2, rtc_update_timer2, s, smp_processor_id());
     init_timer(&s->alarm_timer, rtc_alarm_cb, s, smp_processor_id());
+    init_periodic_timer(&s->pt);
 
     register_portio_handler(d, RTC_PORT(0), 2, handle_rtc_io);
 
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 8091b6d8925..688ff85e710 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -1651,6 +1651,7 @@ int vlapic_init(struct vcpu *v)
         return 0;
     }
 
+    init_periodic_timer(&vlapic->pt);
     vlapic->pt.source = PTSRC_lapic;
 
     vlapic->regs_page = alloc_domheap_page(v->domain, MEMF_no_owner);
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 84d49c1b25c..9cb0b8a0a82 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -126,23 +126,6 @@ static int pt_irq_masked(struct periodic_time *pt)
     return 1;
 }
 
-static void pt_lock(struct periodic_time *pt)
-{
-    /*
-     * We cannot use pt_vcpu_lock here, because we need to acquire the
-     * per-domain lock first and then (re-)fetch the value of pt->vcpu, or
-     * else we might be using a stale value of pt->vcpu.
-     */
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-    spin_lock(&pt->vcpu->arch.hvm.tm_lock);
-}
-
-static void pt_unlock(struct periodic_time *pt)
-{
-    spin_unlock(&pt->vcpu->arch.hvm.tm_lock);
-    read_unlock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-}
-
 static void pt_process_missed_ticks(struct periodic_time *pt)
 {
     s_time_t missed_ticks, now = NOW();
@@ -213,7 +196,7 @@ static void pt_timer_fn(void *data)
     void *cb_priv;
     unsigned int irq;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     v = pt->vcpu;
     irq = pt->irq;
@@ -228,7 +211,7 @@ static void pt_timer_fn(void *data)
         cb_priv = pt->priv;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -245,7 +228,7 @@ static void eoi_callback(unsigned int unused, void *data)
     time_cb *cb = NULL;
     void *cb_priv;
 
-    pt_lock(pt);
+    spin_lock(&pt->lock);
 
     pt_irq_fired(pt->vcpu, pt);
     if ( pt->pending_intr_nr && inject_interrupt(pt) )
@@ -256,7 +239,7 @@ static void eoi_callback(unsigned int unused, void *data)
         v = pt->vcpu;
     }
 
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
@@ -312,6 +295,11 @@ static bool inject_interrupt(struct periodic_time *pt)
     return true;
 }
 
+void init_periodic_timer(struct periodic_time *pt)
+{
+    spin_lock_init(&pt->lock);
+}
+
 void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level)
@@ -328,7 +316,7 @@ void create_periodic_time(
 
     destroy_periodic_time(pt);
 
-    write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
+    spin_lock(&pt->lock);
 
     pt->pending_intr_nr = 0;
 
@@ -391,18 +379,21 @@ void create_periodic_time(
     init_timer(&pt->timer, pt_timer_fn, pt, v->processor);
     set_timer(&pt->timer, pt->scheduled);
 
-    write_unlock(&v->domain->arch.hvm.pl_time->pt_migrate);
+    spin_unlock(&pt->lock);
 }
 
 void destroy_periodic_time(struct periodic_time *pt)
 {
     unsigned int gsi;
 
+    spin_lock(&pt->lock);
     /* Was this structure previously initialised by create_periodic_time()? */
     if ( pt->vcpu == NULL )
+    {
+        spin_unlock(&pt->lock);
         return;
+    }
 
-    pt_lock(pt);
     pt->pending_intr_nr = 0;
 
     gsi = pt->irq;
@@ -415,7 +406,7 @@ void destroy_periodic_time(struct periodic_time *pt)
         hvm_gsi_unregister_callback(pt->vcpu->domain, gsi, &pt->eoi_cb);
         break;
     }
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     /*
      * pt_timer_fn() can run until this kill_timer() returns. We must do this
@@ -430,10 +421,13 @@ static void pt_resume(struct periodic_time *pt)
     time_cb *cb = NULL;
     void *cb_priv;
 
+    spin_lock(&pt->lock);
     if ( pt->vcpu == NULL )
+    {
+        spin_unlock(&pt->lock);
         return;
+    }
 
-    pt_lock(pt);
     if ( pt->pending_intr_nr && inject_interrupt(pt) )
     {
         pt->pending_intr_nr--;
@@ -441,7 +435,7 @@ static void pt_resume(struct periodic_time *pt)
         cb_priv = pt->priv;
         v = pt->vcpu;
     }
-    pt_unlock(pt);
+    spin_unlock(&pt->lock);
 
     if ( cb )
         cb(v, cb_priv);
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 9a756964fb0..fe3d0e10426 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -148,9 +148,6 @@ struct hvm_vcpu {
     s64                 cache_tsc_offset;
     u64                 guest_time;
 
-    /* Lock for virtual platform timers. */
-    spinlock_t          tm_lock;
-
     bool                flag_dr_dirty;
     bool                debug_state_latch;
     bool                single_step;
diff --git a/xen/include/asm-x86/hvm/vpt.h b/xen/include/asm-x86/hvm/vpt.h
index af04efa5e01..5628cff8f7a 100644
--- a/xen/include/asm-x86/hvm/vpt.h
+++ b/xen/include/asm-x86/hvm/vpt.h
@@ -48,6 +48,7 @@ struct periodic_time {
     time_cb *cb;
     void *priv;                 /* point back to platform time source */
     struct hvm_gsi_eoi_callback eoi_cb; /* EOI callback registration data */
+    spinlock_t lock;
 };
 
 
@@ -126,13 +127,6 @@ struct pl_time {    /* platform time */
     struct RTCState  vrtc;
     struct HPETState vhpet;
     struct PMTState  vpmt;
-    /*
-     * rwlock to prevent periodic_time vCPU migration. Take the lock in read
-     * mode in order to prevent the vcpu field of periodic_time from changing.
-     * Lock must be taken in write mode when changes to the vcpu field are
-     * performed, as it allows exclusive access to all the timers of a domain.
-     */
-    rwlock_t pt_migrate;
     /* guest_time = Xen sys time + stime_offset */
     int64_t stime_offset;
     /* Ensures monotonicity in appropriate timer modes. */
@@ -167,6 +161,7 @@ void create_periodic_time(
     struct vcpu *v, struct periodic_time *pt, uint64_t delta,
     uint64_t period, uint8_t irq, time_cb *cb, void *data, bool level);
 void destroy_periodic_time(struct periodic_time *pt);
+void init_periodic_timer(struct periodic_time *pt);
 
 int pv_pit_handler(int port, int data, int write);
 void pit_reset(struct domain *d);
-- 
2.30.1



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-03-31 10:32 ` [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
@ 2021-03-31 11:47   ` Andrew Cooper
  2021-03-31 12:50     ` Roger Pau Monné
  2021-04-07 14:55   ` Jan Beulich
  1 sibling, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2021-03-31 11:47 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich, Wei Liu

On 31/03/2021 11:32, Roger Pau Monne wrote:
> @@ -1620,9 +1666,22 @@ int vlapic_init(struct vcpu *v)
>  
>      clear_page(vlapic->regs);
>  
> +    vlapic->callbacks = xmalloc_array(typeof(*vlapic->callbacks),
> +                                      X86_NR_VECTORS - 16);
> +    if ( !vlapic->callbacks )
> +    {
> +        dprintk(XENLOG_ERR, "%pv: alloc vlapic callbacks error\n", v);
> +        unmap_domain_page_global(vlapic->regs);
> +        free_domheap_page(vlapic->regs_page);
> +        return -ENOMEM;
> +    }
> +    memset(vlapic->callbacks, 0,
> +           sizeof(*vlapic->callbacks) * (X86_NR_VECTORS - 16));

xzalloc_array() instead of memset().  Also, we shouldn't be printing for
-ENOMEM cases.

As for the construction/teardown logic, vlapic_init()'s caller already
vlapic_destory().  Therefore, the existing error path you've copied is
buggy because it will cause a double-free if __map_domain_page_global()
fails.

I'll do a cleanup patch to fix the idempotency, which needs backporting too.

~Andrew



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-03-31 11:47   ` Andrew Cooper
@ 2021-03-31 12:50     ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-03-31 12:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Wei Liu

On Wed, Mar 31, 2021 at 12:47:54PM +0100, Andrew Cooper wrote:
> On 31/03/2021 11:32, Roger Pau Monne wrote:
> > @@ -1620,9 +1666,22 @@ int vlapic_init(struct vcpu *v)
> >  
> >      clear_page(vlapic->regs);
> >  
> > +    vlapic->callbacks = xmalloc_array(typeof(*vlapic->callbacks),
> > +                                      X86_NR_VECTORS - 16);
> > +    if ( !vlapic->callbacks )
> > +    {
> > +        dprintk(XENLOG_ERR, "%pv: alloc vlapic callbacks error\n", v);
> > +        unmap_domain_page_global(vlapic->regs);
> > +        free_domheap_page(vlapic->regs_page);
> > +        return -ENOMEM;
> > +    }
> > +    memset(vlapic->callbacks, 0,
> > +           sizeof(*vlapic->callbacks) * (X86_NR_VECTORS - 16));
> 
> xzalloc_array() instead of memset().  Also, we shouldn't be printing for
> -ENOMEM cases.
> 
> As for the construction/teardown logic, vlapic_init()'s caller already
> vlapic_destory().  Therefore, the existing error path you've copied is
> buggy because it will cause a double-free if __map_domain_page_global()
> fails.

Right, let me adjust that path.

> I'll do a cleanup patch to fix the idempotency, which needs backporting too.

Ack, I don't mind doiing one myself either.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
@ 2021-03-31 16:02   ` Jan Beulich
  2021-03-31 16:24     ` Andrew Cooper
  2021-04-01 11:06   ` Jan Beulich
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-03-31 16:02 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> EOIs are always executed in guest vCPU context, so there's no reason to
> pass a vCPU parameter around as can be fetched from current.
> 
> While there make the vector parameter of both callbacks unsigned int.
> 
> No functional change intended.
> 
> Suggested-by: Paul Durrant <pdurrant@amazon.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
> Changes since v1:
>  - New in this version.

I'm afraid the situation with viridian_synic_wrmsr() hasn't changed
since v2, and hence my previous comment still applies.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic EOI callbacks
  2021-03-31 10:32 ` [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic " Roger Pau Monne
@ 2021-03-31 16:04   ` Jan Beulich
  2021-04-01  9:15     ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-03-31 16:04 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> EOIs are always executed in guest vCPU context, so there's no reason to
> pass a domain parameter around as can be fetched from current->domain.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
> Changes since v1:
>  - New in this version.

Just to make it explicit - possibly same thing as with patch 1,
depending on how exactly the issue there gets taken care of.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-03-31 16:02   ` Jan Beulich
@ 2021-03-31 16:24     ` Andrew Cooper
  2021-04-01  9:12       ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Cooper @ 2021-03-31 16:24 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monne
  Cc: Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On 31/03/2021 17:02, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
>> EOIs are always executed in guest vCPU context, so there's no reason to
>> pass a vCPU parameter around as can be fetched from current.
>>
>> While there make the vector parameter of both callbacks unsigned int.
>>
>> No functional change intended.
>>
>> Suggested-by: Paul Durrant <pdurrant@amazon.com>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Reviewed-by: Paul Durrant <paul@xen.org>
>> ---
>> Changes since v1:
>>  - New in this version.
> I'm afraid the situation with viridian_synic_wrmsr() hasn't changed
> since v2, and hence my previous comment still applies.

Only just spotted that line of reasoning.

Longterm, I do want to remove all the Viridian special cases, and handle
all of the state via architectural mechanisms (cpu policy for static
settings, and the existing MSR records for dynamic content).

A consequence of this cleanup is that guest_{rd,wr}msr() will be
eventually be used to save/restore dynamic state in the migrate stream,
which is why I've been engineering guest_{rd,wr}msr() to work for MSRs
in "current || !scheduled" context.

As such, it is important to retain a vcpu pointer because it will not be
current on the save/restore hypercalls, which execute in control domain
context.

How much is keeping the vcpu pointer going to interfere with this
series?  My expectation is that this patch would need reverting to
continue the cleanup plans.

~Andrew



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-03-31 16:24     ` Andrew Cooper
@ 2021-04-01  9:12       ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-01  9:12 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On Wed, Mar 31, 2021 at 05:24:24PM +0100, Andrew Cooper wrote:
> On 31/03/2021 17:02, Jan Beulich wrote:
> > On 31.03.2021 12:32, Roger Pau Monne wrote:
> >> EOIs are always executed in guest vCPU context, so there's no reason to
> >> pass a vCPU parameter around as can be fetched from current.
> >>
> >> While there make the vector parameter of both callbacks unsigned int.
> >>
> >> No functional change intended.
> >>
> >> Suggested-by: Paul Durrant <pdurrant@amazon.com>
> >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Reviewed-by: Paul Durrant <paul@xen.org>
> >> ---
> >> Changes since v1:
> >>  - New in this version.
> > I'm afraid the situation with viridian_synic_wrmsr() hasn't changed
> > since v2, and hence my previous comment still applies.
> 
> Only just spotted that line of reasoning.
> 
> Longterm, I do want to remove all the Viridian special cases, and handle
> all of the state via architectural mechanisms (cpu policy for static
> settings, and the existing MSR records for dynamic content).
> 
> A consequence of this cleanup is that guest_{rd,wr}msr() will be
> eventually be used to save/restore dynamic state in the migrate stream,
> which is why I've been engineering guest_{rd,wr}msr() to work for MSRs
> in "current || !scheduled" context.
> 
> As such, it is important to retain a vcpu pointer because it will not be
> current on the save/restore hypercalls, which execute in control domain
> context.

But you are never going to restore an HV_X64_MSR_EOI MSR, as such
should never be part of the migrate stream in the first place. It
doesn't have a value itself - it's only used as an alternative way to
EOI an interrupt.

I still think the EOIs will always be performed on the affected vCPU
context, but I don't want this discussion the block the current
series, so I'm just going to drop this patch.

Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic EOI callbacks
  2021-03-31 16:04   ` Jan Beulich
@ 2021-04-01  9:15     ` Roger Pau Monné
  2021-04-01  9:28       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-01  9:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On Wed, Mar 31, 2021 at 06:04:43PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > EOIs are always executed in guest vCPU context, so there's no reason to
> > pass a domain parameter around as can be fetched from current->domain.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > Reviewed-by: Paul Durrant <paul@xen.org>
> > ---
> > Changes since v1:
> >  - New in this version.
> 
> Just to make it explicit - possibly same thing as with patch 1,
> depending on how exactly the issue there gets taken care of.

I don't think we have any scenario ATM where we allow EOI'ing of
interrupts from a different vCPU context, neither I can see us
allowing such in the future, but I don't want this discussion to block
the series, hence I'm going to drop this patch together with patch 1.

Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic EOI callbacks
  2021-04-01  9:15     ` Roger Pau Monné
@ 2021-04-01  9:28       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-01  9:28 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On 01.04.2021 11:15, Roger Pau Monné wrote:
> On Wed, Mar 31, 2021 at 06:04:43PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> EOIs are always executed in guest vCPU context, so there's no reason to
>>> pass a domain parameter around as can be fetched from current->domain.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Reviewed-by: Paul Durrant <paul@xen.org>
>>> ---
>>> Changes since v1:
>>>  - New in this version.
>>
>> Just to make it explicit - possibly same thing as with patch 1,
>> depending on how exactly the issue there gets taken care of.
> 
> I don't think we have any scenario ATM where we allow EOI'ing of
> interrupts from a different vCPU context, neither I can see us
> allowing such in the future, but I don't want this discussion to block
> the series, hence I'm going to drop this patch together with patch 1.

Well, having seen also your reply to Andrew wrt patch 1, I just wanted
to clarify that dropping the two patches isn't the only option. Making
Viridian code resilient in this regard might be another one.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
  2021-03-31 16:02   ` Jan Beulich
@ 2021-04-01 11:06   ` Jan Beulich
  2021-04-07  7:41     ` Roger Pau Monné
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-01 11:06 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> EOIs are always executed in guest vCPU context, so there's no reason to
> pass a vCPU parameter around as can be fetched from current.

While not overly problematic, I'd like to point out that there's not a
single vcpu parameter being dropped here - in both cases it's struct
domain *.

> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -459,13 +459,10 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>  
>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>  {
> -    struct vcpu *v = vlapic_vcpu(vlapic);
> -    struct domain *d = v->domain;
> -
>      if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
> -        vioapic_update_EOI(d, vector);
> +        vioapic_update_EOI(vector);
>  
> -    hvm_dpci_msi_eoi(d, vector);
> +    hvm_dpci_msi_eoi(vector);
>  }

The Viridian path pointed out before was only an example. I'm afraid
the call from vlapic_has_pending_irq() to vlapic_EOI_set() is also
far from obvious that it always has "v == current". What we end up
with here is a mix of passed in value (vlapic) and assumption of the
call being for the vCPU / domain we're running on. At the very least
I think this would want documenting here in some way (maybe ASSERT(),
definitely mentioning in the description), but even better would
perhaps be if the parameter of the function here as well as further
ones involved would also be dropped then.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-04-01 11:06   ` Jan Beulich
@ 2021-04-07  7:41     ` Roger Pau Monné
  2021-04-07  8:19       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-07  7:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On Thu, Apr 01, 2021 at 01:06:35PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > EOIs are always executed in guest vCPU context, so there's no reason to
> > pass a vCPU parameter around as can be fetched from current.
> 
> While not overly problematic, I'd like to point out that there's not a
> single vcpu parameter being dropped here - in both cases it's struct
> domain *.
> 
> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -459,13 +459,10 @@ void vlapic_EOI_set(struct vlapic *vlapic)
> >  
> >  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
> >  {
> > -    struct vcpu *v = vlapic_vcpu(vlapic);
> > -    struct domain *d = v->domain;
> > -
> >      if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
> > -        vioapic_update_EOI(d, vector);
> > +        vioapic_update_EOI(vector);
> >  
> > -    hvm_dpci_msi_eoi(d, vector);
> > +    hvm_dpci_msi_eoi(vector);
> >  }
> 
> The Viridian path pointed out before was only an example. I'm afraid
> the call from vlapic_has_pending_irq() to vlapic_EOI_set() is also
> far from obvious that it always has "v == current". What we end up
> with here is a mix of passed in value (vlapic) and assumption of the
> call being for the vCPU / domain we're running on. At the very least
> I think this would want documenting here in some way (maybe ASSERT(),
> definitely mentioning in the description), but even better would
> perhaps be if the parameter of the function here as well as further
> ones involved would also be dropped then.

I've kind of attempted to purge the vlapic parameter further, but the
proper way to do it would be to audit all vlapic functions.

For example I've removed the parameter from vlapic_EOI_set and
vlapic_handle_EOI, but I'm afraid that would also raise questions
about purging it vlapic_has_pending_irq for example.

Let me know if the patch below would be acceptable, or if I should
rather not make the EOI callbacks depends on this cleanup, as I could
certainly do the cleanup later.

Thanks, Roger.
---8<---
From 4719f5a827d3154ef763e078956792855ca84e04 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <roger.pau@citrix.com>
Date: Tue, 18 Aug 2020 16:30:06 +0200
Subject: [PATCH] x86/hvm: drop vlapic parameter from vlapic EOI helpers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

EOIs are always executed in guest vCPU context, so there's no reason to
pass a vlapic parameter around as can be fetched from current.

This also allows to drop the domain parameter from the EOI callbacks,
and while there make the vector parameter of both callbacks unsigned
int.

The callers from vmx.c (vmx_handle_eoi_write, vmx_vmexit_handler) are
clearly using v == current, as one is the vmexit handler, and the
other is called directly from such handler.

The only caller from Viridian code halso has a check that v == current
before calling vlapic_EOI_set - so it's safe.

Finally the callers from vlapic.c itself: vlapic_reg_write will only
get called with the APIC_EOI register as a parameter from the MMIO/MSR
intercepts which run in guest context and vlapic_has_pending_irq only
gets called as part of the vm entry path to guest.

No functional change intended.

Suggested-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v3:
 - Purge passing the vcpu in the vlapic eoi call paths.

Changes since v1:
 - New in this version.
---
 xen/arch/x86/hvm/vioapic.c        |  5 +++--
 xen/arch/x86/hvm/viridian/synic.c |  2 +-
 xen/arch/x86/hvm/vlapic.c         | 31 +++++++++++++++++++++----------
 xen/arch/x86/hvm/vmx/vmx.c        |  4 ++--
 xen/drivers/passthrough/x86/hvm.c |  4 +++-
 xen/include/asm-x86/hvm/io.h      |  2 +-
 xen/include/asm-x86/hvm/vioapic.h |  2 +-
 xen/include/asm-x86/hvm/vlapic.h  |  4 ++--
 8 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 87370dd4172..91e5f892787 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -372,7 +372,7 @@ static int vioapic_write(
 
 #if VIOAPIC_VERSION_ID >= 0x20
     case VIOAPIC_REG_EOI:
-        vioapic_update_EOI(v->domain, val);
+        vioapic_update_EOI(val);
         break;
 #endif
 
@@ -514,8 +514,9 @@ void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
     }
 }
 
-void vioapic_update_EOI(struct domain *d, u8 vector)
+void vioapic_update_EOI(unsigned int vector)
 {
+    struct domain *d = current->domain;
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     union vioapic_redir_entry *ent;
     unsigned int i;
diff --git a/xen/arch/x86/hvm/viridian/synic.c b/xen/arch/x86/hvm/viridian/synic.c
index e18538c60a6..33bd9c9bd13 100644
--- a/xen/arch/x86/hvm/viridian/synic.c
+++ b/xen/arch/x86/hvm/viridian/synic.c
@@ -93,7 +93,7 @@ int viridian_synic_wrmsr(struct vcpu *v, uint32_t idx, uint64_t val)
             ASSERT_UNREACHABLE();
             return X86EMUL_EXCEPTION;
         }
-        vlapic_EOI_set(vcpu_vlapic(v));
+        vlapic_EOI_set();
         break;
 
     case HV_X64_MSR_ICR:
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 5e21fb4937d..111b6d902f5 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -413,9 +413,10 @@ struct vlapic *vlapic_lowest_prio(
     return target;
 }
 
-void vlapic_EOI_set(struct vlapic *vlapic)
+void vlapic_EOI_set(void)
 {
-    struct vcpu *v = vlapic_vcpu(vlapic);
+    struct vcpu *v = current;
+    struct vlapic *vlapic = vcpu_vlapic(v);
     /*
      * If APIC assist was set then an EOI may have been avoided and
      * hence this EOI actually relates to a lower priority vector.
@@ -448,7 +449,7 @@ void vlapic_EOI_set(struct vlapic *vlapic)
         alternative_vcall(hvm_funcs.handle_eoi, vector,
                           vlapic_find_highest_isr(vlapic));
 
-    vlapic_handle_EOI(vlapic, vector);
+    vlapic_handle_EOI(vector);
 
     if ( missed_eoi )
     {
@@ -457,15 +458,14 @@ void vlapic_EOI_set(struct vlapic *vlapic)
     }
 }
 
-void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
+void vlapic_handle_EOI(uint8_t vector)
 {
-    struct vcpu *v = vlapic_vcpu(vlapic);
-    struct domain *d = v->domain;
+    struct vlapic *vlapic = vcpu_vlapic(current);
 
     if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
-        vioapic_update_EOI(d, vector);
+        vioapic_update_EOI(vector);
 
-    hvm_dpci_msi_eoi(d, vector);
+    hvm_dpci_msi_eoi(vector);
 }
 
 static bool_t is_multicast_dest(struct vlapic *vlapic, unsigned int short_hand,
@@ -796,7 +796,12 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
         break;
 
     case APIC_EOI:
-        vlapic_EOI_set(vlapic);
+        if ( v != current )
+        {
+            ASSERT_UNREACHABLE();
+            break;
+        }
+        vlapic_EOI_set();
         break;
 
     case APIC_LDR:
@@ -1303,6 +1308,12 @@ int vlapic_has_pending_irq(struct vcpu *v)
     struct vlapic *vlapic = vcpu_vlapic(v);
     int irr, isr;
 
+    if ( v != current )
+    {
+        ASSERT_UNREACHABLE();
+        return -1;
+    }
+
     if ( !vlapic_enabled(vlapic) )
         return -1;
 
@@ -1327,7 +1338,7 @@ int vlapic_has_pending_irq(struct vcpu *v)
      * with IRR.
      */
     if ( viridian_apic_assist_completed(v) )
-        vlapic_EOI_set(vlapic);
+        vlapic_EOI_set();
 
     isr = vlapic_find_highest_isr(vlapic);
 
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index b52824677e9..22e406285cf 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3715,7 +3715,7 @@ static int vmx_handle_eoi_write(void)
          ((exit_qualification & 0xfff) == APIC_EOI) )
     {
         update_guest_eip(); /* Safe: APIC data write */
-        vlapic_EOI_set(vcpu_vlapic(current));
+        vlapic_EOI_set();
         HVMTRACE_0D(VLAPIC);
         return 1;
     }
@@ -4364,7 +4364,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
         ASSERT(cpu_has_vmx_virtual_intr_delivery);
 
-        vlapic_handle_EOI(vcpu_vlapic(v), exit_qualification);
+        vlapic_handle_EOI(exit_qualification);
         break;
 
     case EXIT_REASON_IO_INSTRUCTION:
diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
index 351daafdc9b..2f6c81b1e2c 100644
--- a/xen/drivers/passthrough/x86/hvm.c
+++ b/xen/drivers/passthrough/x86/hvm.c
@@ -796,8 +796,10 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
     return 0;
 }
 
-void hvm_dpci_msi_eoi(struct domain *d, int vector)
+void hvm_dpci_msi_eoi(unsigned int vector)
 {
+    struct domain *d = current->domain;
+
     if ( !is_iommu_enabled(d) ||
          (!hvm_domain_irq(d)->dpci && !is_hardware_domain(d)) )
        return;
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 54e0161b492..8b8392ec59e 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -142,7 +142,7 @@ struct hvm_hw_stdvga {
 void stdvga_init(struct domain *d);
 void stdvga_deinit(struct domain *d);
 
-extern void hvm_dpci_msi_eoi(struct domain *d, int vector);
+extern void hvm_dpci_msi_eoi(unsigned int vector);
 
 /* Decode a PCI port IO access into a bus/slot/func/reg. */
 unsigned int hvm_pci_decode_addr(unsigned int cf8, unsigned int addr,
diff --git a/xen/include/asm-x86/hvm/vioapic.h b/xen/include/asm-x86/hvm/vioapic.h
index 36b64d20d60..882548c13b7 100644
--- a/xen/include/asm-x86/hvm/vioapic.h
+++ b/xen/include/asm-x86/hvm/vioapic.h
@@ -63,7 +63,7 @@ int vioapic_init(struct domain *d);
 void vioapic_deinit(struct domain *d);
 void vioapic_reset(struct domain *d);
 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq);
-void vioapic_update_EOI(struct domain *d, u8 vector);
+void vioapic_update_EOI(unsigned int vector);
 
 int vioapic_get_mask(const struct domain *d, unsigned int gsi);
 int vioapic_get_vector(const struct domain *d, unsigned int gsi);
diff --git a/xen/include/asm-x86/hvm/vlapic.h b/xen/include/asm-x86/hvm/vlapic.h
index 8f908928c35..b693696eccb 100644
--- a/xen/include/asm-x86/hvm/vlapic.h
+++ b/xen/include/asm-x86/hvm/vlapic.h
@@ -133,8 +133,8 @@ uint32_t vlapic_set_ppr(struct vlapic *vlapic);
 
 void vlapic_adjust_i8259_target(struct domain *d);
 
-void vlapic_EOI_set(struct vlapic *vlapic);
-void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector);
+void vlapic_EOI_set(void);
+void vlapic_handle_EOI(uint8_t vector);
 
 void vlapic_ipi(struct vlapic *vlapic, uint32_t icr_low, uint32_t icr_high);
 
-- 
2.30.1




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks
  2021-04-07  7:41     ` Roger Pau Monné
@ 2021-04-07  8:19       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-07  8:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Wei Liu, Paul Durrant, Paul Durrant, xen-devel

On 07.04.2021 09:41, Roger Pau Monné wrote:
> On Thu, Apr 01, 2021 at 01:06:35PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> EOIs are always executed in guest vCPU context, so there's no reason to
>>> pass a vCPU parameter around as can be fetched from current.
>>
>> While not overly problematic, I'd like to point out that there's not a
>> single vcpu parameter being dropped here - in both cases it's struct
>> domain *.
>>
>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -459,13 +459,10 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>>>  
>>>  void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
>>>  {
>>> -    struct vcpu *v = vlapic_vcpu(vlapic);
>>> -    struct domain *d = v->domain;
>>> -
>>>      if ( vlapic_test_vector(vector, &vlapic->regs->data[APIC_TMR]) )
>>> -        vioapic_update_EOI(d, vector);
>>> +        vioapic_update_EOI(vector);
>>>  
>>> -    hvm_dpci_msi_eoi(d, vector);
>>> +    hvm_dpci_msi_eoi(vector);
>>>  }
>>
>> The Viridian path pointed out before was only an example. I'm afraid
>> the call from vlapic_has_pending_irq() to vlapic_EOI_set() is also
>> far from obvious that it always has "v == current". What we end up
>> with here is a mix of passed in value (vlapic) and assumption of the
>> call being for the vCPU / domain we're running on. At the very least
>> I think this would want documenting here in some way (maybe ASSERT(),
>> definitely mentioning in the description), but even better would
>> perhaps be if the parameter of the function here as well as further
>> ones involved would also be dropped then.
> 
> I've kind of attempted to purge the vlapic parameter further, but the
> proper way to do it would be to audit all vlapic functions.
> 
> For example I've removed the parameter from vlapic_EOI_set and
> vlapic_handle_EOI, but I'm afraid that would also raise questions
> about purging it vlapic_has_pending_irq for example.
> 
> Let me know if the patch below would be acceptable, or if I should
> rather not make the EOI callbacks depends on this cleanup, as I could
> certainly do the cleanup later.

While I'm not opposed in principle, the patch moves us further away
from what Andrew has asked for (to retain the vcpu pointers), if I
understood him correctly. I'm also not entirely certain if there
couldn't be, down the road, emulators needing to signal an EOI to
Xen on behalf of a guest.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-03-31 10:32 ` [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
  2021-03-31 11:47   ` Andrew Cooper
@ 2021-04-07 14:55   ` Jan Beulich
  2021-04-07 16:27     ` Roger Pau Monné
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-07 14:55 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> Add a new vlapic_set_irq_callback helper in order to inject a vector
> and set a callback to be executed when the guest performs the end of
> interrupt acknowledgment.
> 
> Such functionality will be used to migrate the current ad hoc handling
> done in vlapic_handle_EOI for the vectors that require some logic to
> be executed when the end of interrupt is performed.
> 
> The setter of the callback will be in charge for setting the callback
> again on resume. That is the reason why vlapic_set_callback is not a
> static function.

I'm struggling with your use of "resume" here: Resuming from S3
doesn't require re-doing anything that's kept in memory, does it?
So what meaning does the word have here?

Apart from this, and with the xzalloc_array() change requested
by Andrew, this looks good to me.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks
  2021-03-31 10:32 ` [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
@ 2021-04-07 14:59   ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-07 14:59 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> Remove the unconditional call to hvm_dpci_msi_eoi in vlapic_handle_EOI
> and instead use the newly introduced EOI callback mechanism in order
> to register a callback for MSI vectors injected from passed through
> devices.
> 
> This avoids having multiple callback functions open-coded in
> vlapic_handle_EOI, as there is now a generic framework for registering
> such callbacks. It also avoids doing an unconditional call to
> hvm_dpci_msi_eoi for each EOI processed by the local APIC.
> 
> Note that now the callback is only registered (and thus executed) when
> there's an MSI interrupt originating from a PCI passthrough device
> being injected into the guest, so the check in hvm_dpci_msi_eoi can be
> removed as it's already done by hvm_dirq_assist which is the only
> caller of vmsi_deliver_pirq.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-03-31 10:32 ` [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
@ 2021-04-07 15:19   ` Jan Beulich
  2021-04-07 16:46     ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-07 15:19 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -394,6 +394,50 @@ static const struct hvm_mmio_ops vioapic_mmio_ops = {
>      .write = vioapic_write
>  };
>  
> +static void eoi_callback(unsigned int vector, void *data)
> +{
> +    struct domain *d = current->domain;
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +    unsigned int i;
> +
> +    ASSERT(has_vioapic(d));

On the same grounds on which you dropped checks from hvm_dpci_msi_eoi()
in the previous patch you could imo now drop this assertion.

> @@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
>           d->arch.hvm.nr_vioapics != 1 )
>          return -EOPNOTSUPP;
>  
> -    return hvm_load_entry(IOAPIC, h, &s->domU);
> +    rc = hvm_load_entry(IOAPIC, h, &s->domU);
> +    if ( rc )
> +        return rc;
> +
> +    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
> +    {
> +        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
> +        unsigned int vector = ent->fields.vector;
> +        unsigned int delivery_mode = ent->fields.delivery_mode;
> +        struct vcpu *v;
> +
> +        /*
> +         * Add a callback for each possible vector injected by a redirection
> +         * entry.
> +         */
> +        if ( vector < 16 || !ent->fields.remote_irr ||
> +             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
> +            continue;
> +
> +        for_each_vcpu ( d, v )
> +        {
> +            struct vlapic *vlapic = vcpu_vlapic(v);
> +
> +            /*
> +             * NB: if the vlapic registers were restored before the vio-apic
> +             * ones we could test whether the vector is set in the vlapic IRR
> +             * or ISR registers before unconditionally setting the callback.
> +             * This is harmless as eoi_callback is capable of dealing with
> +             * spurious callbacks.
> +             */
> +            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
> +                                   ent->fields.dest_mode) )
> +                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);

eoi_callback()'s behavior is only one of the aspects to consider here.
The other is vlapic_set_callback()'s complaining if it finds a
callback already set. What guarantees that a mistakenly set callback
here won't get in conflict with some future use of the same vector by
the guest?

And btw - like in the earlier patch you could again pass d instead of
NULL here, avoiding the need to establish it from current in the
callback.

> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -192,7 +192,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
>  
>      if ( hvm_funcs.update_eoi_exit_bitmap )
>          alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
> -                          trig || callback);
> +                          /*
> +                           * NB: need to explicitly convert to boolean to avoid
> +                           * truncation wrongly result in false begin reported
> +                           * for example when the pointer sits on a page
> +                           * boundary.
> +                           */
> +                          !!callback);

I've had quite a bit of difficulty with the comment. Once I realized
that you likely mean "being" instead of "begin" it got a bit better.
I'd like to suggest also s/result/resulting/, a comma after "reported",
and maybe then s/being reported/getting passed/.

As to explicitly converting to bool, wouldn't a cast to bool do? That's
more explicitly an "explicit conversion" than using !!.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-03-31 10:32 ` [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
@ 2021-04-07 15:51   ` Jan Beulich
  2021-04-07 17:08     ` Roger Pau Monné
  2021-04-08 12:52     ` Roger Pau Monné
  0 siblings, 2 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-07 15:51 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 31.03.2021 12:32, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -595,6 +595,69 @@ int hvm_local_events_need_delivery(struct vcpu *v)
>      return !hvm_interrupt_blocked(v, intack);
>  }
>  
> +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
> +                              struct hvm_gsi_eoi_callback *cb)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +
> +    if ( gsi >= hvm_irq->nr_gsis )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return -EINVAL;
> +    }
> +
> +    write_lock(&hvm_irq->gsi_callbacks_lock);
> +    list_add(&cb->list, &hvm_irq->gsi_callbacks[gsi]);
> +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> +
> +    return 0;
> +}
> +
> +void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> +                                 struct hvm_gsi_eoi_callback *cb)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +    const struct list_head *tmp;
> +
> +    if ( gsi >= hvm_irq->nr_gsis )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
> +
> +    write_lock(&hvm_irq->gsi_callbacks_lock);
> +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
> +        if ( tmp == &cb->list )
> +        {
> +            list_del(&cb->list);
> +            break;
> +        }
> +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> +}

Perhaps somehow flag, at least in debug builds, if the callback
wasn#t found?

> +void hvm_gsi_execute_callbacks(unsigned int gsi)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
> +    struct hvm_gsi_eoi_callback *cb;
> +
> +    read_lock(&hvm_irq->gsi_callbacks_lock);
> +    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
> +        cb->callback(gsi, cb->data);
> +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> +}

Just as an observation (for now at least) - holding the lock here
means the callbacks cannot re-register themselves.

> +bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi)
> +{
> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> +    bool has_callbacks;
> +
> +    read_lock(&hvm_irq->gsi_callbacks_lock);
> +    has_callbacks = !list_empty(&hvm_irq->gsi_callbacks[gsi]);
> +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> +
> +    return has_callbacks;
> +}

What use is this function? Its result is stale by the time the
caller can look at it, as you've dropped the lock.

> @@ -421,13 +423,25 @@ static void eoi_callback(unsigned int vector, void *data)
>              if ( is_iommu_enabled(d) )
>              {
>                  spin_unlock(&d->arch.hvm.irq_lock);
> -                hvm_dpci_eoi(vioapic->base_gsi + pin);
> +                hvm_dpci_eoi(gsi);
>                  spin_lock(&d->arch.hvm.irq_lock);
>              }
>  
> +            /*
> +             * Callbacks don't expect to be executed with any lock held, so
> +             * drop the lock that protects the vIO-APIC fields from changing.
> +             *
> +             * Note that the redirection entry itself cannot go away, so upon
> +             * retaking the lock we only need to avoid making assumptions on
> +             * redirection entry field values (ie: recheck the IRR field).
> +             */
> +            spin_unlock(&d->arch.hvm.irq_lock);
> +            hvm_gsi_execute_callbacks(gsi);
> +            spin_lock(&d->arch.hvm.irq_lock);

The two pairs of unlock / re-lock want folding, I think - there's
no point causing extra contention on the lock here.

> @@ -443,7 +457,8 @@ static void ioapic_inj_irq(
>      struct vlapic *target,
>      uint8_t vector,
>      uint8_t trig_mode,
> -    uint8_t delivery_mode)
> +    uint8_t delivery_mode,
> +    bool callback)
>  {
>      HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %d trig %d deliv %d",
>                  vector, trig_mode, delivery_mode);
> @@ -452,7 +467,7 @@ static void ioapic_inj_irq(
>             (delivery_mode == dest_LowestPrio));
>  
>      vlapic_set_irq_callback(target, vector, trig_mode,
> -                            trig_mode ? eoi_callback : NULL, NULL);
> +                            callback ? eoi_callback : NULL, NULL);

I think you'd better use trig_mode || callback here and ...

> @@ -466,6 +481,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>      struct vlapic *target;
>      struct vcpu *v;
>      unsigned int irq = vioapic->base_gsi + pin;
> +    bool callback = trig_mode || hvm_gsi_has_callbacks(d, irq);
>  
>      ASSERT(spin_is_locked(&d->arch.hvm.irq_lock));
>  
> @@ -492,7 +508,8 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>              target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
>          if ( target != NULL )
>          {
> -            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode);
> +            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode,
> +                           callback);

... invoke hvm_gsi_has_callbacks() right here and ...

> @@ -507,7 +524,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>          for_each_vcpu ( d, v )
>              if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
>                  ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
> -                               delivery_mode);
> +                               delivery_mode, callback);

... here, avoiding to call the function when you don't need the
result.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-04-07 14:55   ` Jan Beulich
@ 2021-04-07 16:27     ` Roger Pau Monné
  2021-04-08  6:20       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-07 16:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Apr 07, 2021 at 04:55:43PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > Add a new vlapic_set_irq_callback helper in order to inject a vector
> > and set a callback to be executed when the guest performs the end of
> > interrupt acknowledgment.
> > 
> > Such functionality will be used to migrate the current ad hoc handling
> > done in vlapic_handle_EOI for the vectors that require some logic to
> > be executed when the end of interrupt is performed.
> > 
> > The setter of the callback will be in charge for setting the callback
> > again on resume. That is the reason why vlapic_set_callback is not a
> > static function.
> 
> I'm struggling with your use of "resume" here: Resuming from S3
> doesn't require re-doing anything that's kept in memory, does it?
> So what meaning does the word have here?

Right, I can see the confusion. Resume here means a guest being
migrated or restored, not Xen itself being resumed. Callbacks are not
part of the exported guest state, and hence any emulated device that
requires a callback will have to register it as part of loading the
saved state.

> Apart from this, and with the xzalloc_array() change requested
> by Andrew, this looks good to me.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-07 15:19   ` Jan Beulich
@ 2021-04-07 16:46     ` Roger Pau Monné
  2021-04-08  6:27       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-07 16:46 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Apr 07, 2021 at 05:19:06PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/vioapic.c
> > +++ b/xen/arch/x86/hvm/vioapic.c
> > @@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
> >           d->arch.hvm.nr_vioapics != 1 )
> >          return -EOPNOTSUPP;
> >  
> > -    return hvm_load_entry(IOAPIC, h, &s->domU);
> > +    rc = hvm_load_entry(IOAPIC, h, &s->domU);
> > +    if ( rc )
> > +        return rc;
> > +
> > +    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
> > +    {
> > +        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
> > +        unsigned int vector = ent->fields.vector;
> > +        unsigned int delivery_mode = ent->fields.delivery_mode;
> > +        struct vcpu *v;
> > +
> > +        /*
> > +         * Add a callback for each possible vector injected by a redirection
> > +         * entry.
> > +         */
> > +        if ( vector < 16 || !ent->fields.remote_irr ||
> > +             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
> > +            continue;
> > +
> > +        for_each_vcpu ( d, v )
> > +        {
> > +            struct vlapic *vlapic = vcpu_vlapic(v);
> > +
> > +            /*
> > +             * NB: if the vlapic registers were restored before the vio-apic
> > +             * ones we could test whether the vector is set in the vlapic IRR
> > +             * or ISR registers before unconditionally setting the callback.
> > +             * This is harmless as eoi_callback is capable of dealing with
> > +             * spurious callbacks.
> > +             */
> > +            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
> > +                                   ent->fields.dest_mode) )
> > +                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
> 
> eoi_callback()'s behavior is only one of the aspects to consider here.
> The other is vlapic_set_callback()'s complaining if it finds a
> callback already set. What guarantees that a mistakenly set callback
> here won't get in conflict with some future use of the same vector by
> the guest?

Such conflict would only manifest as a warning message, but won't
cause any malfunction, as the later callback would override the
current one.

This model I'm proposing doesn't support lapic vector sharing with
different devices that require EOI callbacks, I think we already
discussed this on a previous series and agreed it was fine.

> And btw - like in the earlier patch you could again pass d instead of
> NULL here, avoiding the need to establish it from current in the
> callback.

On the new version the vlapic callback gets passed a vcpu parameter,
as I will drop the prepatches to remove passing a domain parameter to
vioapic_update_EOI.

> > --- a/xen/arch/x86/hvm/vlapic.c
> > +++ b/xen/arch/x86/hvm/vlapic.c
> > @@ -192,7 +192,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
> >  
> >      if ( hvm_funcs.update_eoi_exit_bitmap )
> >          alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
> > -                          trig || callback);
> > +                          /*
> > +                           * NB: need to explicitly convert to boolean to avoid
> > +                           * truncation wrongly result in false begin reported
> > +                           * for example when the pointer sits on a page
> > +                           * boundary.
> > +                           */
> > +                          !!callback);
> 
> I've had quite a bit of difficulty with the comment. Once I realized
> that you likely mean "being" instead of "begin" it got a bit better.
> I'd like to suggest also s/result/resulting/, a comma after "reported",
> and maybe then s/being reported/getting passed/.
> 
> As to explicitly converting to bool, wouldn't a cast to bool do? That's
> more explicitly an "explicit conversion" than using !!.

I've always used !! in the past for such cases because it's shorter, I
can explicitly cast if you prefer that instead.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-07 15:51   ` Jan Beulich
@ 2021-04-07 17:08     ` Roger Pau Monné
  2021-04-08  6:34       ` Jan Beulich
  2021-04-15 16:04       ` Roger Pau Monné
  2021-04-08 12:52     ` Roger Pau Monné
  1 sibling, 2 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-07 17:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > +void hvm_gsi_execute_callbacks(unsigned int gsi)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
> > +    struct hvm_gsi_eoi_callback *cb;
> > +
> > +    read_lock(&hvm_irq->gsi_callbacks_lock);
> > +    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
> > +        cb->callback(gsi, cb->data);
> > +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> > +}
> 
> Just as an observation (for now at least) - holding the lock here
> means the callbacks cannot re-register themselves.

Well, re-registering would be weird, as the callback is not
unregistered after execution. What is likely more relevant is that the
callback cannot unregister itself. I haven't found a need for this so
far, so I think it's fine.

> > +bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +    bool has_callbacks;
> > +
> > +    read_lock(&hvm_irq->gsi_callbacks_lock);
> > +    has_callbacks = !list_empty(&hvm_irq->gsi_callbacks[gsi]);
> > +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> > +
> > +    return has_callbacks;
> > +}
> 
> What use is this function? Its result is stale by the time the
> caller can look at it, as you've dropped the lock.

Right, that function is only used to decide whether the vIOAPIC needs
to register an EOI callback when injecting a vector to the vlapic. The
workflow is to first register a callback with the vIOAPIC and
afterwards inject an interrupt which will trigger the callback
logic.

Playing with the callback registration while interrupts can be
injected will likely result in a malfunction of the device that relies
on those callbacks, but that's to be expected anyway when playing such
games.

That said multiple users sharing a vIOAPIC pin should be fine as long
as they follow the logic above: always register a callback before
attempting to inject an interrupt.

> > @@ -421,13 +423,25 @@ static void eoi_callback(unsigned int vector, void *data)
> >              if ( is_iommu_enabled(d) )
> >              {
> >                  spin_unlock(&d->arch.hvm.irq_lock);
> > -                hvm_dpci_eoi(vioapic->base_gsi + pin);
> > +                hvm_dpci_eoi(gsi);
> >                  spin_lock(&d->arch.hvm.irq_lock);
> >              }
> >  
> > +            /*
> > +             * Callbacks don't expect to be executed with any lock held, so
> > +             * drop the lock that protects the vIO-APIC fields from changing.
> > +             *
> > +             * Note that the redirection entry itself cannot go away, so upon
> > +             * retaking the lock we only need to avoid making assumptions on
> > +             * redirection entry field values (ie: recheck the IRR field).
> > +             */
> > +            spin_unlock(&d->arch.hvm.irq_lock);
> > +            hvm_gsi_execute_callbacks(gsi);
> > +            spin_lock(&d->arch.hvm.irq_lock);
> 
> The two pairs of unlock / re-lock want folding, I think - there's
> no point causing extra contention on the lock here.

The chunk above will go away on the next patch - there's no need to
fold it as it makes the following patch less clear.

> > @@ -443,7 +457,8 @@ static void ioapic_inj_irq(
> >      struct vlapic *target,
> >      uint8_t vector,
> >      uint8_t trig_mode,
> > -    uint8_t delivery_mode)
> > +    uint8_t delivery_mode,
> > +    bool callback)
> >  {
> >      HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %d trig %d deliv %d",
> >                  vector, trig_mode, delivery_mode);
> > @@ -452,7 +467,7 @@ static void ioapic_inj_irq(
> >             (delivery_mode == dest_LowestPrio));
> >  
> >      vlapic_set_irq_callback(target, vector, trig_mode,
> > -                            trig_mode ? eoi_callback : NULL, NULL);
> > +                            callback ? eoi_callback : NULL, NULL);
> 
> I think you'd better use trig_mode || callback here and ...
> 
> > @@ -466,6 +481,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
> >      struct vlapic *target;
> >      struct vcpu *v;
> >      unsigned int irq = vioapic->base_gsi + pin;
> > +    bool callback = trig_mode || hvm_gsi_has_callbacks(d, irq);
> >  
> >      ASSERT(spin_is_locked(&d->arch.hvm.irq_lock));
> >  
> > @@ -492,7 +508,8 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
> >              target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
> >          if ( target != NULL )
> >          {
> > -            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode);
> > +            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode,
> > +                           callback);
> 
> ... invoke hvm_gsi_has_callbacks() right here and ...
> 
> > @@ -507,7 +524,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
> >          for_each_vcpu ( d, v )
> >              if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
> >                  ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
> > -                               delivery_mode);
> > +                               delivery_mode, callback);
> 
> ... here, avoiding to call the function when you don't need the
> result.

I think there's a slim chance of not needing to use the callback local
variable, and hence didn't consider limiting it. I can do, but I'm
unsure this will bring any real benefit while making the code more
complex IMO.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-04-07 16:27     ` Roger Pau Monné
@ 2021-04-08  6:20       ` Jan Beulich
  2021-04-08  9:12         ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-08  6:20 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 07.04.2021 18:27, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 04:55:43PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> Add a new vlapic_set_irq_callback helper in order to inject a vector
>>> and set a callback to be executed when the guest performs the end of
>>> interrupt acknowledgment.
>>>
>>> Such functionality will be used to migrate the current ad hoc handling
>>> done in vlapic_handle_EOI for the vectors that require some logic to
>>> be executed when the end of interrupt is performed.
>>>
>>> The setter of the callback will be in charge for setting the callback
>>> again on resume. That is the reason why vlapic_set_callback is not a
>>> static function.
>>
>> I'm struggling with your use of "resume" here: Resuming from S3
>> doesn't require re-doing anything that's kept in memory, does it?
>> So what meaning does the word have here?
> 
> Right, I can see the confusion. Resume here means a guest being
> migrated or restored, not Xen itself being resumed. Callbacks are not
> part of the exported guest state, and hence any emulated device that
> requires a callback will have to register it as part of loading the
> saved state.
> 
>> Apart from this, and with the xzalloc_array() change requested
>> by Andrew, this looks good to me.

In which case with this change and "resume" replaced suitably in the
description
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-07 16:46     ` Roger Pau Monné
@ 2021-04-08  6:27       ` Jan Beulich
  2021-04-08  8:59         ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-08  6:27 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 07.04.2021 18:46, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 05:19:06PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/vioapic.c
>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>> @@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
>>>           d->arch.hvm.nr_vioapics != 1 )
>>>          return -EOPNOTSUPP;
>>>  
>>> -    return hvm_load_entry(IOAPIC, h, &s->domU);
>>> +    rc = hvm_load_entry(IOAPIC, h, &s->domU);
>>> +    if ( rc )
>>> +        return rc;
>>> +
>>> +    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
>>> +    {
>>> +        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
>>> +        unsigned int vector = ent->fields.vector;
>>> +        unsigned int delivery_mode = ent->fields.delivery_mode;
>>> +        struct vcpu *v;
>>> +
>>> +        /*
>>> +         * Add a callback for each possible vector injected by a redirection
>>> +         * entry.
>>> +         */
>>> +        if ( vector < 16 || !ent->fields.remote_irr ||
>>> +             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
>>> +            continue;
>>> +
>>> +        for_each_vcpu ( d, v )
>>> +        {
>>> +            struct vlapic *vlapic = vcpu_vlapic(v);
>>> +
>>> +            /*
>>> +             * NB: if the vlapic registers were restored before the vio-apic
>>> +             * ones we could test whether the vector is set in the vlapic IRR
>>> +             * or ISR registers before unconditionally setting the callback.
>>> +             * This is harmless as eoi_callback is capable of dealing with
>>> +             * spurious callbacks.
>>> +             */
>>> +            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
>>> +                                   ent->fields.dest_mode) )
>>> +                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
>>
>> eoi_callback()'s behavior is only one of the aspects to consider here.
>> The other is vlapic_set_callback()'s complaining if it finds a
>> callback already set. What guarantees that a mistakenly set callback
>> here won't get in conflict with some future use of the same vector by
>> the guest?
> 
> Such conflict would only manifest as a warning message, but won't
> cause any malfunction, as the later callback would override the
> current one.
> 
> This model I'm proposing doesn't support lapic vector sharing with
> different devices that require EOI callbacks, I think we already
> discussed this on a previous series and agreed it was fine.

The problem with such false positive warning messages is that
they'll cause cautious people to investigate, i.e. spend perhaps
a sizable amount of time in understanding what was actually a non-
issue. I view this as a problem, even if the code's functioning is
fine the way it is. I'm not even sure explicitly mentioning the
situation in the comment is going to help, as one may not stumble
across that comment while investigating.

>>> --- a/xen/arch/x86/hvm/vlapic.c
>>> +++ b/xen/arch/x86/hvm/vlapic.c
>>> @@ -192,7 +192,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
>>>  
>>>      if ( hvm_funcs.update_eoi_exit_bitmap )
>>>          alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
>>> -                          trig || callback);
>>> +                          /*
>>> +                           * NB: need to explicitly convert to boolean to avoid
>>> +                           * truncation wrongly result in false begin reported
>>> +                           * for example when the pointer sits on a page
>>> +                           * boundary.
>>> +                           */
>>> +                          !!callback);
>>
>> I've had quite a bit of difficulty with the comment. Once I realized
>> that you likely mean "being" instead of "begin" it got a bit better.
>> I'd like to suggest also s/result/resulting/, a comma after "reported",
>> and maybe then s/being reported/getting passed/.
>>
>> As to explicitly converting to bool, wouldn't a cast to bool do? That's
>> more explicitly an "explicit conversion" than using !!.
> 
> I've always used !! in the past for such cases because it's shorter, I
> can explicitly cast if you prefer that instead.

I agree with the "shorter" aspect. What I'm afraid of is that someone may,
despite the comment, think the !! is a stray leftover from the bool_t
days. I'd therefore prefer to keep the !! pattern for just the legacy
uses, and see casts used in cases like the one here. However, if both you
and Andrew think otherwise, so be it.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-07 17:08     ` Roger Pau Monné
@ 2021-04-08  6:34       ` Jan Beulich
  2021-04-15 16:04       ` Roger Pau Monné
  1 sibling, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-08  6:34 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 07.04.2021 19:08, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> +bool hvm_gsi_has_callbacks(const struct domain *d, unsigned int gsi)
>>> +{
>>> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>>> +    bool has_callbacks;
>>> +
>>> +    read_lock(&hvm_irq->gsi_callbacks_lock);
>>> +    has_callbacks = !list_empty(&hvm_irq->gsi_callbacks[gsi]);
>>> +    read_unlock(&hvm_irq->gsi_callbacks_lock);
>>> +
>>> +    return has_callbacks;
>>> +}
>>
>> What use is this function? Its result is stale by the time the
>> caller can look at it, as you've dropped the lock.
> 
> Right, that function is only used to decide whether the vIOAPIC needs
> to register an EOI callback when injecting a vector to the vlapic. The
> workflow is to first register a callback with the vIOAPIC and
> afterwards inject an interrupt which will trigger the callback
> logic.
> 
> Playing with the callback registration while interrupts can be
> injected will likely result in a malfunction of the device that relies
> on those callbacks, but that's to be expected anyway when playing such
> games.
> 
> That said multiple users sharing a vIOAPIC pin should be fine as long
> as they follow the logic above: always register a callback before
> attempting to inject an interrupt.

May I ask that you add a comment ahead of this function pointing out
the restriction?

>>> @@ -443,7 +457,8 @@ static void ioapic_inj_irq(
>>>      struct vlapic *target,
>>>      uint8_t vector,
>>>      uint8_t trig_mode,
>>> -    uint8_t delivery_mode)
>>> +    uint8_t delivery_mode,
>>> +    bool callback)
>>>  {
>>>      HVM_DBG_LOG(DBG_LEVEL_IOAPIC, "irq %d trig %d deliv %d",
>>>                  vector, trig_mode, delivery_mode);
>>> @@ -452,7 +467,7 @@ static void ioapic_inj_irq(
>>>             (delivery_mode == dest_LowestPrio));
>>>  
>>>      vlapic_set_irq_callback(target, vector, trig_mode,
>>> -                            trig_mode ? eoi_callback : NULL, NULL);
>>> +                            callback ? eoi_callback : NULL, NULL);
>>
>> I think you'd better use trig_mode || callback here and ...
>>
>>> @@ -466,6 +481,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>>      struct vlapic *target;
>>>      struct vcpu *v;
>>>      unsigned int irq = vioapic->base_gsi + pin;
>>> +    bool callback = trig_mode || hvm_gsi_has_callbacks(d, irq);
>>>  
>>>      ASSERT(spin_is_locked(&d->arch.hvm.irq_lock));
>>>  
>>> @@ -492,7 +508,8 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>>              target = vlapic_lowest_prio(d, NULL, 0, dest, dest_mode);
>>>          if ( target != NULL )
>>>          {
>>> -            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode);
>>> +            ioapic_inj_irq(vioapic, target, vector, trig_mode, delivery_mode,
>>> +                           callback);
>>
>> ... invoke hvm_gsi_has_callbacks() right here and ...
>>
>>> @@ -507,7 +524,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>>          for_each_vcpu ( d, v )
>>>              if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
>>>                  ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
>>> -                               delivery_mode);
>>> +                               delivery_mode, callback);
>>
>> ... here, avoiding to call the function when you don't need the
>> result.
> 
> I think there's a slim chance of not needing to use the callback local
> variable, and hence didn't consider limiting it. I can do, but I'm
> unsure this will bring any real benefit while making the code more
> complex IMO.

Really the variable remaining unused in a minor set of cases was only
a secondary observation. What I first stumbled over is the moving of
the decision whether a callback is wanted from ioapic_inj_irq() to its
caller. Since the function clearly is intended as a helper of
vioapic_deliver(), I guess in the end it's fine the way you have it.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-08  6:27       ` Jan Beulich
@ 2021-04-08  8:59         ` Roger Pau Monné
  2021-04-08 10:52           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08  8:59 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Apr 08, 2021 at 08:27:10AM +0200, Jan Beulich wrote:
> On 07.04.2021 18:46, Roger Pau Monné wrote:
> > On Wed, Apr 07, 2021 at 05:19:06PM +0200, Jan Beulich wrote:
> >> On 31.03.2021 12:32, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/vioapic.c
> >>> +++ b/xen/arch/x86/hvm/vioapic.c
> >>> @@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
> >>>           d->arch.hvm.nr_vioapics != 1 )
> >>>          return -EOPNOTSUPP;
> >>>  
> >>> -    return hvm_load_entry(IOAPIC, h, &s->domU);
> >>> +    rc = hvm_load_entry(IOAPIC, h, &s->domU);
> >>> +    if ( rc )
> >>> +        return rc;
> >>> +
> >>> +    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
> >>> +    {
> >>> +        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
> >>> +        unsigned int vector = ent->fields.vector;
> >>> +        unsigned int delivery_mode = ent->fields.delivery_mode;
> >>> +        struct vcpu *v;
> >>> +
> >>> +        /*
> >>> +         * Add a callback for each possible vector injected by a redirection
> >>> +         * entry.
> >>> +         */
> >>> +        if ( vector < 16 || !ent->fields.remote_irr ||
> >>> +             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
> >>> +            continue;
> >>> +
> >>> +        for_each_vcpu ( d, v )
> >>> +        {
> >>> +            struct vlapic *vlapic = vcpu_vlapic(v);
> >>> +
> >>> +            /*
> >>> +             * NB: if the vlapic registers were restored before the vio-apic
> >>> +             * ones we could test whether the vector is set in the vlapic IRR
> >>> +             * or ISR registers before unconditionally setting the callback.
> >>> +             * This is harmless as eoi_callback is capable of dealing with
> >>> +             * spurious callbacks.
> >>> +             */
> >>> +            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
> >>> +                                   ent->fields.dest_mode) )
> >>> +                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
> >>
> >> eoi_callback()'s behavior is only one of the aspects to consider here.
> >> The other is vlapic_set_callback()'s complaining if it finds a
> >> callback already set. What guarantees that a mistakenly set callback
> >> here won't get in conflict with some future use of the same vector by
> >> the guest?
> > 
> > Such conflict would only manifest as a warning message, but won't
> > cause any malfunction, as the later callback would override the
> > current one.
> > 
> > This model I'm proposing doesn't support lapic vector sharing with
> > different devices that require EOI callbacks, I think we already
> > discussed this on a previous series and agreed it was fine.
> 
> The problem with such false positive warning messages is that
> they'll cause cautious people to investigate, i.e. spend perhaps
> a sizable amount of time in understanding what was actually a non-
> issue. I view this as a problem, even if the code's functioning is
> fine the way it is. I'm not even sure explicitly mentioning the
> situation in the comment is going to help, as one may not stumble
> across that comment while investigating.

What about making the warning message in case of override in
vlapic_set_callback conditional to there being a vector pending in IRR
or ISR?

Without having such vector pending the callback is just useless, as
it's not going to be executed, so overriding it in that situation is
perfectly fine. That should prevent the restoring here triggering the
message unless there's indeed a troublesome sharing of a vector.

> >>> --- a/xen/arch/x86/hvm/vlapic.c
> >>> +++ b/xen/arch/x86/hvm/vlapic.c
> >>> @@ -192,7 +192,13 @@ void vlapic_set_irq_callback(struct vlapic *vlapic, uint8_t vec, uint8_t trig,
> >>>  
> >>>      if ( hvm_funcs.update_eoi_exit_bitmap )
> >>>          alternative_vcall(hvm_funcs.update_eoi_exit_bitmap, target, vec,
> >>> -                          trig || callback);
> >>> +                          /*
> >>> +                           * NB: need to explicitly convert to boolean to avoid
> >>> +                           * truncation wrongly result in false begin reported
> >>> +                           * for example when the pointer sits on a page
> >>> +                           * boundary.
> >>> +                           */
> >>> +                          !!callback);
> >>
> >> I've had quite a bit of difficulty with the comment. Once I realized
> >> that you likely mean "being" instead of "begin" it got a bit better.
> >> I'd like to suggest also s/result/resulting/, a comma after "reported",
> >> and maybe then s/being reported/getting passed/.
> >>
> >> As to explicitly converting to bool, wouldn't a cast to bool do? That's
> >> more explicitly an "explicit conversion" than using !!.
> > 
> > I've always used !! in the past for such cases because it's shorter, I
> > can explicitly cast if you prefer that instead.
> 
> I agree with the "shorter" aspect. What I'm afraid of is that someone may,
> despite the comment, think the !! is a stray leftover from the bool_t
> days. I'd therefore prefer to keep the !! pattern for just the legacy
> uses, and see casts used in cases like the one here. However, if both you
> and Andrew think otherwise, so be it.

I'm fine with casting to boolean.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-04-08  6:20       ` Jan Beulich
@ 2021-04-08  9:12         ` Roger Pau Monné
  2021-04-08 10:49           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08  9:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Apr 08, 2021 at 08:20:15AM +0200, Jan Beulich wrote:
> On 07.04.2021 18:27, Roger Pau Monné wrote:
> > On Wed, Apr 07, 2021 at 04:55:43PM +0200, Jan Beulich wrote:
> >> On 31.03.2021 12:32, Roger Pau Monne wrote:
> >>> Add a new vlapic_set_irq_callback helper in order to inject a vector
> >>> and set a callback to be executed when the guest performs the end of
> >>> interrupt acknowledgment.
> >>>
> >>> Such functionality will be used to migrate the current ad hoc handling
> >>> done in vlapic_handle_EOI for the vectors that require some logic to
> >>> be executed when the end of interrupt is performed.
> >>>
> >>> The setter of the callback will be in charge for setting the callback
> >>> again on resume. That is the reason why vlapic_set_callback is not a
> >>> static function.
> >>
> >> I'm struggling with your use of "resume" here: Resuming from S3
> >> doesn't require re-doing anything that's kept in memory, does it?
> >> So what meaning does the word have here?
> > 
> > Right, I can see the confusion. Resume here means a guest being
> > migrated or restored, not Xen itself being resumed. Callbacks are not
> > part of the exported guest state, and hence any emulated device that
> > requires a callback will have to register it as part of loading the
> > saved state.
> > 
> >> Apart from this, and with the xzalloc_array() change requested
> >> by Andrew, this looks good to me.
> 
> In which case with this change and "resume" replaced suitably in the
> description

I've worded it as:

"The setter of the callback will be in charge for setting the callback
again on guest restore or resume, as callbacks are not saved as part
of the vlapic state. That is the reason why vlapic_set_callback is not
a static function."

> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks. I also want to make the warning message in vlapic_set_callback
conditional to the vector being pending in ISR or IRR, so I will not
add your RB this time if that's fine.

Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-04-08  9:12         ` Roger Pau Monné
@ 2021-04-08 10:49           ` Jan Beulich
  2021-04-08 10:56             ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-08 10:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 08.04.2021 11:12, Roger Pau Monné wrote:
> On Thu, Apr 08, 2021 at 08:20:15AM +0200, Jan Beulich wrote:
>> On 07.04.2021 18:27, Roger Pau Monné wrote:
>>> On Wed, Apr 07, 2021 at 04:55:43PM +0200, Jan Beulich wrote:
>>>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>>>> Add a new vlapic_set_irq_callback helper in order to inject a vector
>>>>> and set a callback to be executed when the guest performs the end of
>>>>> interrupt acknowledgment.
>>>>>
>>>>> Such functionality will be used to migrate the current ad hoc handling
>>>>> done in vlapic_handle_EOI for the vectors that require some logic to
>>>>> be executed when the end of interrupt is performed.
>>>>>
>>>>> The setter of the callback will be in charge for setting the callback
>>>>> again on resume. That is the reason why vlapic_set_callback is not a
>>>>> static function.
>>>>
>>>> I'm struggling with your use of "resume" here: Resuming from S3
>>>> doesn't require re-doing anything that's kept in memory, does it?
>>>> So what meaning does the word have here?
>>>
>>> Right, I can see the confusion. Resume here means a guest being
>>> migrated or restored, not Xen itself being resumed. Callbacks are not
>>> part of the exported guest state, and hence any emulated device that
>>> requires a callback will have to register it as part of loading the
>>> saved state.
>>>
>>>> Apart from this, and with the xzalloc_array() change requested
>>>> by Andrew, this looks good to me.
>>
>> In which case with this change and "resume" replaced suitably in the
>> description
> 
> I've worded it as:
> 
> "The setter of the callback will be in charge for setting the callback
> again on guest restore or resume, as callbacks are not saved as part
> of the vlapic state. That is the reason why vlapic_set_callback is not
> a static function."

Hmm, you still mention "resume", which makes me continue to wonder
what you're thinking of beyond guest restore.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism
  2021-04-08  8:59         ` Roger Pau Monné
@ 2021-04-08 10:52           ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-08 10:52 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 08.04.2021 10:59, Roger Pau Monné wrote:
> On Thu, Apr 08, 2021 at 08:27:10AM +0200, Jan Beulich wrote:
>> On 07.04.2021 18:46, Roger Pau Monné wrote:
>>> On Wed, Apr 07, 2021 at 05:19:06PM +0200, Jan Beulich wrote:
>>>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/hvm/vioapic.c
>>>>> +++ b/xen/arch/x86/hvm/vioapic.c
>>>>> @@ -621,7 +624,43 @@ static int ioapic_load(struct domain *d, hvm_domain_context_t *h)
>>>>>           d->arch.hvm.nr_vioapics != 1 )
>>>>>          return -EOPNOTSUPP;
>>>>>  
>>>>> -    return hvm_load_entry(IOAPIC, h, &s->domU);
>>>>> +    rc = hvm_load_entry(IOAPIC, h, &s->domU);
>>>>> +    if ( rc )
>>>>> +        return rc;
>>>>> +
>>>>> +    for ( i = 0; i < ARRAY_SIZE(s->domU.redirtbl); i++ )
>>>>> +    {
>>>>> +        const union vioapic_redir_entry *ent = &s->domU.redirtbl[i];
>>>>> +        unsigned int vector = ent->fields.vector;
>>>>> +        unsigned int delivery_mode = ent->fields.delivery_mode;
>>>>> +        struct vcpu *v;
>>>>> +
>>>>> +        /*
>>>>> +         * Add a callback for each possible vector injected by a redirection
>>>>> +         * entry.
>>>>> +         */
>>>>> +        if ( vector < 16 || !ent->fields.remote_irr ||
>>>>> +             (delivery_mode != dest_LowestPrio && delivery_mode != dest_Fixed) )
>>>>> +            continue;
>>>>> +
>>>>> +        for_each_vcpu ( d, v )
>>>>> +        {
>>>>> +            struct vlapic *vlapic = vcpu_vlapic(v);
>>>>> +
>>>>> +            /*
>>>>> +             * NB: if the vlapic registers were restored before the vio-apic
>>>>> +             * ones we could test whether the vector is set in the vlapic IRR
>>>>> +             * or ISR registers before unconditionally setting the callback.
>>>>> +             * This is harmless as eoi_callback is capable of dealing with
>>>>> +             * spurious callbacks.
>>>>> +             */
>>>>> +            if ( vlapic_match_dest(vlapic, NULL, 0, ent->fields.dest_id,
>>>>> +                                   ent->fields.dest_mode) )
>>>>> +                vlapic_set_callback(vlapic, vector, eoi_callback, NULL);
>>>>
>>>> eoi_callback()'s behavior is only one of the aspects to consider here.
>>>> The other is vlapic_set_callback()'s complaining if it finds a
>>>> callback already set. What guarantees that a mistakenly set callback
>>>> here won't get in conflict with some future use of the same vector by
>>>> the guest?
>>>
>>> Such conflict would only manifest as a warning message, but won't
>>> cause any malfunction, as the later callback would override the
>>> current one.
>>>
>>> This model I'm proposing doesn't support lapic vector sharing with
>>> different devices that require EOI callbacks, I think we already
>>> discussed this on a previous series and agreed it was fine.
>>
>> The problem with such false positive warning messages is that
>> they'll cause cautious people to investigate, i.e. spend perhaps
>> a sizable amount of time in understanding what was actually a non-
>> issue. I view this as a problem, even if the code's functioning is
>> fine the way it is. I'm not even sure explicitly mentioning the
>> situation in the comment is going to help, as one may not stumble
>> across that comment while investigating.
> 
> What about making the warning message in case of override in
> vlapic_set_callback conditional to there being a vector pending in IRR
> or ISR?
> 
> Without having such vector pending the callback is just useless, as
> it's not going to be executed, so overriding it in that situation is
> perfectly fine. That should prevent the restoring here triggering the
> message unless there's indeed a troublesome sharing of a vector.

Ah yes, since the callbacks are self-clearing, this gating looks quite
reasonable to me.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism
  2021-04-08 10:49           ` Jan Beulich
@ 2021-04-08 10:56             ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08 10:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Apr 08, 2021 at 12:49:44PM +0200, Jan Beulich wrote:
> On 08.04.2021 11:12, Roger Pau Monné wrote:
> > On Thu, Apr 08, 2021 at 08:20:15AM +0200, Jan Beulich wrote:
> >> On 07.04.2021 18:27, Roger Pau Monné wrote:
> >>> On Wed, Apr 07, 2021 at 04:55:43PM +0200, Jan Beulich wrote:
> >>>> On 31.03.2021 12:32, Roger Pau Monne wrote:
> >>>>> Add a new vlapic_set_irq_callback helper in order to inject a vector
> >>>>> and set a callback to be executed when the guest performs the end of
> >>>>> interrupt acknowledgment.
> >>>>>
> >>>>> Such functionality will be used to migrate the current ad hoc handling
> >>>>> done in vlapic_handle_EOI for the vectors that require some logic to
> >>>>> be executed when the end of interrupt is performed.
> >>>>>
> >>>>> The setter of the callback will be in charge for setting the callback
> >>>>> again on resume. That is the reason why vlapic_set_callback is not a
> >>>>> static function.
> >>>>
> >>>> I'm struggling with your use of "resume" here: Resuming from S3
> >>>> doesn't require re-doing anything that's kept in memory, does it?
> >>>> So what meaning does the word have here?
> >>>
> >>> Right, I can see the confusion. Resume here means a guest being
> >>> migrated or restored, not Xen itself being resumed. Callbacks are not
> >>> part of the exported guest state, and hence any emulated device that
> >>> requires a callback will have to register it as part of loading the
> >>> saved state.
> >>>
> >>>> Apart from this, and with the xzalloc_array() change requested
> >>>> by Andrew, this looks good to me.
> >>
> >> In which case with this change and "resume" replaced suitably in the
> >> description
> > 
> > I've worded it as:
> > 
> > "The setter of the callback will be in charge for setting the callback
> > again on guest restore or resume, as callbacks are not saved as part
> > of the vlapic state. That is the reason why vlapic_set_callback is not
> > a static function."
> 
> Hmm, you still mention "resume", which makes me continue to wonder
> what you're thinking of beyond guest restore.

Urg, yes, let me remove that resume.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-07 15:51   ` Jan Beulich
  2021-04-07 17:08     ` Roger Pau Monné
@ 2021-04-08 12:52     ` Roger Pau Monné
  2021-04-08 14:31       ` Jan Beulich
  1 sibling, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08 12:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:32, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/irq.c
> > +++ b/xen/arch/x86/hvm/irq.c
> > +void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> > +                                 struct hvm_gsi_eoi_callback *cb)
> > +{
> > +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> > +    const struct list_head *tmp;
> > +
> > +    if ( gsi >= hvm_irq->nr_gsis )
> > +    {
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    write_lock(&hvm_irq->gsi_callbacks_lock);
> > +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
> > +        if ( tmp == &cb->list )
> > +        {
> > +            list_del(&cb->list);
> > +            break;
> > +        }
> > +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> > +}
> 
> Perhaps somehow flag, at least in debug builds, if the callback
> wasn#t found?

I've added a debug printf here to warn if the callback is not found,
but I see it triggering because hpet_set_timer will call
destroy_periodic_time and create_periodic_time and thus two calls will
be made to hvm_gsi_unregister_callback. This is fine, but adding a
message there gets too verbose, so I will drop it and leave the code
as-is.

I don't see a problem with calling destroy_periodic_time multiple
times even if the timer was not active, and that shouldn't result in a
message being printed.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-08 12:52     ` Roger Pau Monné
@ 2021-04-08 14:31       ` Jan Beulich
  2021-04-08 15:06         ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-08 14:31 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 08.04.2021 14:52, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/hvm/irq.c
>>> +++ b/xen/arch/x86/hvm/irq.c
>>> +void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
>>> +                                 struct hvm_gsi_eoi_callback *cb)
>>> +{
>>> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>>> +    const struct list_head *tmp;
>>> +
>>> +    if ( gsi >= hvm_irq->nr_gsis )
>>> +    {
>>> +        ASSERT_UNREACHABLE();
>>> +        return;
>>> +    }
>>> +
>>> +    write_lock(&hvm_irq->gsi_callbacks_lock);
>>> +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
>>> +        if ( tmp == &cb->list )
>>> +        {
>>> +            list_del(&cb->list);
>>> +            break;
>>> +        }
>>> +    write_unlock(&hvm_irq->gsi_callbacks_lock);
>>> +}
>>
>> Perhaps somehow flag, at least in debug builds, if the callback
>> wasn#t found?
> 
> I've added a debug printf here to warn if the callback is not found,
> but I see it triggering because hpet_set_timer will call
> destroy_periodic_time and create_periodic_time and thus two calls will
> be made to hvm_gsi_unregister_callback. This is fine, but adding a
> message there gets too verbose, so I will drop it and leave the code
> as-is.
> 
> I don't see a problem with calling destroy_periodic_time multiple
> times even if the timer was not active, and that shouldn't result in a
> message being printed.

If destroy_periodic_time() is to remain the only caller, I guess I
agree. Other (future) callers may then need this function to gain
a return value indicating whether the callback was actually found.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback
  2021-03-31 10:33 ` [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
@ 2021-04-08 14:49   ` Jan Beulich
  2021-04-08 15:23     ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-08 14:49 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On 31.03.2021 12:33, Roger Pau Monne wrote:
> @@ -515,17 +528,44 @@ int pt_irq_create_bind(
>          }
>          else
>          {
> +            /*
> +             * NB: the callback structure allocated below will never be freed
> +             * once setup because it's used by the hardware domain and will
> +             * never be unregistered.
> +             */
> +            cb = xmalloc(struct hvm_gsi_eoi_callback);

Is this comment as well as ...

>              ASSERT(is_hardware_domain(d));
>  
> +            if ( !cb )
> +            {
> +                spin_unlock(&d->event_lock);
> +                return -ENOMEM;
> +            }
> +
>              /* MSI_TRANSLATE is not supported for the hardware domain. */
>              if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
>                   pirq >= hvm_domain_irq(d)->nr_gsis )
>              {
>                  spin_unlock(&d->event_lock);
> -
> +                xfree(cb);
>                  return -EINVAL;
>              }
>              guest_gsi = pirq;
> +
> +            cb->callback = dpci_eoi;
> +            cb->data = d;
> +            /*
> +             * IRQ binds created for the hardware domain are never destroyed,
> +             * so it's fine to not keep a reference to cb here.
> +             */
> +            rc = hvm_gsi_register_callback(d, guest_gsi, cb);

... the one here really true? vpci_msi_arch_update() and
vpci_msi_disable() seem to tell me otherwise (or for the former
comment, they suggest there should be un-registration somewhere).
It also doesn't seem logical to me, considering (yet to be made
work) pass-through of devices or hot-unplugged ones, at which
point Dom0 shouldn't retain IRQ bindings, I would think.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-08 14:31       ` Jan Beulich
@ 2021-04-08 15:06         ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08 15:06 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Thu, Apr 08, 2021 at 04:31:59PM +0200, Jan Beulich wrote:
> On 08.04.2021 14:52, Roger Pau Monné wrote:
> > On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
> >> On 31.03.2021 12:32, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/hvm/irq.c
> >>> +++ b/xen/arch/x86/hvm/irq.c
> >>> +void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> >>> +                                 struct hvm_gsi_eoi_callback *cb)
> >>> +{
> >>> +    struct hvm_irq *hvm_irq = hvm_domain_irq(d);
> >>> +    const struct list_head *tmp;
> >>> +
> >>> +    if ( gsi >= hvm_irq->nr_gsis )
> >>> +    {
> >>> +        ASSERT_UNREACHABLE();
> >>> +        return;
> >>> +    }
> >>> +
> >>> +    write_lock(&hvm_irq->gsi_callbacks_lock);
> >>> +    list_for_each ( tmp, &hvm_irq->gsi_callbacks[gsi] )
> >>> +        if ( tmp == &cb->list )
> >>> +        {
> >>> +            list_del(&cb->list);
> >>> +            break;
> >>> +        }
> >>> +    write_unlock(&hvm_irq->gsi_callbacks_lock);
> >>> +}
> >>
> >> Perhaps somehow flag, at least in debug builds, if the callback
> >> wasn#t found?
> > 
> > I've added a debug printf here to warn if the callback is not found,
> > but I see it triggering because hpet_set_timer will call
> > destroy_periodic_time and create_periodic_time and thus two calls will
> > be made to hvm_gsi_unregister_callback. This is fine, but adding a
> > message there gets too verbose, so I will drop it and leave the code
> > as-is.
> > 
> > I don't see a problem with calling destroy_periodic_time multiple
> > times even if the timer was not active, and that shouldn't result in a
> > message being printed.
> 
> If destroy_periodic_time() is to remain the only caller, I guess I
> agree. Other (future) callers may then need this function to gain
> a return value indicating whether the callback was actually found.

There's also pt_irq_destroy_bind which likely cares about the return
value, so let's return a value from hvm_gsi_unregister_callback and
check it in pt_irq_destroy_bind.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback
  2021-04-08 14:49   ` Jan Beulich
@ 2021-04-08 15:23     ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-08 15:23 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Paul Durrant, xen-devel

On Thu, Apr 08, 2021 at 04:49:48PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:33, Roger Pau Monne wrote:
> > @@ -515,17 +528,44 @@ int pt_irq_create_bind(
> >          }
> >          else
> >          {
> > +            /*
> > +             * NB: the callback structure allocated below will never be freed
> > +             * once setup because it's used by the hardware domain and will
> > +             * never be unregistered.
> > +             */
> > +            cb = xmalloc(struct hvm_gsi_eoi_callback);
> 
> Is this comment as well as ...
> 
> >              ASSERT(is_hardware_domain(d));
> >  
> > +            if ( !cb )
> > +            {
> > +                spin_unlock(&d->event_lock);
> > +                return -ENOMEM;
> > +            }
> > +
> >              /* MSI_TRANSLATE is not supported for the hardware domain. */
> >              if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
> >                   pirq >= hvm_domain_irq(d)->nr_gsis )
> >              {
> >                  spin_unlock(&d->event_lock);
> > -
> > +                xfree(cb);
> >                  return -EINVAL;
> >              }
> >              guest_gsi = pirq;
> > +
> > +            cb->callback = dpci_eoi;
> > +            cb->data = d;
> > +            /*
> > +             * IRQ binds created for the hardware domain are never destroyed,
> > +             * so it's fine to not keep a reference to cb here.
> > +             */
> > +            rc = hvm_gsi_register_callback(d, guest_gsi, cb);
> 
> ... the one here really true? vpci_msi_arch_update() and
> vpci_msi_disable() seem to tell me otherwise (or for the former
> comment, they suggest there should be un-registration somewhere).

MSI doesn't use hvm_gsi_register_callback at all, since those are only
used for GSI interrupts. I should replace IRQ with GSI in the comment
above to make it clearer.

> It also doesn't seem logical to me, considering (yet to be made
> work) pass-through of devices or hot-unplugged ones, at which
> point Dom0 shouldn't retain IRQ bindings, I would think.

Hm, maybe. I think we are still very far from that. Right now GSIs are
bound to a PVH dom0 based on the unmasked vIO-APIC pins, and they are
never unbound. We could see about unbinding them, but TBH I would
expect a PVH dom0 to just mask the vIO-APIC pin when it has no
devices using it if those are unplugged.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 09/11] x86/vpt: switch interrupt injection model
  2021-03-31 10:33 ` [PATCH v3 09/11] x86/vpt: switch interrupt injection model Roger Pau Monne
@ 2021-04-14 10:28   ` Jan Beulich
  2021-04-14 13:37     ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-14 10:28 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Andrew Cooper, Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On 31.03.2021 12:33, Roger Pau Monne wrote:
> ---
>  xen/arch/x86/hvm/svm/intr.c   |   3 -
>  xen/arch/x86/hvm/vmx/intr.c   |  59 ------
>  xen/arch/x86/hvm/vpt.c        | 334 ++++++++++++++--------------------
>  xen/include/asm-x86/hvm/vpt.h |   5 +-
>  4 files changed, 143 insertions(+), 258 deletions(-)

Nice.

> @@ -285,189 +238,144 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
>              list_del(&pt->list);
>          pt->on_list = false;
>          pt->pending_intr_nr = 0;
> +
> +        return;
>      }
> -    else if ( mode_is(v->domain, one_missed_tick_pending) ||
> -              mode_is(v->domain, no_missed_ticks_pending) )
> +
> +    if ( mode_is(v->domain, one_missed_tick_pending) ||
> +         mode_is(v->domain, no_missed_ticks_pending) )
>      {
> -        pt->last_plt_gtime = hvm_get_guest_time(v);
>          pt_process_missed_ticks(pt);
>          pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
> +    }
> +    else if ( !pt->pending_intr_nr )
> +        pt_process_missed_ticks(pt);

Did you lose a -- here? I.e. does the condition mean to match ...

> +    if ( !pt->pending_intr_nr )
>          set_timer(&pt->timer, pt->scheduled);
> +}
> +
> +static void pt_timer_fn(void *data)
> +{
> +    struct periodic_time *pt = data;
> +    struct vcpu *v;
> +    time_cb *cb = NULL;
> +    void *cb_priv;
> +    unsigned int irq;
> +
> +    pt_lock(pt);
> +
> +    v = pt->vcpu;
> +    irq = pt->irq;
> +
> +    if ( inject_interrupt(pt) )
> +    {
> +        pt->scheduled += pt->period;
> +        pt->do_not_freeze = 0;
> +        cb = pt->cb;
> +        cb_priv = pt->priv;
>      }
>      else
>      {
> -        pt->last_plt_gtime += pt->period;
> -        if ( --pt->pending_intr_nr == 0 )

... this original code? Otherwise I can't see why the condition
guards a pt_process_missed_ticks() invocation.

> @@ -617,20 +556,29 @@ void pt_adjust_global_vcpu_target(struct vcpu *v)
>      write_unlock(&pl_time->vhpet.lock);
>  }
>  
> -
>  static void pt_resume(struct periodic_time *pt)
>  {
> +    struct vcpu *v;
> +    time_cb *cb = NULL;
> +    void *cb_priv;
> +
>      if ( pt->vcpu == NULL )
>          return;
>  
>      pt_lock(pt);
> -    if ( pt->pending_intr_nr && !pt->on_list )
> +    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
>      {
> +        pt->pending_intr_nr--;
> +        cb = pt->cb;
> +        cb_priv = pt->priv;
> +        v = pt->vcpu;
>          pt->on_list = 1;
>          list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
> -        vcpu_kick(pt->vcpu);
>      }
>      pt_unlock(pt);
> +
> +    if ( cb )
> +        cb(v, cb_priv);
>  }

I'm afraid until we raise our supported gcc versions baseline, v and
cb_priv will need an initializer at the top of the function just like
cb.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists
  2021-03-31 10:33 ` [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
@ 2021-04-14 10:38   ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2021-04-14 10:38 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 31.03.2021 12:33, Roger Pau Monne wrote:
> No longer add vPT timers to lists on specific vCPUs, since there's no
> need anymore to check if timer interrupts have been injected on return
> to HVM guest.
> 
> Such change allows to get rid of virtual timers vCPU migration, and
> also cleanup some of the virtual timers fields that are no longer
> required.
> 
> The model is also slightly different now in that timers are not
> stopped when a vCPU is de-scheduled. Such timers will continue
> running, and when triggered the function will try to inject the
> corresponding interrupt to the guest (which might be different than
> the currently running one). Note that the timer triggering when the
> guest is no longer running can only happen once, as the timer callback
> will not reset the interrupt to fire again. Such resetting if required
> will be done by the EOI callback.
> 
> Since virtual timers are no longer added to per-VCPU lists when active
> a new 'masked' field is added to the structure, to signal whether a
> timer has it's interrupt source currently masked.

I guess this paragraph has become stale with ...

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v2:
>  - Remove pt_{save/restore}_timer and instead use
>    pt_{freeze/thaw}_time.
>  - Remove the introduction of the 'masked' field, it's not needed.

... this change?

Other than this
Reviewed-by: Jan Beulich <jbeulich@suse.com>

>  - Rework pt_active to use timer_is_active.
> 
> Changes since v1:
>  - New in this version.
> ---
>  xen/arch/x86/domain.c          |   4 +-
>  xen/arch/x86/hvm/hvm.c         |   4 +-
>  xen/arch/x86/hvm/vlapic.c      |   1 -
>  xen/arch/x86/hvm/vpt.c         | 174 ++++-----------------------------
>  xen/include/asm-x86/hvm/vcpu.h |   3 +-
>  xen/include/asm-x86/hvm/vpt.h  |  12 +--
>  6 files changed, 27 insertions(+), 171 deletions(-)

Yet nicer than the previous one!

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 09/11] x86/vpt: switch interrupt injection model
  2021-04-14 10:28   ` Jan Beulich
@ 2021-04-14 13:37     ` Roger Pau Monné
  2021-04-14 14:05       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-14 13:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On Wed, Apr 14, 2021 at 12:28:43PM +0200, Jan Beulich wrote:
> On 31.03.2021 12:33, Roger Pau Monne wrote:
> > ---
> >  xen/arch/x86/hvm/svm/intr.c   |   3 -
> >  xen/arch/x86/hvm/vmx/intr.c   |  59 ------
> >  xen/arch/x86/hvm/vpt.c        | 334 ++++++++++++++--------------------
> >  xen/include/asm-x86/hvm/vpt.h |   5 +-
> >  4 files changed, 143 insertions(+), 258 deletions(-)
> 
> Nice.
> 
> > @@ -285,189 +238,144 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
> >              list_del(&pt->list);
> >          pt->on_list = false;
> >          pt->pending_intr_nr = 0;
> > +
> > +        return;
> >      }
> > -    else if ( mode_is(v->domain, one_missed_tick_pending) ||
> > -              mode_is(v->domain, no_missed_ticks_pending) )
> > +
> > +    if ( mode_is(v->domain, one_missed_tick_pending) ||
> > +         mode_is(v->domain, no_missed_ticks_pending) )
> >      {
> > -        pt->last_plt_gtime = hvm_get_guest_time(v);
> >          pt_process_missed_ticks(pt);
> >          pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
> > +    }
> > +    else if ( !pt->pending_intr_nr )
> > +        pt_process_missed_ticks(pt);
> 
> Did you lose a -- here? I.e. does the condition mean to match ...
> 
> > +    if ( !pt->pending_intr_nr )
> >          set_timer(&pt->timer, pt->scheduled);
> > +}
> > +
> > +static void pt_timer_fn(void *data)
> > +{
> > +    struct periodic_time *pt = data;
> > +    struct vcpu *v;
> > +    time_cb *cb = NULL;
> > +    void *cb_priv;
> > +    unsigned int irq;
> > +
> > +    pt_lock(pt);
> > +
> > +    v = pt->vcpu;
> > +    irq = pt->irq;
> > +
> > +    if ( inject_interrupt(pt) )
> > +    {
> > +        pt->scheduled += pt->period;
> > +        pt->do_not_freeze = 0;
> > +        cb = pt->cb;
> > +        cb_priv = pt->priv;
> >      }
> >      else
> >      {
> > -        pt->last_plt_gtime += pt->period;
> > -        if ( --pt->pending_intr_nr == 0 )
> 
> ... this original code? Otherwise I can't see why the condition
> guards a pt_process_missed_ticks() invocation.

I think the logic here changed enough to not match anymore. Certainly
pending_intr_nr shouldn't be decreased there, as pt_irq_fired is
invoked after an EOI in this patch, instead of being invoked when a
vpt related interrupt was injected. I think I should better rename
pt_irq_fired to pt_irq_eoi and that would make it clearer.

FWIW, decreasing pending_intr_nr should only be done after an
inject_interrupt call.

> > @@ -617,20 +556,29 @@ void pt_adjust_global_vcpu_target(struct vcpu *v)
> >      write_unlock(&pl_time->vhpet.lock);
> >  }
> >  
> > -
> >  static void pt_resume(struct periodic_time *pt)
> >  {
> > +    struct vcpu *v;
> > +    time_cb *cb = NULL;
> > +    void *cb_priv;
> > +
> >      if ( pt->vcpu == NULL )
> >          return;
> >  
> >      pt_lock(pt);
> > -    if ( pt->pending_intr_nr && !pt->on_list )
> > +    if ( pt->pending_intr_nr && !pt->on_list && inject_interrupt(pt) )
> >      {
> > +        pt->pending_intr_nr--;
> > +        cb = pt->cb;
> > +        cb_priv = pt->priv;
> > +        v = pt->vcpu;
> >          pt->on_list = 1;
> >          list_add(&pt->list, &pt->vcpu->arch.hvm.tm_list);
> > -        vcpu_kick(pt->vcpu);
> >      }
> >      pt_unlock(pt);
> > +
> > +    if ( cb )
> > +        cb(v, cb_priv);
> >  }
> 
> I'm afraid until we raise our supported gcc versions baseline, v and
> cb_priv will need an initializer at the top of the function just like
> cb.

Will add such initializations.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 09/11] x86/vpt: switch interrupt injection model
  2021-04-14 13:37     ` Roger Pau Monné
@ 2021-04-14 14:05       ` Jan Beulich
  2021-04-14 14:20         ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-14 14:05 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Andrew Cooper, Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On 14.04.2021 15:37, Roger Pau Monné wrote:
> On Wed, Apr 14, 2021 at 12:28:43PM +0200, Jan Beulich wrote:
>> On 31.03.2021 12:33, Roger Pau Monne wrote:
>>> ---
>>>  xen/arch/x86/hvm/svm/intr.c   |   3 -
>>>  xen/arch/x86/hvm/vmx/intr.c   |  59 ------
>>>  xen/arch/x86/hvm/vpt.c        | 334 ++++++++++++++--------------------
>>>  xen/include/asm-x86/hvm/vpt.h |   5 +-
>>>  4 files changed, 143 insertions(+), 258 deletions(-)
>>
>> Nice.
>>
>>> @@ -285,189 +238,144 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
>>>              list_del(&pt->list);
>>>          pt->on_list = false;
>>>          pt->pending_intr_nr = 0;
>>> +
>>> +        return;
>>>      }
>>> -    else if ( mode_is(v->domain, one_missed_tick_pending) ||
>>> -              mode_is(v->domain, no_missed_ticks_pending) )
>>> +
>>> +    if ( mode_is(v->domain, one_missed_tick_pending) ||
>>> +         mode_is(v->domain, no_missed_ticks_pending) )
>>>      {
>>> -        pt->last_plt_gtime = hvm_get_guest_time(v);
>>>          pt_process_missed_ticks(pt);
>>>          pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
>>> +    }
>>> +    else if ( !pt->pending_intr_nr )
>>> +        pt_process_missed_ticks(pt);
>>
>> Did you lose a -- here? I.e. does the condition mean to match ...
>>
>>> +    if ( !pt->pending_intr_nr )
>>>          set_timer(&pt->timer, pt->scheduled);
>>> +}
>>> +
>>> +static void pt_timer_fn(void *data)
>>> +{
>>> +    struct periodic_time *pt = data;
>>> +    struct vcpu *v;
>>> +    time_cb *cb = NULL;
>>> +    void *cb_priv;
>>> +    unsigned int irq;
>>> +
>>> +    pt_lock(pt);
>>> +
>>> +    v = pt->vcpu;
>>> +    irq = pt->irq;
>>> +
>>> +    if ( inject_interrupt(pt) )
>>> +    {
>>> +        pt->scheduled += pt->period;
>>> +        pt->do_not_freeze = 0;
>>> +        cb = pt->cb;
>>> +        cb_priv = pt->priv;
>>>      }
>>>      else
>>>      {
>>> -        pt->last_plt_gtime += pt->period;
>>> -        if ( --pt->pending_intr_nr == 0 )
>>
>> ... this original code? Otherwise I can't see why the condition
>> guards a pt_process_missed_ticks() invocation.
> 
> I think the logic here changed enough to not match anymore. Certainly
> pending_intr_nr shouldn't be decreased there, as pt_irq_fired is
> invoked after an EOI in this patch, instead of being invoked when a
> vpt related interrupt was injected. I think I should better rename
> pt_irq_fired to pt_irq_eoi and that would make it clearer.

But pt_process_missed_ticks() should be called only when a tick was
missed, shouldn't it? Or actually, looking at the function, I guess
I'm confused. Does your patch change the meaning of the field?

> FWIW, decreasing pending_intr_nr should only be done after an
> inject_interrupt call.

Then this line (which you leave in place)

         pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */

is contradicting the (new) model.

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 09/11] x86/vpt: switch interrupt injection model
  2021-04-14 14:05       ` Jan Beulich
@ 2021-04-14 14:20         ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-14 14:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, Jun Nakajima, Kevin Tian, xen-devel

On Wed, Apr 14, 2021 at 04:05:20PM +0200, Jan Beulich wrote:
> On 14.04.2021 15:37, Roger Pau Monné wrote:
> > On Wed, Apr 14, 2021 at 12:28:43PM +0200, Jan Beulich wrote:
> >> On 31.03.2021 12:33, Roger Pau Monne wrote:
> >>> ---
> >>>  xen/arch/x86/hvm/svm/intr.c   |   3 -
> >>>  xen/arch/x86/hvm/vmx/intr.c   |  59 ------
> >>>  xen/arch/x86/hvm/vpt.c        | 334 ++++++++++++++--------------------
> >>>  xen/include/asm-x86/hvm/vpt.h |   5 +-
> >>>  4 files changed, 143 insertions(+), 258 deletions(-)
> >>
> >> Nice.
> >>
> >>> @@ -285,189 +238,144 @@ static void pt_irq_fired(struct vcpu *v, struct periodic_time *pt)
> >>>              list_del(&pt->list);
> >>>          pt->on_list = false;
> >>>          pt->pending_intr_nr = 0;
> >>> +
> >>> +        return;
> >>>      }
> >>> -    else if ( mode_is(v->domain, one_missed_tick_pending) ||
> >>> -              mode_is(v->domain, no_missed_ticks_pending) )
> >>> +
> >>> +    if ( mode_is(v->domain, one_missed_tick_pending) ||
> >>> +         mode_is(v->domain, no_missed_ticks_pending) )
> >>>      {
> >>> -        pt->last_plt_gtime = hvm_get_guest_time(v);
> >>>          pt_process_missed_ticks(pt);
> >>>          pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
> >>> +    }
> >>> +    else if ( !pt->pending_intr_nr )
> >>> +        pt_process_missed_ticks(pt);
> >>
> >> Did you lose a -- here? I.e. does the condition mean to match ...
> >>
> >>> +    if ( !pt->pending_intr_nr )
> >>>          set_timer(&pt->timer, pt->scheduled);
> >>> +}
> >>> +
> >>> +static void pt_timer_fn(void *data)
> >>> +{
> >>> +    struct periodic_time *pt = data;
> >>> +    struct vcpu *v;
> >>> +    time_cb *cb = NULL;
> >>> +    void *cb_priv;
> >>> +    unsigned int irq;
> >>> +
> >>> +    pt_lock(pt);
> >>> +
> >>> +    v = pt->vcpu;
> >>> +    irq = pt->irq;
> >>> +
> >>> +    if ( inject_interrupt(pt) )
> >>> +    {
> >>> +        pt->scheduled += pt->period;
> >>> +        pt->do_not_freeze = 0;
> >>> +        cb = pt->cb;
> >>> +        cb_priv = pt->priv;
> >>>      }
> >>>      else
> >>>      {
> >>> -        pt->last_plt_gtime += pt->period;
> >>> -        if ( --pt->pending_intr_nr == 0 )
> >>
> >> ... this original code? Otherwise I can't see why the condition
> >> guards a pt_process_missed_ticks() invocation.
> > 
> > I think the logic here changed enough to not match anymore. Certainly
> > pending_intr_nr shouldn't be decreased there, as pt_irq_fired is
> > invoked after an EOI in this patch, instead of being invoked when a
> > vpt related interrupt was injected. I think I should better rename
> > pt_irq_fired to pt_irq_eoi and that would make it clearer.
> 
> But pt_process_missed_ticks() should be called only when a tick was
> missed, shouldn't it?

No, I think the purpose of the function is to update the
pending_intr_nr field, ie: calculate if and how many ticks have been
missed.

It's fine for pt_process_missed_ticks to return without having changed
pending_intr_nr at all if no ticks have been missed.

> Or actually, looking at the function, I guess
> I'm confused. Does your patch change the meaning of the field?

Not really, I think pt_process_missed_ticks has always had this logic.
The pending_intr_nr filed should still have the same logic, account
for the amount of missed ticks up to the value in the scheduled field.

> > FWIW, decreasing pending_intr_nr should only be done after an
> > inject_interrupt call.
> 
> Then this line (which you leave in place)
> 
>          pt->pending_intr_nr = 0; /* 'collapse' all missed ticks */
> 
> is contradicting the (new) model.

Oh, right, that's mode specific. no_missed_ticks_pending will just
drop any interrupts that haven't been injected when they should have
been. I had the 'account missed ticks' mode in mind when I wrote that.

I now have pt_irq_fired as:

static void irq_eoi(struct periodic_time *pt)
{
    if ( pt->one_shot )
    {
        pt->pending_intr_nr = 0;
        return;
    }

    pt_process_missed_ticks(pt);
    /* 'collapse' missed ticks according to the selected mode. */
    switch ( pt->vcpu->domain->arch.hvm.params[HVM_PARAM_TIMER_MODE] )
    {
    case HVMPTM_one_missed_tick_pending:
        pt->pending_intr_nr = min(pt->pending_intr_nr, 1u);
        break;

    case HVMPTM_no_missed_ticks_pending:
        pt->pending_intr_nr = 0;
        break;
    }

    if ( !pt->pending_intr_nr )
    {
        /* Make sure timer follows vCPU. */
        migrate_timer(&pt->timer, current->processor);
        set_timer(&pt->timer, pt->scheduled);
    }
}

But I think it's best if I post it as a new version, so you can see
the context.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-07 17:08     ` Roger Pau Monné
  2021-04-08  6:34       ` Jan Beulich
@ 2021-04-15 16:04       ` Roger Pau Monné
  2021-04-16  7:29         ` Jan Beulich
  1 sibling, 1 reply; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-15 16:04 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Wed, Apr 07, 2021 at 07:08:06PM +0200, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
> > On 31.03.2021 12:32, Roger Pau Monne wrote:
> > > --- a/xen/arch/x86/hvm/irq.c
> > > +++ b/xen/arch/x86/hvm/irq.c
> > > +void hvm_gsi_execute_callbacks(unsigned int gsi)
> > > +{
> > > +    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
> > > +    struct hvm_gsi_eoi_callback *cb;
> > > +
> > > +    read_lock(&hvm_irq->gsi_callbacks_lock);
> > > +    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
> > > +        cb->callback(gsi, cb->data);
> > > +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> > > +}
> > 
> > Just as an observation (for now at least) - holding the lock here
> > means the callbacks cannot re-register themselves.
> 
> Well, re-registering would be weird, as the callback is not
> unregistered after execution. What is likely more relevant is that the
> callback cannot unregister itself. I haven't found a need for this so
> far, so I think it's fine.

I'm afraid I was wrong here - rtc_pf_callback could attempt to
unregister the timer, and thus end up calling
hvm_gsi_unregister_callback inside of a callback.

I need to figure a way to solve this. We already run the RTC in no ack
mode (which is correct because of the flag we expose in the WAET ACPI
table), and hence I wonder if we still need to keep the code for the
strict_mode around, since it's not used at all. Would you be OK with
me removing the mode_strict related code?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-15 16:04       ` Roger Pau Monné
@ 2021-04-16  7:29         ` Jan Beulich
  2021-04-19  8:31           ` Roger Pau Monné
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2021-04-16  7:29 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Andrew Cooper, Wei Liu, xen-devel

On 15.04.2021 18:04, Roger Pau Monné wrote:
> On Wed, Apr 07, 2021 at 07:08:06PM +0200, Roger Pau Monné wrote:
>> On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
>>> On 31.03.2021 12:32, Roger Pau Monne wrote:
>>>> --- a/xen/arch/x86/hvm/irq.c
>>>> +++ b/xen/arch/x86/hvm/irq.c
>>>> +void hvm_gsi_execute_callbacks(unsigned int gsi)
>>>> +{
>>>> +    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
>>>> +    struct hvm_gsi_eoi_callback *cb;
>>>> +
>>>> +    read_lock(&hvm_irq->gsi_callbacks_lock);
>>>> +    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
>>>> +        cb->callback(gsi, cb->data);
>>>> +    read_unlock(&hvm_irq->gsi_callbacks_lock);
>>>> +}
>>>
>>> Just as an observation (for now at least) - holding the lock here
>>> means the callbacks cannot re-register themselves.
>>
>> Well, re-registering would be weird, as the callback is not
>> unregistered after execution. What is likely more relevant is that the
>> callback cannot unregister itself. I haven't found a need for this so
>> far, so I think it's fine.
> 
> I'm afraid I was wrong here - rtc_pf_callback could attempt to
> unregister the timer, and thus end up calling
> hvm_gsi_unregister_callback inside of a callback.
> 
> I need to figure a way to solve this. We already run the RTC in no ack
> mode (which is correct because of the flag we expose in the WAET ACPI
> table), and hence I wonder if we still need to keep the code for the
> strict_mode around, since it's not used at all. Would you be OK with
> me removing the mode_strict related code?

Not sure, to be honest. Years ago I did submit a patch correcting this
("x86/HVM: tie RTC emulation mode to enabling of Viridian emulation"),
as we shouldn't assume all guests to even know of WAET. Hence running
uniformly in rtc_mode_no_ack isn't really correct. I'm still carrying
this patch, as Tim (iirc) had asked not to tie the behavior to the
Viridian param, but give it its own one. Which I still didn't get to.

Of course, if we decided to drop mode_strict support, I could also
drop that patch ...

Jan


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs
  2021-04-16  7:29         ` Jan Beulich
@ 2021-04-19  8:31           ` Roger Pau Monné
  0 siblings, 0 replies; 51+ messages in thread
From: Roger Pau Monné @ 2021-04-19  8:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

On Fri, Apr 16, 2021 at 09:29:26AM +0200, Jan Beulich wrote:
> On 15.04.2021 18:04, Roger Pau Monné wrote:
> > On Wed, Apr 07, 2021 at 07:08:06PM +0200, Roger Pau Monné wrote:
> >> On Wed, Apr 07, 2021 at 05:51:14PM +0200, Jan Beulich wrote:
> >>> On 31.03.2021 12:32, Roger Pau Monne wrote:
> >>>> --- a/xen/arch/x86/hvm/irq.c
> >>>> +++ b/xen/arch/x86/hvm/irq.c
> >>>> +void hvm_gsi_execute_callbacks(unsigned int gsi)
> >>>> +{
> >>>> +    struct hvm_irq *hvm_irq = hvm_domain_irq(current->domain);
> >>>> +    struct hvm_gsi_eoi_callback *cb;
> >>>> +
> >>>> +    read_lock(&hvm_irq->gsi_callbacks_lock);
> >>>> +    list_for_each_entry ( cb, &hvm_irq->gsi_callbacks[gsi], list )
> >>>> +        cb->callback(gsi, cb->data);
> >>>> +    read_unlock(&hvm_irq->gsi_callbacks_lock);
> >>>> +}
> >>>
> >>> Just as an observation (for now at least) - holding the lock here
> >>> means the callbacks cannot re-register themselves.
> >>
> >> Well, re-registering would be weird, as the callback is not
> >> unregistered after execution. What is likely more relevant is that the
> >> callback cannot unregister itself. I haven't found a need for this so
> >> far, so I think it's fine.
> > 
> > I'm afraid I was wrong here - rtc_pf_callback could attempt to
> > unregister the timer, and thus end up calling
> > hvm_gsi_unregister_callback inside of a callback.
> > 
> > I need to figure a way to solve this. We already run the RTC in no ack
> > mode (which is correct because of the flag we expose in the WAET ACPI
> > table), and hence I wonder if we still need to keep the code for the
> > strict_mode around, since it's not used at all. Would you be OK with
> > me removing the mode_strict related code?
> 
> Not sure, to be honest. Years ago I did submit a patch correcting this
> ("x86/HVM: tie RTC emulation mode to enabling of Viridian emulation"),
> as we shouldn't assume all guests to even know of WAET.

It's very likely guest that don't even know about WAET to continue
working fine even in the no_ack mode. In fact the current code for
strict_mode will inject 10 interrupts without REG_C being read, as
there's no check for the value of REG_C before injecting the
interrupt.

> Hence running
> uniformly in rtc_mode_no_ack isn't really correct. I'm still carrying
> this patch, as Tim (iirc) had asked not to tie the behavior to the
> Viridian param, but give it its own one. Which I still didn't get to.
> 
> Of course, if we decided to drop mode_strict support, I could also
> drop that patch ...

AFAICT the no_ack mode it's been used since Xen 4.3, and so far we had
no complains, so I think it's safe to just remove the code for
strict_mode. It can always be fetched from the repository history if
there's a need to support strict_mode in the future.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2021-04-19  8:31 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-31 10:32 [PATCH v3 00/11] x86/intr: introduce EOI callbacks and fix vPT Roger Pau Monne
2021-03-31 10:32 ` [PATCH v3 01/11] x86/hvm: drop vcpu parameter from vlapic EOI callbacks Roger Pau Monne
2021-03-31 16:02   ` Jan Beulich
2021-03-31 16:24     ` Andrew Cooper
2021-04-01  9:12       ` Roger Pau Monné
2021-04-01 11:06   ` Jan Beulich
2021-04-07  7:41     ` Roger Pau Monné
2021-04-07  8:19       ` Jan Beulich
2021-03-31 10:32 ` [PATCH v3 02/11] x86/hvm: drop domain parameter from vioapic/vpic " Roger Pau Monne
2021-03-31 16:04   ` Jan Beulich
2021-04-01  9:15     ` Roger Pau Monné
2021-04-01  9:28       ` Jan Beulich
2021-03-31 10:32 ` [PATCH v3 03/11] x86/vlapic: introduce an EOI callback mechanism Roger Pau Monne
2021-03-31 11:47   ` Andrew Cooper
2021-03-31 12:50     ` Roger Pau Monné
2021-04-07 14:55   ` Jan Beulich
2021-04-07 16:27     ` Roger Pau Monné
2021-04-08  6:20       ` Jan Beulich
2021-04-08  9:12         ` Roger Pau Monné
2021-04-08 10:49           ` Jan Beulich
2021-04-08 10:56             ` Roger Pau Monné
2021-03-31 10:32 ` [PATCH v3 04/11] x86/vmsi: use the newly introduced EOI callbacks Roger Pau Monne
2021-04-07 14:59   ` Jan Beulich
2021-03-31 10:32 ` [PATCH v3 05/11] x86/vioapic: switch to use the EOI callback mechanism Roger Pau Monne
2021-04-07 15:19   ` Jan Beulich
2021-04-07 16:46     ` Roger Pau Monné
2021-04-08  6:27       ` Jan Beulich
2021-04-08  8:59         ` Roger Pau Monné
2021-04-08 10:52           ` Jan Beulich
2021-03-31 10:32 ` [PATCH v3 06/11] x86/hvm: allowing registering EOI callbacks for GSIs Roger Pau Monne
2021-04-07 15:51   ` Jan Beulich
2021-04-07 17:08     ` Roger Pau Monné
2021-04-08  6:34       ` Jan Beulich
2021-04-15 16:04       ` Roger Pau Monné
2021-04-16  7:29         ` Jan Beulich
2021-04-19  8:31           ` Roger Pau Monné
2021-04-08 12:52     ` Roger Pau Monné
2021-04-08 14:31       ` Jan Beulich
2021-04-08 15:06         ` Roger Pau Monné
2021-03-31 10:32 ` [PATCH v3 07/11] x86/dpci: move code Roger Pau Monne
2021-03-31 10:33 ` [PATCH v3 08/11] x86/dpci: switch to use a GSI EOI callback Roger Pau Monne
2021-04-08 14:49   ` Jan Beulich
2021-04-08 15:23     ` Roger Pau Monné
2021-03-31 10:33 ` [PATCH v3 09/11] x86/vpt: switch interrupt injection model Roger Pau Monne
2021-04-14 10:28   ` Jan Beulich
2021-04-14 13:37     ` Roger Pau Monné
2021-04-14 14:05       ` Jan Beulich
2021-04-14 14:20         ` Roger Pau Monné
2021-03-31 10:33 ` [PATCH v3 10/11] x86/vpt: remove vPT timers per-vCPU lists Roger Pau Monne
2021-04-14 10:38   ` Jan Beulich
2021-03-31 10:33 ` [PATCH v3 11/11] x86/vpt: introduce a per-vPT lock Roger Pau Monne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).