* [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation
@ 2018-03-22 11:56 Andre Przywara
  2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 11:56 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

Hi,

this is just an update of the three patches which didn't get any review
tags so far.
The fixes for the new versions of 03/39 and 39/39 are pretty
straightforward, but 14/39 is more of a beast. I sent a diff to the original
patch [1] separately to give an idea of the changes.

I added the R-b: and A-b: tags along with the nit fixes to my tree and
will later push a branch with those tags and the fixes from this series
in a somewhat final version.
Look out for the vgic-new/v3a branch appearing at
http://www.linux-arm.org/git?p=xen-ap.git

Cheers,
Andre

[1] https://lists.xen.org/archives/html/xen-devel/2018-03/msg02680.html

---
Changelog v3 ... v3a: (copied from the patches' changelog)
03/39:
- always set/clear _IRQ_INPROGRESS bit (not only for guest IRQs)
- add comments
14/39:
- take hardware IRQ lock in vgic_v2_fold_lr_state()
- fix last remaining u32 usage
- print message when using new VGIC
- add TODO about racy _IRQ_INPROGRESS setting
39/39:
- print panic when trying to run on GICv3 hardware


* [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ
  2018-03-22 11:56 [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation Andre Przywara
@ 2018-03-22 11:56 ` Andre Przywara
  2018-03-22 13:58   ` Julien Grall
  2018-03-26 20:09   ` Stefano Stabellini
  2018-03-22 11:56 ` [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend Andre Przywara
  2018-03-22 11:56 ` [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system Andre Przywara
  2 siblings, 2 replies; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 11:56 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

When playing around with hardware mapped, level triggered virtual IRQs,
there is the need to explicitly set the active or pending state of an
interrupt at some point.
To prepare the GIC for that, we introduce a set_active_state() and a
set_pending_state() function to let the VGIC manipulate the state of
an associated hardware IRQ.
This takes care of properly setting the _IRQ_INPROGRESS bit.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
---
Changelog v3 ... v3a:
- always set/clear _IRQ_INPROGRESS bit (not only for guest IRQs)
- add comments

Changelog v2 ... v3:
- extend comments to note preliminary nature of vgic_get_lpi()

Changelog v1 ... v2:
- reorder header file inclusion

 xen/arch/arm/gic-v2.c     | 41 +++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/gic-v3.c     | 37 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/gic.h | 24 ++++++++++++++++++++++++
 3 files changed, 102 insertions(+)
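
For illustration, a VGIC-side caller would be expected to use the new
helpers roughly like this (hypothetical caller, not part of this patch;
hw_irq is just a stand-in for some hardware IRQ number, and both hooks
ASSERT that the irq_desc lock is already held):

    unsigned long flags;
    struct irq_desc *desc = irq_to_desc(hw_irq);

    spin_lock_irqsave(&desc->lock, flags);
    gic_set_active_state(desc, true);       /* force the physical active bit */
    gic_set_pending_state(desc, false);     /* clear the physical pending bit */
    spin_unlock_irqrestore(&desc->lock, flags);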

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index aa0fc6c1a1..7374686235 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -243,6 +243,45 @@ static void gicv2_poke_irq(struct irq_desc *irqd, uint32_t offset)
     writel_gicd(1U << (irqd->irq % 32), offset + (irqd->irq / 32) * 4);
 }
 
+/*
+ * This is forcing the active state of an interrupt, somewhat circumventing
+ * the normal interrupt flow and the GIC state machine. So use with care
+ * and only if you know what you are doing. For this reason we also have to
+ * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
+ * will not be involved.
+ */
+static void gicv2_set_active_state(struct irq_desc *irqd, bool active)
+{
+    ASSERT(spin_is_locked(&irqd->lock));
+
+    if ( active )
+    {
+        set_bit(_IRQ_INPROGRESS, &irqd->status);
+        gicv2_poke_irq(irqd, GICD_ISACTIVER);
+    }
+    else
+    {
+        clear_bit(_IRQ_INPROGRESS, &irqd->status);
+        gicv2_poke_irq(irqd, GICD_ICACTIVER);
+    }
+}
+
+static void gicv2_set_pending_state(struct irq_desc *irqd, bool pending)
+{
+    ASSERT(spin_is_locked(&irqd->lock));
+
+    if ( pending )
+    {
+        /* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
+        gicv2_poke_irq(irqd, GICD_ISPENDR);
+    }
+    else
+    {
+        /* The _IRQ_INPROGRESS bit remains unchanged. */
+        gicv2_poke_irq(irqd, GICD_ICPENDR);
+    }
+}
+
 static void gicv2_set_irq_type(struct irq_desc *desc, unsigned int type)
 {
     uint32_t cfg, actual, edgebit;
@@ -1278,6 +1317,8 @@ const static struct gic_hw_operations gicv2_ops = {
     .eoi_irq             = gicv2_eoi_irq,
     .deactivate_irq      = gicv2_dir_irq,
     .read_irq            = gicv2_read_irq,
+    .set_active_state    = gicv2_set_active_state,
+    .set_pending_state   = gicv2_set_pending_state,
     .set_irq_type        = gicv2_set_irq_type,
     .set_irq_priority    = gicv2_set_irq_priority,
     .send_SGI            = gicv2_send_SGI,
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index cb41844af2..a5105ac9e7 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -477,6 +477,41 @@ static unsigned int gicv3_read_irq(void)
     return irq;
 }
 
+/*
+ * This is forcing the active state of an interrupt, somewhat circumventing
+ * the normal interrupt flow and the GIC state machine. So use with care
+ * and only if you know what you are doing. For this reason we also have to
+ * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
+ * will not be involved.
+ */
+static void gicv3_set_active_state(struct irq_desc *irqd, bool active)
+{
+    ASSERT(spin_is_locked(&irqd->lock));
+
+    if ( active )
+    {
+        set_bit(_IRQ_INPROGRESS, &irqd->status);
+        gicv3_poke_irq(irqd, GICD_ISACTIVER, false);
+    }
+    else
+    {
+        clear_bit(_IRQ_INPROGRESS, &irqd->status);
+        gicv3_poke_irq(irqd, GICD_ICACTIVER, false);
+    }
+}
+
+static void gicv3_set_pending_state(struct irq_desc *irqd, bool pending)
+{
+    ASSERT(spin_is_locked(&irqd->lock));
+
+    if ( pending )
+        /* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
+        gicv3_poke_irq(irqd, GICD_ISPENDR, false);
+    else
+        /* The _IRQ_INPROGRESS bit will remain unchanged. */
+        gicv3_poke_irq(irqd, GICD_ICPENDR, false);
+}
+
 static inline uint64_t gicv3_mpidr_to_affinity(int cpu)
 {
      uint64_t mpidr = cpu_logical_map(cpu);
@@ -1769,6 +1804,8 @@ static const struct gic_hw_operations gicv3_ops = {
     .eoi_irq             = gicv3_eoi_irq,
     .deactivate_irq      = gicv3_dir_irq,
     .read_irq            = gicv3_read_irq,
+    .set_active_state    = gicv3_set_active_state,
+    .set_pending_state   = gicv3_set_pending_state,
     .set_irq_type        = gicv3_set_irq_type,
     .set_irq_priority    = gicv3_set_irq_priority,
     .send_SGI            = gicv3_send_sgi,
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 3079387e06..2aca243ac3 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -345,6 +345,10 @@ struct gic_hw_operations {
     void (*deactivate_irq)(struct irq_desc *irqd);
     /* Read IRQ id and Ack */
     unsigned int (*read_irq)(void);
+    /* Force the active state of an IRQ by accessing the distributor */
+    void (*set_active_state)(struct irq_desc *irqd, bool state);
+    /* Force the pending state of an IRQ by accessing the distributor */
+    void (*set_pending_state)(struct irq_desc *irqd, bool state);
     /* Set IRQ type */
     void (*set_irq_type)(struct irq_desc *desc, unsigned int type);
     /* Set IRQ priority */
@@ -393,6 +397,26 @@ static inline unsigned int gic_get_nr_lrs(void)
     return gic_hw_ops->info->nr_lrs;
 }
 
+/*
+ * Set the active state of an IRQ. This should be used with care, as this
+ * directly forces the active bit, without considering the GIC state machine.
+ * For private IRQs this only works for those of the current CPU.
+ */
+static inline void gic_set_active_state(struct irq_desc *irqd, bool state)
+{
+    gic_hw_ops->set_active_state(irqd, state);
+}
+
+/*
+ * Set the pending state of an IRQ. This should be used with care, as this
+ * directly forces the pending bit, without considering the GIC state machine.
+ * For private IRQs this only works for those of the current CPU.
+ */
+static inline void gic_set_pending_state(struct irq_desc *irqd, bool state)
+{
+    gic_hw_ops->set_pending_state(irqd, state);
+}
+
 void register_gic_ops(const struct gic_hw_operations *ops);
 int gic_make_hwdom_dt_node(const struct domain *d,
                            const struct dt_device_node *gic,
-- 
2.14.1



* [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 11:56 [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation Andre Przywara
  2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
@ 2018-03-22 11:56 ` Andre Przywara
  2018-03-22 14:06   ` Julien Grall
  2018-03-26 23:22   ` Stefano Stabellini
  2018-03-22 11:56 ` [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system Andre Przywara
  2 siblings, 2 replies; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 11:56 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

Processing maintenance interrupts and accessing the list registers
are dependent on the host's GIC version.
Introduce vgic-v2.c to contain GICv2 specific functions.
Implement the GICv2 specific code for syncing the emulation state
into the VGIC registers.
This also adds the hook to let Xen setup the host GIC addresses.

This is based on Linux commit 140b086dd197, written by Marc Zyngier.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
---
Changelog v3 ... v3a:
- take hardware IRQ lock in vgic_v2_fold_lr_state()
- fix last remaining u32 usage
- print message when using new VGIC
- add TODO about racy _IRQ_INPROGRESS setting

Changelog v2 ... v3:
- remove no longer needed asm/io.h header
- replace 0/1 with false/true for bool's
- clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
- fix indentation and w/s issues

Changelog v1 ... v2:
- remove v2 specific underflow function (now generic)
- re-add Linux code to properly handle acked level IRQs

 xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/vgic/vgic.c    |   6 +
 xen/arch/arm/vgic/vgic.h    |   9 ++
 3 files changed, 274 insertions(+)
 create mode 100644 xen/arch/arm/vgic/vgic-v2.c
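
For orientation, these backend functions end up being called from the
generic VGIC code roughly as follows (simplified call-flow sketch; the
names of functions outside of this patch are approximate):

    on guest exit:
        vgic_sync_from_lrs(vcpu)
            vgic_fold_lr_state(vcpu)            ->  vgic_v2_fold_lr_state(vcpu)

    before returning to the guest:
        vgic_sync_to_lrs()
            vgic_flush_lr_state(current)
                vgic_populate_lr(current, irq, lr)  ->  vgic_v2_populate_lr(...)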

diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
new file mode 100644
index 0000000000..1773503cfb
--- /dev/null
+++ b/xen/arch/arm/vgic/vgic-v2.c
@@ -0,0 +1,259 @@
+/*
+ * Copyright (C) 2015, 2016 ARM Ltd.
+ * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <asm/new_vgic.h>
+#include <asm/bug.h>
+#include <asm/gic.h>
+#include <xen/sched.h>
+#include <xen/sizes.h>
+
+#include "vgic.h"
+
+static struct {
+    bool enabled;
+    paddr_t dbase;          /* Distributor interface address */
+    paddr_t cbase;          /* CPU interface address & size */
+    paddr_t csize;
+    paddr_t vbase;          /* Virtual CPU interface address */
+
+    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
+    uint32_t aliased_offset;
+} gic_v2_hw_data;
+
+void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
+                      paddr_t vbase, uint32_t aliased_offset)
+{
+    gic_v2_hw_data.enabled = true;
+    gic_v2_hw_data.dbase = dbase;
+    gic_v2_hw_data.cbase = cbase;
+    gic_v2_hw_data.csize = csize;
+    gic_v2_hw_data.vbase = vbase;
+    gic_v2_hw_data.aliased_offset = aliased_offset;
+
+    printk("Using the new VGIC implementation.\n");
+}
+
+/*
+ * transfer the content of the LRs back into the corresponding ap_list:
+ * - active bit is transferred as is
+ * - pending bit is
+ *   - transferred as is in case of edge sensitive IRQs
+ *   - set to the line-level (resample time) for level sensitive IRQs
+ */
+void vgic_v2_fold_lr_state(struct vcpu *vcpu)
+{
+    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
+    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
+    unsigned long flags;
+    unsigned int lr;
+
+    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
+        return;
+
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
+
+    for ( lr = 0; lr < used_lrs; lr++ )
+    {
+        struct gic_lr lr_val;
+        uint32_t intid;
+        struct vgic_irq *irq;
+        struct irq_desc *desc = NULL;
+        bool have_desc_lock = false;
+
+        gic_hw_ops->read_lr(lr, &lr_val);
+
+        /*
+         * TODO: Possible optimization to avoid reading LRs:
+         * Read the ELRSR to find out which of our LRs have been cleared
+         * by the guest. We just need to know the IRQ number for those, which
+         * we could save in an array when populating the LRs.
+         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
+         * but requires some more code to save the IRQ number and to handle
+         * those finished IRQs according to the algorithm below.
+         * We need some numbers to justify this: chances are that we don't
+         * have many LRs in use most of the time, so we might not save much.
+         */
+        gic_hw_ops->clear_lr(lr);
+
+        intid = lr_val.virq;
+        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
+
+        local_irq_save(flags);
+        spin_lock(&irq->irq_lock);
+
+        /* The locking order forces us to drop and re-take the locks here. */
+        if ( irq->hw )
+        {
+            spin_unlock(&irq->irq_lock);
+
+            desc = irq_to_desc(irq->hwintid);
+            spin_lock(&desc->lock);
+            spin_lock(&irq->irq_lock);
+
+            /* This h/w IRQ should still be assigned to the virtual IRQ. */
+            ASSERT(irq->hw && desc->irq == irq->hwintid);
+
+            have_desc_lock = true;
+        }
+
+        /*
+         * If a hardware mapped IRQ has been handled for good, we need to
+         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
+         *
+         * TODO: This is probably racy, but is so already in the existing
+         * VGIC. A fix does not seem to be trivial.
+         */
+        if ( irq->hw && !lr_val.active && !lr_val.pending )
+            clear_bit(_IRQ_INPROGRESS, &desc->status);
+
+        /* Always preserve the active bit */
+        irq->active = lr_val.active;
+
+        /* Edge is the only case where we preserve the pending bit */
+        if ( irq->config == VGIC_CONFIG_EDGE && lr_val.pending )
+        {
+            irq->pending_latch = true;
+
+            if ( vgic_irq_is_sgi(intid) )
+                irq->source |= (1U << lr_val.virt.source);
+        }
+
+        /* Clear soft pending state when level irqs have been acked. */
+        if ( irq->config == VGIC_CONFIG_LEVEL && !lr_val.pending )
+            irq->pending_latch = false;
+
+        /*
+         * Level-triggered mapped IRQs are special because we only
+         * observe rising edges as input to the VGIC.
+         *
+         * If the guest never acked the interrupt we have to sample
+         * the physical line and set the line level, because the
+         * device state could have changed or we simply need to
+         * process the still pending interrupt later.
+         *
+         * If this causes us to lower the level, we have to also clear
+         * the physical active state, since we will otherwise never be
+         * told when the interrupt becomes asserted again.
+         */
+        if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
+        {
+            ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
+
+            irq->line_level = gic_read_pending_state(desc);
+
+            if ( !irq->line_level )
+                gic_set_active_state(desc, false);
+        }
+
+        spin_unlock(&irq->irq_lock);
+        if ( have_desc_lock )
+            spin_unlock(&desc->lock);
+        local_irq_restore(flags);
+
+        vgic_put_irq(vcpu->domain, irq);
+    }
+
+    gic_hw_ops->update_hcr_status(GICH_HCR_EN, false);
+    vgic_cpu->used_lrs = 0;
+}
+
+/**
+ * vgic_v2_populate_lr() - Populates an LR with the state of a given IRQ.
+ * @vcpu: The VCPU which the given @irq belongs to.
+ * @irq:  The IRQ to convert into an LR. The irq_lock must be held already.
+ * @lr:   The LR number to transfer the state into.
+ *
+ * This moves a virtual IRQ, represented by its vgic_irq, into a list register.
+ * Apart from translating the logical state into the LR bitfields, it also
+ * changes some state in the vgic_irq.
+ * For an edge sensitive IRQ the pending state is cleared in struct vgic_irq,
+ * for a level sensitive IRQ the pending state value is unchanged, as it is
+ * dictated directly by the input line level.
+ *
+ * If @irq describes an SGI with multiple sources, we choose the
+ * lowest-numbered source VCPU and clear that bit in the source bitmap.
+ *
+ * The irq_lock must be held by the caller.
+ */
+void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr)
+{
+    struct gic_lr lr_val = {0};
+
+    lr_val.virq = irq->intid;
+
+    if ( irq_is_pending(irq) )
+    {
+        lr_val.pending = true;
+
+        if ( irq->config == VGIC_CONFIG_EDGE )
+            irq->pending_latch = false;
+
+        if ( vgic_irq_is_sgi(irq->intid) )
+        {
+            uint32_t src = ffs(irq->source);
+
+            BUG_ON(!src);
+            lr_val.virt.source = (src - 1);
+            irq->source &= ~(1 << (src - 1));
+            if ( irq->source )
+                irq->pending_latch = true;
+        }
+    }
+
+    lr_val.active = irq->active;
+
+    if ( irq->hw )
+    {
+        lr_val.hw_status = true;
+        lr_val.hw.pirq = irq->hwintid;
+        /*
+         * Never set pending+active on a HW interrupt, as the
+         * pending state is kept at the physical distributor
+         * level.
+         */
+        if ( irq->active && irq_is_pending(irq) )
+            lr_val.pending = false;
+    }
+    else
+    {
+        if ( irq->config == VGIC_CONFIG_LEVEL )
+            lr_val.virt.eoi = true;
+    }
+
+    /*
+     * Level-triggered mapped IRQs are special because we only observe
+     * rising edges as input to the VGIC.  We therefore lower the line
+     * level here, so that we can take new virtual IRQs.  See
+     * vgic_v2_fold_lr_state for more info.
+     */
+    if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
+        irq->line_level = false;
+
+    /* The GICv2 LR only holds five bits of priority. */
+    lr_val.priority = irq->priority >> 3;
+
+    gic_hw_ops->write_lr(lr, &lr_val);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index d91ed29d96..214176c14e 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -520,6 +520,7 @@ retry:
 
 static void vgic_fold_lr_state(struct vcpu *vcpu)
 {
+    vgic_v2_fold_lr_state(vcpu);
 }
 
 /* Requires the irq_lock to be held. */
@@ -527,6 +528,8 @@ static void vgic_populate_lr(struct vcpu *vcpu,
                              struct vgic_irq *irq, int lr)
 {
     ASSERT(spin_is_locked(&irq->irq_lock));
+
+    vgic_v2_populate_lr(vcpu, irq, lr);
 }
 
 static void vgic_set_underflow(struct vcpu *vcpu)
@@ -640,7 +643,10 @@ void vgic_sync_to_lrs(void)
     spin_lock(&current->arch.vgic.ap_list_lock);
     vgic_flush_lr_state(current);
     spin_unlock(&current->arch.vgic.ap_list_lock);
+
+    gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
 }
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/vgic/vgic.h b/xen/arch/arm/vgic/vgic.h
index 1547478518..e2b6d51e47 100644
--- a/xen/arch/arm/vgic/vgic.h
+++ b/xen/arch/arm/vgic/vgic.h
@@ -27,6 +27,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
         return irq->pending_latch || irq->line_level;
 }
 
+static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
+{
+    return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
+}
+
 struct vgic_irq *vgic_get_irq(struct domain *d, struct vcpu *vcpu,
                               uint32_t intid);
 void vgic_put_irq(struct domain *d, struct vgic_irq *irq);
@@ -41,6 +46,10 @@ static inline void vgic_get_irq_kref(struct vgic_irq *irq)
     atomic_inc(&irq->refcount);
 }
 
+void vgic_v2_fold_lr_state(struct vcpu *vcpu);
+void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr);
+void vgic_v2_set_underflow(struct vcpu *vcpu);
+
 #endif
 
 /*
-- 
2.14.1



* [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system
  2018-03-22 11:56 [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation Andre Przywara
  2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
  2018-03-22 11:56 ` [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend Andre Przywara
@ 2018-03-22 11:56 ` Andre Przywara
  2018-03-22 13:59   ` Julien Grall
  2018-03-27 22:53   ` Stefano Stabellini
  2 siblings, 2 replies; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 11:56 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

Now that we have both the old VGIC prepared to cope with a sibling and
the code for the new VGIC in place, let's add a Kconfig option to enable
the new code and wire it into the Xen build system.
This adds a compile-time option to use either the "old" or the "new"
VGIC.
At the moment this is restricted to a vGIC-v2. To make the build system
happy, we provide a temporary dummy implementation of
vgic_v3_setup_hw() to allow building for now.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
---
Changelog v3 ... v3a:
- print panic when trying to run on GICv3 hardware

Changelog v2 ... v3:
- fix indentation of Kconfig entry
- select NEEDS_LIST_SORT
- drop unconditional list_sort.o inclusion

Changelog v1 ... v2:
- add Kconfig help text
- use separate Makefile in vgic/ directory
- protect compilation without GICV3 support
- always include list_sort() in build

 xen/arch/arm/Kconfig       | 18 +++++++++++++++++-
 xen/arch/arm/Makefile      |  5 ++++-
 xen/arch/arm/vgic/Makefile |  5 +++++
 xen/arch/arm/vgic/vgic.c   | 11 +++++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/arm/vgic/Makefile
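
To give the new code a try, the option can be enabled at build time, for
instance (assuming the usual Xen Kconfig workflow):

    $ make -C xen menuconfig      # select "Use new VGIC implementation"

    or directly in xen/.config:

    CONFIG_NEW_VGIC=y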

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 2782ee6589..8174c0c635 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -48,7 +48,23 @@ config HAS_GICV3
 config HAS_ITS
         bool
         prompt "GICv3 ITS MSI controller support" if EXPERT = "y"
-        depends on HAS_GICV3
+        depends on HAS_GICV3 && !NEW_VGIC
+
+config NEW_VGIC
+	bool
+	prompt "Use new VGIC implementation"
+	select NEEDS_LIST_SORT
+	---help---
+
+	This is an alternative implementation of the ARM GIC interrupt
+	controller emulation, based on the Linux/KVM VGIC. It has a better
+	design and fixes many shortcomings of the existing GIC emulation in
+	Xen. It will eventually replace the existing/old VGIC.
+	However, at the moment it lacks support for Dom0 using the ITS for
+	MSIs.
+	Say Y if you want to help testing this new code or if you experience
+	problems with the standard emulation.
+	At the moment this implementation is not security supported.
 
 config SBSA_VUART_CONSOLE
 	bool "Emulated SBSA UART console support"
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 41d7366527..a9533b107e 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -16,7 +16,6 @@ obj-y += domain_build.o
 obj-y += domctl.o
 obj-$(EARLY_PRINTK) += early_printk.o
 obj-y += gic.o
-obj-y += gic-vgic.o
 obj-y += gic-v2.o
 obj-$(CONFIG_HAS_GICV3) += gic-v3.o
 obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
@@ -47,10 +46,14 @@ obj-y += sysctl.o
 obj-y += time.o
 obj-y += traps.o
 obj-y += vcpreg.o
+subdir-$(CONFIG_NEW_VGIC) += vgic
+ifneq ($(CONFIG_NEW_VGIC),y)
+obj-y += gic-vgic.o
 obj-y += vgic.o
 obj-y += vgic-v2.o
 obj-$(CONFIG_HAS_GICV3) += vgic-v3.o
 obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
+endif
 obj-y += vm_event.o
 obj-y += vtimer.o
 obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o
diff --git a/xen/arch/arm/vgic/Makefile b/xen/arch/arm/vgic/Makefile
new file mode 100644
index 0000000000..806826948e
--- /dev/null
+++ b/xen/arch/arm/vgic/Makefile
@@ -0,0 +1,5 @@
+obj-y += vgic.o
+obj-y += vgic-v2.o
+obj-y += vgic-mmio.o
+obj-y += vgic-mmio-v2.o
+obj-y += vgic-init.o
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index f9a5088285..ac18cab6f3 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -981,6 +981,17 @@ unsigned int vgic_max_vcpus(const struct domain *d)
     return min_t(unsigned int, MAX_VIRT_CPUS, vgic_vcpu_limit);
 }
 
+#ifdef CONFIG_HAS_GICV3
+/* Dummy implementation to allow building without actual vGICv3 support. */
+void vgic_v3_setup_hw(paddr_t dbase,
+                      unsigned int nr_rdist_regions,
+                      const struct rdist_region *regions,
+                      unsigned int intid_bits)
+{
+    panic("New VGIC implementation does not yet support GICv3.");
+}
+#endif
+
 /*
  * Local variables:
  * mode: C
-- 
2.14.1



* Re: [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ
  2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
@ 2018-03-22 13:58   ` Julien Grall
  2018-03-26 20:09   ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Julien Grall @ 2018-03-22 13:58 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini; +Cc: xen-devel

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:
> When playing around with hardware mapped, level triggered virtual IRQs,
> there is the need to explicitly set the active or pending state of an
> interrupt at some point.
> To prepare the GIC for that, we introduce a set_active_state() and a
> set_pending_state() function to let the VGIC manipulate the state of
> an associated hardware IRQ.
> This takes care of properly setting the _IRQ_INPROGRESS bit.
> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>

Reviewed-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall


* Re: [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system
  2018-03-22 11:56 ` [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system Andre Przywara
@ 2018-03-22 13:59   ` Julien Grall
  2018-03-27 22:53   ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Julien Grall @ 2018-03-22 13:59 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini; +Cc: xen-devel

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:
> Now that we have both the old VGIC prepared to cope with a sibling and
> the code for the new VGIC in place, lets add a Kconfig option to enable
> the new code and wire it into the Xen build system.
> This will add a compile time option to use either the "old" or the "new"
> VGIC.
> In the moment this is restricted to a vGIC-v2. To make the build system
> happy, we provide a temporary dummy implementation of
> vgic_v3_setup_hw() to allow building for now.
> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 11:56 ` [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend Andre Przywara
@ 2018-03-22 14:06   ` Julien Grall
  2018-03-22 15:12     ` Andre Przywara
  2018-03-22 16:26     ` Andre Przywara
  2018-03-26 23:22   ` Stefano Stabellini
  1 sibling, 2 replies; 20+ messages in thread
From: Julien Grall @ 2018-03-22 14:06 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini; +Cc: xen-devel

Hi Andre,

On 03/22/2018 11:56 AM, Andre Przywara wrote:
> +        /* The locking order forces us to drop and re-take the locks here. */
> +        if ( irq->hw )
> +        {
> +            spin_unlock(&irq->irq_lock);
> +
> +            desc = irq_to_desc(irq->hwintid);
> +            spin_lock(&desc->lock);
> +            spin_lock(&irq->irq_lock);
> +
> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> +
> +            have_desc_lock = true;
> +        }

I am a bit concerned about this dance in fold_lr_state(). This looks 
awfully complex, but I don't have a better solution here. I will have a 
think during the night.

However, this is not going to solve the race condition I mentioned 
between clearing _IRQ_INPROGRESS here and setting _IRQ_INPROGRESS in 
do_IRQ. This is because you don't know the order they are going to be 
executed.
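
To illustrate the interleaving I have in mind (purely illustrative):

    pCPU A: vgic_v2_fold_lr_state()          pCPU B: do_IRQ(), same hardware IRQ
    -------------------------------          -----------------------------------
    read_lr(): !active && !pending
                                             set_bit(_IRQ_INPROGRESS, &desc->status)
    clear_bit(_IRQ_INPROGRESS, &desc->status)

so depending on which side runs first, the bit set for the new occurrence
can be lost.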

I wanted to make sure you didn't intend to solve that one. Am I correct?

Cheers,

-- 
Julien Grall


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 14:06   ` Julien Grall
@ 2018-03-22 15:12     ` Andre Przywara
  2018-03-23  0:49       ` Julien Grall
  2018-03-22 16:26     ` Andre Przywara
  1 sibling, 1 reply; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 15:12 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

Hi,

On 22/03/18 14:06, Julien Grall wrote:
> Hi Andre,
> 
> On 03/22/2018 11:56 AM, Andre Przywara wrote:
>> +        /* The locking order forces us to drop and re-take the locks
>> here. */
>> +        if ( irq->hw )
>> +        {
>> +            spin_unlock(&irq->irq_lock);
>> +
>> +            desc = irq_to_desc(irq->hwintid);
>> +            spin_lock(&desc->lock);
>> +            spin_lock(&irq->irq_lock);
>> +
>> +            /* This h/w IRQ should still be assigned to the virtual
>> IRQ. */
>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>> +
>> +            have_desc_lock = true;
>> +        }
> 
> I am a bit concerned of this dance in fold_lr_state(). This looks
> awfully complex but I don't have better solution here.

I agree.

> I will have a think during the night.
> 
> However, this is not going to solve the race condition I mentioned
> between clearing _IRQ_INPROGRESS here and setting _IRQ_INPROGRESS in
> do_IRQ. This is because you don't know the order they are going to be
> executed.
> 
> I wanted to make sure you didn't intend to solve that one. Am I correct?

This is right, this is orthogonal and not addressed by this patch. I
have a hunch we need to solve this in irq.c instead.

Cheers,
Andre.


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 14:06   ` Julien Grall
  2018-03-22 15:12     ` Andre Przywara
@ 2018-03-22 16:26     ` Andre Przywara
  1 sibling, 0 replies; 20+ messages in thread
From: Andre Przywara @ 2018-03-22 16:26 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel

Hi,

On 22/03/18 14:06, Julien Grall wrote:
> Hi Andre,
> 
> On 03/22/2018 11:56 AM, Andre Przywara wrote:
>> +        /* The locking order forces us to drop and re-take the locks
>> here. */
>> +        if ( irq->hw )
>> +        {
>> +            spin_unlock(&irq->irq_lock);
>> +
>> +            desc = irq_to_desc(irq->hwintid);

Argh, those two lines should be swapped, I guess.
It probably doesn't really matter with our current "stick with that
hardware mapped IRQ forever" approach, but swapping them is more
future-proof and more correct anyway.
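
That is, something like (untested):

    desc = irq_to_desc(irq->hwintid);   /* read hwintid while still holding irq_lock */
    spin_unlock(&irq->irq_lock);

    spin_lock(&desc->lock);
    spin_lock(&irq->irq_lock);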

Cheers,
Andre.

>> +            spin_lock(&desc->lock);
>> +            spin_lock(&irq->irq_lock);
>> +
>> +            /* This h/w IRQ should still be assigned to the virtual
>> IRQ. */
>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>> +
>> +            have_desc_lock = true;
>> +        }
> 
> I am a bit concerned of this dance in fold_lr_state(). This looks
> awfully complex but I don't have better solution here. I will have a
> think during the night.
> 
> However, this is not going to solve the race condition I mentioned
> between clearing _IRQ_INPROGRESS here and setting _IRQ_INPROGRESS in
> do_IRQ. This is because you don't know the order they are going to be
> executed.
> 
> I wanted to make sure you didn't intend to solve that one. Am I correct?
> 
> Cheers,
> 


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 15:12     ` Andre Przywara
@ 2018-03-23  0:49       ` Julien Grall
  0 siblings, 0 replies; 20+ messages in thread
From: Julien Grall @ 2018-03-23  0:49 UTC (permalink / raw)
  To: Andre Przywara, Stefano Stabellini; +Cc: xen-devel

Hi,

On 03/22/2018 03:12 PM, Andre Przywara wrote:
> Hi,
> 
> On 22/03/18 14:06, Julien Grall wrote:
>> Hi Andre,
>>
>> On 03/22/2018 11:56 AM, Andre Przywara wrote:
>>> +        /* The locking order forces us to drop and re-take the locks
>>> here. */
>>> +        if ( irq->hw )
>>> +        {
>>> +            spin_unlock(&irq->irq_lock);
>>> +
>>> +            desc = irq_to_desc(irq->hwintid);
>>> +            spin_lock(&desc->lock);
>>> +            spin_lock(&irq->irq_lock);
>>> +
>>> +            /* This h/w IRQ should still be assigned to the virtual
>>> IRQ. */
>>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>>> +
>>> +            have_desc_lock = true;
>>> +        }
>>
>> I am a bit concerned of this dance in fold_lr_state(). This looks
>> awfully complex but I don't have better solution here.
> 
> I agree.

I still don't have much of an idea how to solve that nicely. Maybe Stefano has?

Meanwhile, I would be happy to get that in Xen:

Acked-by: Julien Grall <julien.grall@arm.com>

> 
>> I will have a think during the night.
>>
>> However, this is not going to solve the race condition I mentioned
>> between clearing _IRQ_INPROGRESS here and setting _IRQ_INPROGRESS in
>> do_IRQ. This is because you don't know the order they are going to be
>> executed.
>>
>> I wanted to make sure you didn't intend to solve that one. Am I correct?
> 
> This is right, this is orthogonal and not addressed by this patch. I
> have a hunch we need to solve this in irq.c instead.

I guess this should be logged on Jira with the rest of the open items.

> 
> Cheers,
> Andre.
> 

-- 
Julien Grall


* Re: [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ
  2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
  2018-03-22 13:58   ` Julien Grall
@ 2018-03-26 20:09   ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-26 20:09 UTC (permalink / raw)
  To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini

On Thu, 22 Mar 2018, Andre Przywara wrote:
> When playing around with hardware mapped, level triggered virtual IRQs,
> there is the need to explicitly set the active or pending state of an
> interrupt at some point.
> To prepare the GIC for that, we introduce a set_active_state() and a
> set_pending_state() function to let the VGIC manipulate the state of
> an associated hardware IRQ.
> This takes care of properly setting the _IRQ_INPROGRESS bit.
> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
> Changelog v3 ... v3a:
> - always set/clear _IRQ_INPROGRESS bit (not only for guest IRQs)
> - add comments
> 
> Changelog v2 ... v3:
> - extend comments to note preliminary nature of vgic_get_lpi()
> 
> Changelog v1 ... v2:
> - reorder header file inclusion
> 
>  xen/arch/arm/gic-v2.c     | 41 +++++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/gic-v3.c     | 37 +++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/gic.h | 24 ++++++++++++++++++++++++
>  3 files changed, 102 insertions(+)
> 
> diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
> index aa0fc6c1a1..7374686235 100644
> --- a/xen/arch/arm/gic-v2.c
> +++ b/xen/arch/arm/gic-v2.c
> @@ -243,6 +243,45 @@ static void gicv2_poke_irq(struct irq_desc *irqd, uint32_t offset)
>      writel_gicd(1U << (irqd->irq % 32), offset + (irqd->irq / 32) * 4);
>  }
>  
> +/*
> + * This is forcing the active state of an interrupt, somewhat circumventing
> + * the normal interrupt flow and the GIC state machine. So use with care
> + * and only if you know what you are doing. For this reason we also have to
> + * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
> + * will not be involved.
> + */
> +static void gicv2_set_active_state(struct irq_desc *irqd, bool active)
> +{
> +    ASSERT(spin_is_locked(&irqd->lock));
> +
> +    if ( active )
> +    {
> +        set_bit(_IRQ_INPROGRESS, &irqd->status);
> +        gicv2_poke_irq(irqd, GICD_ISACTIVER);
> +    }
> +    else
> +    {
> +        clear_bit(_IRQ_INPROGRESS, &irqd->status);
> +        gicv2_poke_irq(irqd, GICD_ICACTIVER);
> +    }
> +}
> +
> +static void gicv2_set_pending_state(struct irq_desc *irqd, bool pending)
> +{
> +    ASSERT(spin_is_locked(&irqd->lock));
> +
> +    if ( pending )
> +    {
> +        /* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
> +        gicv2_poke_irq(irqd, GICD_ISPENDR);
> +    }
> +    else
> +    {
> +        /* The _IRQ_INPROGRESS remains unchanged. */
> +        gicv2_poke_irq(irqd, GICD_ICPENDR);
> +    }
> +}
> +
>  static void gicv2_set_irq_type(struct irq_desc *desc, unsigned int type)
>  {
>      uint32_t cfg, actual, edgebit;
> @@ -1278,6 +1317,8 @@ const static struct gic_hw_operations gicv2_ops = {
>      .eoi_irq             = gicv2_eoi_irq,
>      .deactivate_irq      = gicv2_dir_irq,
>      .read_irq            = gicv2_read_irq,
> +    .set_active_state    = gicv2_set_active_state,
> +    .set_pending_state   = gicv2_set_pending_state,
>      .set_irq_type        = gicv2_set_irq_type,
>      .set_irq_priority    = gicv2_set_irq_priority,
>      .send_SGI            = gicv2_send_SGI,
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index cb41844af2..a5105ac9e7 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -477,6 +477,41 @@ static unsigned int gicv3_read_irq(void)
>      return irq;
>  }
>  
> +/*
> + * This is forcing the active state of an interrupt, somewhat circumventing
> + * the normal interrupt flow and the GIC state machine. So use with care
> + * and only if you know what you are doing. For this reason we also have to
> + * tinker with the _IRQ_INPROGRESS bit here, since the normal IRQ handler
> + * will not be involved.
> + */
> +static void gicv3_set_active_state(struct irq_desc *irqd, bool active)
> +{
> +    ASSERT(spin_is_locked(&irqd->lock));
> +
> +    if ( active )
> +    {
> +        set_bit(_IRQ_INPROGRESS, &irqd->status);
> +        gicv3_poke_irq(irqd, GICD_ISACTIVER, false);
> +    }
> +    else
> +    {
> +        clear_bit(_IRQ_INPROGRESS, &irqd->status);
> +        gicv3_poke_irq(irqd, GICD_ICACTIVER, false);
> +    }
> +}
> +
> +static void gicv3_set_pending_state(struct irq_desc *irqd, bool pending)
> +{
> +    ASSERT(spin_is_locked(&irqd->lock));
> +
> +    if ( pending )
> +        /* The _IRQ_INPROGRESS bit will be set when the interrupt fires. */
> +        gicv3_poke_irq(irqd, GICD_ISPENDR, false);
> +    else
> +        /* The _IRQ_INPROGRESS bit will remain unchanged. */
> +        gicv3_poke_irq(irqd, GICD_ICPENDR, false);
> +}
> +
>  static inline uint64_t gicv3_mpidr_to_affinity(int cpu)
>  {
>       uint64_t mpidr = cpu_logical_map(cpu);
> @@ -1769,6 +1804,8 @@ static const struct gic_hw_operations gicv3_ops = {
>      .eoi_irq             = gicv3_eoi_irq,
>      .deactivate_irq      = gicv3_dir_irq,
>      .read_irq            = gicv3_read_irq,
> +    .set_active_state    = gicv3_set_active_state,
> +    .set_pending_state   = gicv3_set_pending_state,
>      .set_irq_type        = gicv3_set_irq_type,
>      .set_irq_priority    = gicv3_set_irq_priority,
>      .send_SGI            = gicv3_send_sgi,
> diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
> index 3079387e06..2aca243ac3 100644
> --- a/xen/include/asm-arm/gic.h
> +++ b/xen/include/asm-arm/gic.h
> @@ -345,6 +345,10 @@ struct gic_hw_operations {
>      void (*deactivate_irq)(struct irq_desc *irqd);
>      /* Read IRQ id and Ack */
>      unsigned int (*read_irq)(void);
> +    /* Force the active state of an IRQ by accessing the distributor */
> +    void (*set_active_state)(struct irq_desc *irqd, bool state);
> +    /* Force the pending state of an IRQ by accessing the distributor */
> +    void (*set_pending_state)(struct irq_desc *irqd, bool state);
>      /* Set IRQ type */
>      void (*set_irq_type)(struct irq_desc *desc, unsigned int type);
>      /* Set IRQ priority */
> @@ -393,6 +397,26 @@ static inline unsigned int gic_get_nr_lrs(void)
>      return gic_hw_ops->info->nr_lrs;
>  }
>  
> +/*
> + * Set the active state of an IRQ. This should be used with care, as this
> + * directly forces the active bit, without considering the GIC state machine.
> + * For private IRQs this only works for those of the current CPU.
> + */
> +static inline void gic_set_active_state(struct irq_desc *irqd, bool state)
> +{
> +    gic_hw_ops->set_active_state(irqd, state);
> +}
> +
> +/*
> + * Set the pending state of an IRQ. This should be used with care, as this
> + * directly forces the pending bit, without considering the GIC state machine.
> + * For private IRQs this only works for those of the current CPU.
> + */
> +static inline void gic_set_pending_state(struct irq_desc *irqd, bool state)
> +{
> +    gic_hw_ops->set_pending_state(irqd, state);
> +}
> +
>  void register_gic_ops(const struct gic_hw_operations *ops);
>  int gic_make_hwdom_dt_node(const struct domain *d,
>                             const struct dt_device_node *gic,
> -- 
> 2.14.1
> 


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-22 11:56 ` [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend Andre Przywara
  2018-03-22 14:06   ` Julien Grall
@ 2018-03-26 23:22   ` Stefano Stabellini
  2018-03-27  0:16     ` Julien Grall
                       ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-26 23:22 UTC (permalink / raw)
  To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini

On Thu, 22 Mar 2018, Andre Przywara wrote:
> Processing maintenance interrupts and accessing the list registers
> are dependent on the host's GIC version.
> Introduce vgic-v2.c to contain GICv2 specific functions.
> Implement the GICv2 specific code for syncing the emulation state
> into the VGIC registers.
> This also adds the hook to let Xen setup the host GIC addresses.
> 
> This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> ---
> Changelog v3 ... v3a:
> - take hardware IRQ lock in vgic_v2_fold_lr_state()
> - fix last remaining u32 usage
> - print message when using new VGIC
> - add TODO about racy _IRQ_INPROGRESS setting
> 
> Changelog v2 ... v3:
> - remove no longer needed asm/io.h header
> - replace 0/1 with false/true for bool's
> - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> - fix indentation and w/s issues
> 
> Changelog v1 ... v2:
> - remove v2 specific underflow function (now generic)
> - re-add Linux code to properly handle acked level IRQs
> 
>  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/vgic/vgic.c    |   6 +
>  xen/arch/arm/vgic/vgic.h    |   9 ++
>  3 files changed, 274 insertions(+)
>  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> 
> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> new file mode 100644
> index 0000000000..1773503cfb
> --- /dev/null
> +++ b/xen/arch/arm/vgic/vgic-v2.c
> @@ -0,0 +1,259 @@
> +/*
> + * Copyright (C) 2015, 2016 ARM Ltd.
> + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <asm/new_vgic.h>
> +#include <asm/bug.h>
> +#include <asm/gic.h>
> +#include <xen/sched.h>
> +#include <xen/sizes.h>
> +
> +#include "vgic.h"
> +
> +static struct {
> +    bool enabled;
> +    paddr_t dbase;          /* Distributor interface address */
> +    paddr_t cbase;          /* CPU interface address & size */
> +    paddr_t csize;
> +    paddr_t vbase;          /* Virtual CPU interface address */
> +
> +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
> +    uint32_t aliased_offset;
> +} gic_v2_hw_data;
> +
> +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> +                      paddr_t vbase, uint32_t aliased_offset)
> +{
> +    gic_v2_hw_data.enabled = true;
> +    gic_v2_hw_data.dbase = dbase;
> +    gic_v2_hw_data.cbase = cbase;
> +    gic_v2_hw_data.csize = csize;
> +    gic_v2_hw_data.vbase = vbase;
> +    gic_v2_hw_data.aliased_offset = aliased_offset;
> +
> +    printk("Using the new VGIC implementation.\n");
> +}
> +
> +/*
> + * transfer the content of the LRs back into the corresponding ap_list:
> + * - active bit is transferred as is
> + * - pending bit is
> + *   - transferred as is in case of edge sensitive IRQs
> + *   - set to the line-level (resample time) for level sensitive IRQs
> + */
> +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> +{
> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> +    unsigned long flags;
> +    unsigned int lr;
> +
> +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> +        return;
> +
> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> +
> +    for ( lr = 0; lr < used_lrs; lr++ )
> +    {
> +        struct gic_lr lr_val;
> +        uint32_t intid;
> +        struct vgic_irq *irq;
> +        struct irq_desc *desc = NULL;
> +        bool have_desc_lock = false;
> +
> +        gic_hw_ops->read_lr(lr, &lr_val);
> +
> +        /*
> +         * TODO: Possible optimization to avoid reading LRs:
> +         * Read the ELRSR to find out which of our LRs have been cleared
> +         * by the guest. We just need to know the IRQ number for those, which
> +         * we could save in an array when populating the LRs.
> +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
> +         * but requires some more code to save the IRQ number and to handle
> +         * those finished IRQs according to the algorithm below.
> +         * We need some numbers to justify this: chances are that we don't
> +         * have many LRs in use most of the time, so we might not save much.
> +         */
> +        gic_hw_ops->clear_lr(lr);
> +
> +        intid = lr_val.virq;
> +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> +
> +        local_irq_save(flags);

Shouldn't we disable interrupts earlier, maybe at the beginning of the
function? Is it not a problem if we take an interrupt a couple of lines
above with the read_lr and clear_lr that we do?


> +        spin_lock(&irq->irq_lock);
> +
> +        /* The locking order forces us to drop and re-take the locks here. */
> +        if ( irq->hw )
> +        {
> +            spin_unlock(&irq->irq_lock);
> +
> +            desc = irq_to_desc(irq->hwintid);
> +            spin_lock(&desc->lock);
> +            spin_lock(&irq->irq_lock);
> +
> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> +
> +            have_desc_lock = true;
> +        }

I agree with Julien that this looks very fragile. Instead, I think it
would be best to always take the desc lock (if irq->hw) before the
irq_lock earlier in this function. That way, we don't have to deal with
this business of unlocking and relocking. Do you see any problems with
it? We don't change irq->hw at run time, so it looks OK to me.
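
Something along these lines is what I have in mind (untested sketch,
reusing the names from this patch):

    struct irq_desc *desc = irq->hw ? irq_to_desc(irq->hwintid) : NULL;

    local_irq_save(flags);
    if ( desc )
        spin_lock(&desc->lock);
    spin_lock(&irq->irq_lock);

    /* ... fold the LR state back into irq ... */

    spin_unlock(&irq->irq_lock);
    if ( desc )
        spin_unlock(&desc->lock);
    local_irq_restore(flags);

This relies on irq->hw and irq->hwintid being stable while the vIRQ is
resident in an LR, which seems to hold, as argued above.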


> +        /*
> +         * If a hardware mapped IRQ has been handled for good, we need to
> +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> +         *
> +         * TODO: This is probably racy, but is so already in the existing
> +         * VGIC. A fix does not seem to be trivial.
> +         */
> +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> +            clear_bit(_IRQ_INPROGRESS, &desc->status);

I'll reply here to Julien's comment:

> I realize the current vGIC is doing exactly the same thing. But this is racy.
> 
> Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
> is cleared here.

The assumption in the old vgic was that this scenario was not possible.
vgic_migrate_irq would avoid changing physical interrupt affinity if a
virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
time of clearing the LR we would change the physical irq affinity (see
xen/arch/arm/gic-vgic.c:L240).

I think we would need a similar mechanism here to protect ourselves from
races. Is there something equivalent in the new vgic?
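
For reference, the old VGIC's scheme is roughly the following (heavily
simplified sketch from memory; "virtual_irq_is_in_lr" stands for the real
in-LR check, see the files quoted above for the actual conditions):

    /* vgic_migrate_irq(): the vIRQ currently sits in an LR, so defer the change */
    if ( virtual_irq_is_in_lr )
    {
        set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
        return;
    }
    irq_set_affinity(p->desc, cpumask_of(new->processor));

    /* when clearing the LR: perform the deferred affinity change */
    if ( test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
        irq_set_affinity(p->desc, cpumask_of(v_target->processor));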


> +        /* Always preserve the active bit */
> +        irq->active = lr_val.active;
> +
> +        /* Edge is the only case where we preserve the pending bit */
> +        if ( irq->config == VGIC_CONFIG_EDGE && lr_val.pending )
> +        {
> +            irq->pending_latch = true;
> +
> +            if ( vgic_irq_is_sgi(intid) )
> +                irq->source |= (1U << lr_val.virt.source);
> +        }
> +
> +        /* Clear soft pending state when level irqs have been acked. */
> +        if ( irq->config == VGIC_CONFIG_LEVEL && !lr_val.pending )
> +            irq->pending_latch = false;
> +
> +        /*
> +         * Level-triggered mapped IRQs are special because we only
> +         * observe rising edges as input to the VGIC.
> +         *
> +         * If the guest never acked the interrupt we have to sample
> +         * the physical line and set the line level, because the
> +         * device state could have changed or we simply need to
> +         * process the still pending interrupt later.
> +         *
> +         * If this causes us to lower the level, we have to also clear
> +         * the physical active state, since we will otherwise never be
> +         * told when the interrupt becomes asserted again.
> +         */
> +        if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> +        {
> +            ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
> +
> +            irq->line_level = gic_read_pending_state(desc);
> +
> +            if ( !irq->line_level )
> +                gic_set_active_state(desc, false);
> +        }
> +
> +        spin_unlock(&irq->irq_lock);
> +        if ( have_desc_lock )
> +            spin_unlock(&desc->lock);
> +        local_irq_restore(flags);
> +
> +        vgic_put_irq(vcpu->domain, irq);
> +    }
> +
> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, false);
> +    vgic_cpu->used_lrs = 0;
> +}
> +
> +/**
> + * vgic_v2_populate_lr() - Populates an LR with the state of a given IRQ.
> + * @vcpu: The VCPU which the given @irq belongs to.
> + * @irq:  The IRQ to convert into an LR. The irq_lock must be held already.
> + * @lr:   The LR number to transfer the state into.
> + *
> + * This moves a virtual IRQ, represented by its vgic_irq, into a list register.
> + * Apart from translating the logical state into the LR bitfields, it also
> + * changes some state in the vgic_irq.
> + * For an edge sensitive IRQ the pending state is cleared in struct vgic_irq,
> + * for a level sensitive IRQ the pending state value is unchanged, as it is
> + * dictated directly by the input line level.
> + *
> + * If @irq describes an SGI with multiple sources, we choose the
> + * lowest-numbered source VCPU and clear that bit in the source bitmap.
> + *
> + * The irq_lock must be held by the caller.
> + */
> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr)
> +{
> +    struct gic_lr lr_val = {0};
> +
> +    lr_val.virq = irq->intid;
> +
> +    if ( irq_is_pending(irq) )
> +    {
> +        lr_val.pending = true;
> +
> +        if ( irq->config == VGIC_CONFIG_EDGE )
> +            irq->pending_latch = false;
> +
> +        if ( vgic_irq_is_sgi(irq->intid) )
> +        {
> +            uint32_t src = ffs(irq->source);
> +
> +            BUG_ON(!src);
> +            lr_val.virt.source = (src - 1);
> +            irq->source &= ~(1 << (src - 1));
> +            if ( irq->source )
> +                irq->pending_latch = true;
> +        }
> +    }
> +
> +    lr_val.active = irq->active;
> +
> +    if ( irq->hw )
> +    {
> +        lr_val.hw_status = true;
> +        lr_val.hw.pirq = irq->hwintid;
> +        /*
> +         * Never set pending+active on a HW interrupt, as the
> +         * pending state is kept at the physical distributor
> +         * level.
> +         */
> +        if ( irq->active && irq_is_pending(irq) )
> +            lr_val.pending = false;
> +    }
> +    else
> +    {
> +        if ( irq->config == VGIC_CONFIG_LEVEL )
> +            lr_val.virt.eoi = true;
> +    }
> +
> +    /*
> +     * Level-triggered mapped IRQs are special because we only observe
> +     * rising edges as input to the VGIC.  We therefore lower the line
> +     * level here, so that we can take new virtual IRQs.  See
> +     * vgic_v2_fold_lr_state for more info.
> +     */
> +    if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> +        irq->line_level = false;
> +
> +    /* The GICv2 LR only holds five bits of priority. */
> +    lr_val.priority = irq->priority >> 3;
> +
> +    gic_hw_ops->write_lr(lr, &lr_val);
> +}
> +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
> index d91ed29d96..214176c14e 100644
> --- a/xen/arch/arm/vgic/vgic.c
> +++ b/xen/arch/arm/vgic/vgic.c
> @@ -520,6 +520,7 @@ retry:
>  
>  static void vgic_fold_lr_state(struct vcpu *vcpu)
>  {
> +    vgic_v2_fold_lr_state(vcpu);
>  }
>  
>  /* Requires the irq_lock to be held. */
> @@ -527,6 +528,8 @@ static void vgic_populate_lr(struct vcpu *vcpu,
>                               struct vgic_irq *irq, int lr)
>  {
>      ASSERT(spin_is_locked(&irq->irq_lock));
> +
> +    vgic_v2_populate_lr(vcpu, irq, lr);
>  }
>  
>  static void vgic_set_underflow(struct vcpu *vcpu)
> @@ -640,7 +643,10 @@ void vgic_sync_to_lrs(void)
>      spin_lock(&current->arch.vgic.ap_list_lock);
>      vgic_flush_lr_state(current);
>      spin_unlock(&current->arch.vgic.ap_list_lock);
> +
> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
>  }
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/arm/vgic/vgic.h b/xen/arch/arm/vgic/vgic.h
> index 1547478518..e2b6d51e47 100644
> --- a/xen/arch/arm/vgic/vgic.h
> +++ b/xen/arch/arm/vgic/vgic.h
> @@ -27,6 +27,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
>          return irq->pending_latch || irq->line_level;
>  }
>  
> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> +{
> +    return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> +}
> +
>  struct vgic_irq *vgic_get_irq(struct domain *d, struct vcpu *vcpu,
>                                uint32_t intid);
>  void vgic_put_irq(struct domain *d, struct vgic_irq *irq);
> @@ -41,6 +46,10 @@ static inline void vgic_get_irq_kref(struct vgic_irq *irq)
>      atomic_inc(&irq->refcount);
>  }
>  
> +void vgic_v2_fold_lr_state(struct vcpu *vcpu);
> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr);
> +void vgic_v2_set_underflow(struct vcpu *vcpu);
> +
>  #endif
>  
>  /*
> -- 
> 2.14.1
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-26 23:22   ` Stefano Stabellini
@ 2018-03-27  0:16     ` Julien Grall
  2018-03-27  0:23       ` Stefano Stabellini
  2018-03-27  0:21     ` Stefano Stabellini
  2018-03-27 15:33     ` Andre Przywara
  2 siblings, 1 reply; 20+ messages in thread
From: Julien Grall @ 2018-03-27  0:16 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Julien Grall, Andre Przywara



Hi,

Sorry for the formatting.

On Tue, 27 Mar 2018, 07:25 Stefano Stabellini, <sstabellini@kernel.org>
wrote:

> On Thu, 22 Mar 2018, Andre Przywara wrote:
> > Processing maintenance interrupts and accessing the list registers
> > are dependent on the host's GIC version.
> > Introduce vgic-v2.c to contain GICv2 specific functions.
> > Implement the GICv2 specific code for syncing the emulation state
> > into the VGIC registers.
> > This also adds the hook to let Xen setup the host GIC addresses.
> >
> > This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> >
> > Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> > ---
> > Changelog v3 ... v3a:
> > - take hardware IRQ lock in vgic_v2_fold_lr_state()
> > - fix last remaining u32 usage
> > - print message when using new VGIC
> > - add TODO about racy _IRQ_INPROGRESS setting
> >
> > Changelog v2 ... v3:
> > - remove no longer needed asm/io.h header
> > - replace 0/1 with false/true for bool's
> > - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> > - fix indentation and w/s issues
> >
> > Changelog v1 ... v2:
> > - remove v2 specific underflow function (now generic)
> > - re-add Linux code to properly handle acked level IRQs
> >
> >  xen/arch/arm/vgic/vgic-v2.c | 259
> ++++++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/arm/vgic/vgic.c    |   6 +
> >  xen/arch/arm/vgic/vgic.h    |   9 ++
> >  3 files changed, 274 insertions(+)
> >  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> >
> > diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> > new file mode 100644
> > index 0000000000..1773503cfb
> > --- /dev/null
> > +++ b/xen/arch/arm/vgic/vgic-v2.c
> > @@ -0,0 +1,259 @@
> > +/*
> > + * Copyright (C) 2015, 2016 ARM Ltd.
> > + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see <http://www.gnu.org/licenses/
> >.
> > + */
> > +
> > +#include <asm/new_vgic.h>
> > +#include <asm/bug.h>
> > +#include <asm/gic.h>
> > +#include <xen/sched.h>
> > +#include <xen/sizes.h>
> > +
> > +#include "vgic.h"
> > +
> > +static struct {
> > +    bool enabled;
> > +    paddr_t dbase;          /* Distributor interface address */
> > +    paddr_t cbase;          /* CPU interface address & size */
> > +    paddr_t csize;
> > +    paddr_t vbase;          /* Virtual CPU interface address */
> > +
> > +    /* Offset to add to get an 8kB contiguous region if GIC is aliased
> */
> > +    uint32_t aliased_offset;
> > +} gic_v2_hw_data;
> > +
> > +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> > +                      paddr_t vbase, uint32_t aliased_offset)
> > +{
> > +    gic_v2_hw_data.enabled = true;
> > +    gic_v2_hw_data.dbase = dbase;
> > +    gic_v2_hw_data.cbase = cbase;
> > +    gic_v2_hw_data.csize = csize;
> > +    gic_v2_hw_data.vbase = vbase;
> > +    gic_v2_hw_data.aliased_offset = aliased_offset;
> > +
> > +    printk("Using the new VGIC implementation.\n");
> > +}
> > +
> > +/*
> > + * transfer the content of the LRs back into the corresponding ap_list:
> > + * - active bit is transferred as is
> > + * - pending bit is
> > + *   - transferred as is in case of edge sensitive IRQs
> > + *   - set to the line-level (resample time) for level sensitive IRQs
> > + */
> > +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> > +{
> > +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> > +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> > +    unsigned long flags;
> > +    unsigned int lr;
> > +
> > +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> > +        return;
> > +
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> > +
> > +    for ( lr = 0; lr < used_lrs; lr++ )
> > +    {
> > +        struct gic_lr lr_val;
> > +        uint32_t intid;
> > +        struct vgic_irq *irq;
> > +        struct irq_desc *desc = NULL;
> > +        bool have_desc_lock = false;
> > +
> > +        gic_hw_ops->read_lr(lr, &lr_val);
> > +
> > +        /*
> > +         * TODO: Possible optimization to avoid reading LRs:
> > +         * Read the ELRSR to find out which of our LRs have been cleared
> > +         * by the guest. We just need to know the IRQ number for those,
> which
> > +         * we could save in an array when populating the LRs.
> > +         * This trades one MMIO access (ELRSR) for possibly more than
> one (LRs),
> > +         * but requires some more code to save the IRQ number and to
> handle
> > +         * those finished IRQs according to the algorithm below.
> > +         * We need some numbers to justify this: chances are that we
> don't
> > +         * have many LRs in use most of the time, so we might not save
> much.
> > +         */
> > +        gic_hw_ops->clear_lr(lr);
> > +
> > +        intid = lr_val.virq;
> > +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> > +
> > +        local_irq_save(flags);
>
> Shouldn't we disable interrupts earlier, maybe at the beginning of the
> function? Is it not a problem if we take an interrupt a couple of lines
> above with the read_lr and clear_lr that we do?
>
>
> > +        spin_lock(&irq->irq_lock);
> > +
> > +        /* The locking order forces us to drop and re-take the locks
> here. */
> > +        if ( irq->hw )
> > +        {
> > +            spin_unlock(&irq->irq_lock);
> > +
> > +            desc = irq_to_desc(irq->hwintid);
> > +            spin_lock(&desc->lock);
> > +            spin_lock(&irq->irq_lock);
> > +
> > +            /* This h/w IRQ should still be assigned to the virtual
> IRQ. */
> > +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> > +
> > +            have_desc_lock = true;
> > +        }
>
> I agree with Julien that this looks very fragile. Instead, I think it
> would be best to always take the desc lock (if irq->hw) before the
> irq_lock earlier in this function. That way, we don't have to deal with
> this business of unlocking and relocking. Do you see any problems with
> it? We don't change irq->hw at run time, so it looks OK to me.
>
>
> > +        /*
> > +         * If a hardware mapped IRQ has been handled for good, we need
> to
> > +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> > +         *
> > +         * TODO: This is probably racy, but is so already in the
> existing
> > +         * VGIC. A fix does not seem to be trivial.
> > +         */
> > +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> > +            clear_bit(_IRQ_INPROGRESS, &desc->status);
>
> I'll reply here to Julien's comment:
>
> > I realize the current vGIC is doing exactly the same thing. But this is
> > racy.
> >
> > Imagine the interrupt is firing on another pCPU (I wasn't able to rule out
> > this even when the interrupt is following the vCPU), that pCPU may set
> > _IRQ_INPROGRESS before this is cleared here.
>
> The assumption in the old vgic was that this scenario was not possible.
> vgic_migrate_irq would avoid changing physical interrupt affinity if a
> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> time of clearing the LR we would change the physical irq affinity (see
> xen/arch/arm/gic-vgic.c:L240).


> I think we would need a similar mechanism here to protect ourselves from
> races. Is there something equivalent in the new vgic?
>

That would be a mechanism we know is already very racy on the old vGIC. You
also have to take into account that a write to ITARGETSR/IROUTER only takes
effect in finite time: an interrupt may still become pending on the old pCPU.

To be honest, we should migrate the interrupt on the guest MMIO access and
find a way to handle desc->state correctly.

One of the solutions is to avoid relying on desc->state when releasing an
IRQ, so we would not need to care about a potential race.
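
To make that a bit more concrete, here is a rough sketch of migrating the
physical IRQ at ITARGETSR-write time. This is purely illustrative and not
code from the series: the function name and placement are invented, and it
only assumes Xen's existing irq_set_affinity() plus the struct vgic_irq
fields quoted above.

/* Illustrative sketch only: move the physical IRQ when the guest
 * retargets the virtual one, instead of fixing up desc->status later. */
static void sketch_migrate_on_itargetsr_write(struct vgic_irq *irq,
                                              struct vcpu *new_target)
{
    unsigned long flags;
    unsigned int hwintid;
    bool hw;

    /* Update the virtual side first, under the per-IRQ lock. */
    spin_lock_irqsave(&irq->irq_lock, flags);
    irq->target_vcpu = new_target;
    hw = irq->hw;
    hwintid = irq->hwintid;
    spin_unlock_irqrestore(&irq->irq_lock, flags);

    /* Then retarget the hardware IRQ right away, under the desc lock. */
    if ( hw )
    {
        struct irq_desc *desc = irq_to_desc(hwintid);

        spin_lock_irqsave(&desc->lock, flags);
        irq_set_affinity(desc, cpumask_of(new_target->processor));
        spin_unlock_irqrestore(&desc->lock, flags);
    }
}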

Cheers,


>
> > +        /* Always preserve the active bit */
> > +        irq->active = lr_val.active;
> > +
> > +        /* Edge is the only case where we preserve the pending bit */
> > +        if ( irq->config == VGIC_CONFIG_EDGE && lr_val.pending )
> > +        {
> > +            irq->pending_latch = true;
> > +
> > +            if ( vgic_irq_is_sgi(intid) )
> > +                irq->source |= (1U << lr_val.virt.source);
> > +        }
> > +
> > +        /* Clear soft pending state when level irqs have been acked. */
> > +        if ( irq->config == VGIC_CONFIG_LEVEL && !lr_val.pending )
> > +            irq->pending_latch = false;
> > +
> > +        /*
> > +         * Level-triggered mapped IRQs are special because we only
> > +         * observe rising edges as input to the VGIC.
> > +         *
> > +         * If the guest never acked the interrupt we have to sample
> > +         * the physical line and set the line level, because the
> > +         * device state could have changed or we simply need to
> > +         * process the still pending interrupt later.
> > +         *
> > +         * If this causes us to lower the level, we have to also clear
> > +         * the physical active state, since we will otherwise never be
> > +         * told when the interrupt becomes asserted again.
> > +         */
> > +        if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> > +        {
> > +            ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
> > +
> > +            irq->line_level = gic_read_pending_state(desc);
> > +
> > +            if ( !irq->line_level )
> > +                gic_set_active_state(desc, false);
> > +        }
> > +
> > +        spin_unlock(&irq->irq_lock);
> > +        if ( have_desc_lock )
> > +            spin_unlock(&desc->lock);
> > +        local_irq_restore(flags);
> > +
> > +        vgic_put_irq(vcpu->domain, irq);
> > +    }
> > +
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, false);
> > +    vgic_cpu->used_lrs = 0;
> > +}
> > +
> > +/**
> > + * vgic_v2_populate_lr() - Populates an LR with the state of a given
> IRQ.
> > + * @vcpu: The VCPU which the given @irq belongs to.
> > + * @irq:  The IRQ to convert into an LR. The irq_lock must be held
> already.
> > + * @lr:   The LR number to transfer the state into.
> > + *
> > + * This moves a virtual IRQ, represented by its vgic_irq, into a list
> register.
> > + * Apart from translating the logical state into the LR bitfields, it
> also
> > + * changes some state in the vgic_irq.
> > + * For an edge sensitive IRQ the pending state is cleared in struct
> vgic_irq,
> > + * for a level sensitive IRQ the pending state value is unchanged, as
> it is
> > + * dictated directly by the input line level.
> > + *
> > + * If @irq describes an SGI with multiple sources, we choose the
> > + * lowest-numbered source VCPU and clear that bit in the source bitmap.
> > + *
> > + * The irq_lock must be held by the caller.
> > + */
> > +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int
> lr)
> > +{
> > +    struct gic_lr lr_val = {0};
> > +
> > +    lr_val.virq = irq->intid;
> > +
> > +    if ( irq_is_pending(irq) )
> > +    {
> > +        lr_val.pending = true;
> > +
> > +        if ( irq->config == VGIC_CONFIG_EDGE )
> > +            irq->pending_latch = false;
> > +
> > +        if ( vgic_irq_is_sgi(irq->intid) )
> > +        {
> > +            uint32_t src = ffs(irq->source);
> > +
> > +            BUG_ON(!src);
> > +            lr_val.virt.source = (src - 1);
> > +            irq->source &= ~(1 << (src - 1));
> > +            if ( irq->source )
> > +                irq->pending_latch = true;
> > +        }
> > +    }
> > +
> > +    lr_val.active = irq->active;
> > +
> > +    if ( irq->hw )
> > +    {
> > +        lr_val.hw_status = true;
> > +        lr_val.hw.pirq = irq->hwintid;
> > +        /*
> > +         * Never set pending+active on a HW interrupt, as the
> > +         * pending state is kept at the physical distributor
> > +         * level.
> > +         */
> > +        if ( irq->active && irq_is_pending(irq) )
> > +            lr_val.pending = false;
> > +    }
> > +    else
> > +    {
> > +        if ( irq->config == VGIC_CONFIG_LEVEL )
> > +            lr_val.virt.eoi = true;
> > +    }
> > +
> > +    /*
> > +     * Level-triggered mapped IRQs are special because we only observe
> > +     * rising edges as input to the VGIC.  We therefore lower the line
> > +     * level here, so that we can take new virtual IRQs.  See
> > +     * vgic_v2_fold_lr_state for more info.
> > +     */
> > +    if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> > +        irq->line_level = false;
> > +
> > +    /* The GICv2 LR only holds five bits of priority. */
> > +    lr_val.priority = irq->priority >> 3;
> > +
> > +    gic_hw_ops->write_lr(lr, &lr_val);
> > +}
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
> > diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
> > index d91ed29d96..214176c14e 100644
> > --- a/xen/arch/arm/vgic/vgic.c
> > +++ b/xen/arch/arm/vgic/vgic.c
> > @@ -520,6 +520,7 @@ retry:
> >
> >  static void vgic_fold_lr_state(struct vcpu *vcpu)
> >  {
> > +    vgic_v2_fold_lr_state(vcpu);
> >  }
> >
> >  /* Requires the irq_lock to be held. */
> > @@ -527,6 +528,8 @@ static void vgic_populate_lr(struct vcpu *vcpu,
> >                               struct vgic_irq *irq, int lr)
> >  {
> >      ASSERT(spin_is_locked(&irq->irq_lock));
> > +
> > +    vgic_v2_populate_lr(vcpu, irq, lr);
> >  }
> >
> >  static void vgic_set_underflow(struct vcpu *vcpu)
> > @@ -640,7 +643,10 @@ void vgic_sync_to_lrs(void)
> >      spin_lock(&current->arch.vgic.ap_list_lock);
> >      vgic_flush_lr_state(current);
> >      spin_unlock(&current->arch.vgic.ap_list_lock);
> > +
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
> >  }
> > +
> >  /*
> >   * Local variables:
> >   * mode: C
> > diff --git a/xen/arch/arm/vgic/vgic.h b/xen/arch/arm/vgic/vgic.h
> > index 1547478518..e2b6d51e47 100644
> > --- a/xen/arch/arm/vgic/vgic.h
> > +++ b/xen/arch/arm/vgic/vgic.h
> > @@ -27,6 +27,11 @@ static inline bool irq_is_pending(struct vgic_irq
> *irq)
> >          return irq->pending_latch || irq->line_level;
> >  }
> >
> > +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> > +{
> > +    return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> > +}
> > +
> >  struct vgic_irq *vgic_get_irq(struct domain *d, struct vcpu *vcpu,
> >                                uint32_t intid);
> >  void vgic_put_irq(struct domain *d, struct vgic_irq *irq);
> > @@ -41,6 +46,10 @@ static inline void vgic_get_irq_kref(struct vgic_irq
> *irq)
> >      atomic_inc(&irq->refcount);
> >  }
> >
> > +void vgic_v2_fold_lr_state(struct vcpu *vcpu);
> > +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int
> lr);
> > +void vgic_v2_set_underflow(struct vcpu *vcpu);
> > +
> >  #endif
> >
> >  /*
> > --
> > 2.14.1
> >
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-26 23:22   ` Stefano Stabellini
  2018-03-27  0:16     ` Julien Grall
@ 2018-03-27  0:21     ` Stefano Stabellini
  2018-03-27 15:33     ` Andre Przywara
  2 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-27  0:21 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Julien Grall, Andre Przywara

On Mon, 26 Mar 2018, Stefano Stabellini wrote:
> On Thu, 22 Mar 2018, Andre Przywara wrote:
> > Processing maintenance interrupts and accessing the list registers
> > are dependent on the host's GIC version.
> > Introduce vgic-v2.c to contain GICv2 specific functions.
> > Implement the GICv2 specific code for syncing the emulation state
> > into the VGIC registers.
> > This also adds the hook to let Xen setup the host GIC addresses.
> > 
> > This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> > 
> > Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> > ---
> > Changelog v3 ... v3a:
> > - take hardware IRQ lock in vgic_v2_fold_lr_state()
> > - fix last remaining u32 usage
> > - print message when using new VGIC
> > - add TODO about racy _IRQ_INPROGRESS setting
> > 
> > Changelog v2 ... v3:
> > - remove no longer needed asm/io.h header
> > - replace 0/1 with false/true for bool's
> > - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> > - fix indentation and w/s issues
> > 
> > Changelog v1 ... v2:
> > - remove v2 specific underflow function (now generic)
> > - re-add Linux code to properly handle acked level IRQs
> > 
> >  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
> >  xen/arch/arm/vgic/vgic.c    |   6 +
> >  xen/arch/arm/vgic/vgic.h    |   9 ++
> >  3 files changed, 274 insertions(+)
> >  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> > 
> > diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> > new file mode 100644
> > index 0000000000..1773503cfb
> > --- /dev/null
> > +++ b/xen/arch/arm/vgic/vgic-v2.c
> > @@ -0,0 +1,259 @@
> > +/*
> > + * Copyright (C) 2015, 2016 ARM Ltd.
> > + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <asm/new_vgic.h>
> > +#include <asm/bug.h>
> > +#include <asm/gic.h>
> > +#include <xen/sched.h>
> > +#include <xen/sizes.h>
> > +
> > +#include "vgic.h"
> > +
> > +static struct {
> > +    bool enabled;
> > +    paddr_t dbase;          /* Distributor interface address */
> > +    paddr_t cbase;          /* CPU interface address & size */
> > +    paddr_t csize;
> > +    paddr_t vbase;          /* Virtual CPU interface address */
> > +
> > +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
> > +    uint32_t aliased_offset;
> > +} gic_v2_hw_data;
> > +
> > +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> > +                      paddr_t vbase, uint32_t aliased_offset)
> > +{
> > +    gic_v2_hw_data.enabled = true;
> > +    gic_v2_hw_data.dbase = dbase;
> > +    gic_v2_hw_data.cbase = cbase;
> > +    gic_v2_hw_data.csize = csize;
> > +    gic_v2_hw_data.vbase = vbase;
> > +    gic_v2_hw_data.aliased_offset = aliased_offset;
> > +
> > +    printk("Using the new VGIC implementation.\n");
> > +}
> > +
> > +/*
> > + * transfer the content of the LRs back into the corresponding ap_list:
> > + * - active bit is transferred as is
> > + * - pending bit is
> > + *   - transferred as is in case of edge sensitive IRQs
> > + *   - set to the line-level (resample time) for level sensitive IRQs
> > + */
> > +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> > +{
> > +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> > +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> > +    unsigned long flags;
> > +    unsigned int lr;
> > +
> > +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> > +        return;
> > +
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> > +
> > +    for ( lr = 0; lr < used_lrs; lr++ )
> > +    {
> > +        struct gic_lr lr_val;
> > +        uint32_t intid;
> > +        struct vgic_irq *irq;
> > +        struct irq_desc *desc = NULL;
> > +        bool have_desc_lock = false;
> > +
> > +        gic_hw_ops->read_lr(lr, &lr_val);
> > +
> > +        /*
> > +         * TODO: Possible optimization to avoid reading LRs:
> > +         * Read the ELRSR to find out which of our LRs have been cleared
> > +         * by the guest. We just need to know the IRQ number for those, which
> > +         * we could save in an array when populating the LRs.
> > +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
> > +         * but requires some more code to save the IRQ number and to handle
> > +         * those finished IRQs according to the algorithm below.
> > +         * We need some numbers to justify this: chances are that we don't
> > +         * have many LRs in use most of the time, so we might not save much.
> > +         */
> > +        gic_hw_ops->clear_lr(lr);
> > +
> > +        intid = lr_val.virq;
> > +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> > +
> > +        local_irq_save(flags);
> 
> Shouldn't we disable interrupts earlier, maybe at the beginning of the
> function? Is it not a problem if we take an interrupt a couple of lines
> above with the read_lr and clear_lr that we do?
> 
> 
> > +        spin_lock(&irq->irq_lock);
> > +
> > +        /* The locking order forces us to drop and re-take the locks here. */
> > +        if ( irq->hw )
> > +        {
> > +            spin_unlock(&irq->irq_lock);
> > +
> > +            desc = irq_to_desc(irq->hwintid);
> > +            spin_lock(&desc->lock);
> > +            spin_lock(&irq->irq_lock);
> > +
> > +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> > +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> > +
> > +            have_desc_lock = true;
> > +        }
> 
> I agree with Julien that this looks very fragile. Instead, I think it
> would be best to always take the desc lock (if irq->hw) before the
> irq_lock earlier in this function. That way, we don't have to deal with
> this business of unlocking and relocking. Do you see any problems with
> it? We don't change irq->hw at run time, so it looks OK to me.
> 
> 
> > +        /*
> > +         * If a hardware mapped IRQ has been handled for good, we need to
> > +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> > +         *
> > +         * TODO: This is probably racy, but is so already in the existing
> > +         * VGIC. A fix does not seem to be trivial.
> > +         */
> > +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> > +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> 
> I'll reply here to Julien's comment:
> 
> > I realize the current vGIC is doing exactly the same thing. But this is racy.
> > 
> > Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
> > is cleared here.
> 
> The assumption in the old vgic was that this scenario was not possible.
> vgic_migrate_irq would avoid changing physical interrupt affinity if a
> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> time of clearing the LR we would change the physical irq affinity (see
> xen/arch/arm/gic-vgic.c:L240).
> 
> I think we would need a similar mechanism here to protect ourselves from
> races. Is there something equivalent in the new vgic?

After reading the following patches, I am thinking that to fix the race
we need to make sure that irq->vcpu is set to NULL here, and we
need to set it to NULL with the desc->lock held.

Let's suppose that the irq is INPROGRESS on cpu0/vcpu0, and we get the
same interrupt on cpu1/vcpu1 because ITARGETSR has been changed.
vgic_queue_irq_unlock simply drops the interrupt because irq->vcpu !=
NULL (also keep in mind that at that point desc->lock is held). Maybe it
isn't nice but it shouldn't be racy.

When cpu0/vcpu0 clears INPROGRESS in vgic_v2_fold_lr_state, it is not a
problem because the interrupt was never injected in cpu1/vcpu1.

Vice versa, if cpu0/vcpu0 clears INPROGRESS and sets irq->vcpu = NULL in
vgic_v2_fold_lr_state before the new interrupt is delivered to
cpu1/vcpu1, then the interrupt will be injected to cpu1/vcpu1 and
INPROGRESS is set again correctly.

In other words, as long as irq->vcpu != NULL and INPROGRESS are kept in
sync, the race is avoided. With this patch, the race exists because
INPROGRESS could be set on cpu1/vcpu1 while cpu0/vcpu0 clears it and
sets irq->vcpu to NULL.

Does it make sense? Do you agree with this analysis?

The fix, although it looks pretty small, could be sent separately after
the code freeze.
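
As a very rough sketch of that idea (illustrative only, hand-waving the
ap_list bookkeeping): in the hunk of vgic_v2_fold_lr_state() quoted above,
where both desc->lock and irq->irq_lock are already held, the two pieces of
state would be retired together:

    /* Keep _IRQ_INPROGRESS and irq->vcpu in sync, under desc->lock. */
    if ( irq->hw && !lr_val.active && !lr_val.pending )
    {
        clear_bit(_IRQ_INPROGRESS, &desc->status);
        irq->vcpu = NULL;
    }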

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-27  0:16     ` Julien Grall
@ 2018-03-27  0:23       ` Stefano Stabellini
  0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-27  0:23 UTC (permalink / raw)
  To: Julien Grall; +Cc: xen-devel, Julien Grall, Stefano Stabellini, Andre Przywara


On Tue, 27 Mar 2018, Julien Grall wrote:
> Hi,
> Sorry for the formatting.
> 
> On Tue, 27 Mar 2018, 07:25 Stefano Stabellini, <sstabellini@kernel.org> wrote:
>       On Thu, 22 Mar 2018, Andre Przywara wrote:
>       > Processing maintenance interrupts and accessing the list registers
>       > are dependent on the host's GIC version.
>       > Introduce vgic-v2.c to contain GICv2 specific functions.
>       > Implement the GICv2 specific code for syncing the emulation state
>       > into the VGIC registers.
>       > This also adds the hook to let Xen setup the host GIC addresses.
>       >
>       > This is based on Linux commit 140b086dd197, written by Marc Zyngier.
>       >
>       > Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
>       > ---
>       > Changelog v3 ... v3a:
>       > - take hardware IRQ lock in vgic_v2_fold_lr_state()
>       > - fix last remaining u32 usage
>       > - print message when using new VGIC
>       > - add TODO about racy _IRQ_INPROGRESS setting
>       >
>       > Changelog v2 ... v3:
>       > - remove no longer needed asm/io.h header
>       > - replace 0/1 with false/true for bool's
>       > - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
>       > - fix indentation and w/s issues
>       >
>       > Changelog v1 ... v2:
>       > - remove v2 specific underflow function (now generic)
>       > - re-add Linux code to properly handle acked level IRQs
>       >
>       >  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
>       >  xen/arch/arm/vgic/vgic.c    |   6 +
>       >  xen/arch/arm/vgic/vgic.h    |   9 ++
>       >  3 files changed, 274 insertions(+)
>       >  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
>       >
>       > diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
>       > new file mode 100644
>       > index 0000000000..1773503cfb
>       > --- /dev/null
>       > +++ b/xen/arch/arm/vgic/vgic-v2.c
>       > @@ -0,0 +1,259 @@
>       > +/*
>       > + * Copyright (C) 2015, 2016 ARM Ltd.
>       > + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
>       > + *
>       > + * This program is free software; you can redistribute it and/or modify
>       > + * it under the terms of the GNU General Public License version 2 as
>       > + * published by the Free Software Foundation.
>       > + *
>       > + * This program is distributed in the hope that it will be useful,
>       > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>       > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>       > + * GNU General Public License for more details.
>       > + *
>       > + * You should have received a copy of the GNU General Public License
>       > + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>       > + */
>       > +
>       > +#include <asm/new_vgic.h>
>       > +#include <asm/bug.h>
>       > +#include <asm/gic.h>
>       > +#include <xen/sched.h>
>       > +#include <xen/sizes.h>
>       > +
>       > +#include "vgic.h"
>       > +
>       > +static struct {
>       > +    bool enabled;
>       > +    paddr_t dbase;          /* Distributor interface address */
>       > +    paddr_t cbase;          /* CPU interface address & size */
>       > +    paddr_t csize;
>       > +    paddr_t vbase;          /* Virtual CPU interface address */
>       > +
>       > +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
>       > +    uint32_t aliased_offset;
>       > +} gic_v2_hw_data;
>       > +
>       > +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
>       > +                      paddr_t vbase, uint32_t aliased_offset)
>       > +{
>       > +    gic_v2_hw_data.enabled = true;
>       > +    gic_v2_hw_data.dbase = dbase;
>       > +    gic_v2_hw_data.cbase = cbase;
>       > +    gic_v2_hw_data.csize = csize;
>       > +    gic_v2_hw_data.vbase = vbase;
>       > +    gic_v2_hw_data.aliased_offset = aliased_offset;
>       > +
>       > +    printk("Using the new VGIC implementation.\n");
>       > +}
>       > +
>       > +/*
>       > + * transfer the content of the LRs back into the corresponding ap_list:
>       > + * - active bit is transferred as is
>       > + * - pending bit is
>       > + *   - transferred as is in case of edge sensitive IRQs
>       > + *   - set to the line-level (resample time) for level sensitive IRQs
>       > + */
>       > +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
>       > +{
>       > +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
>       > +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
>       > +    unsigned long flags;
>       > +    unsigned int lr;
>       > +
>       > +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
>       > +        return;
>       > +
>       > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
>       > +
>       > +    for ( lr = 0; lr < used_lrs; lr++ )
>       > +    {
>       > +        struct gic_lr lr_val;
>       > +        uint32_t intid;
>       > +        struct vgic_irq *irq;
>       > +        struct irq_desc *desc = NULL;
>       > +        bool have_desc_lock = false;
>       > +
>       > +        gic_hw_ops->read_lr(lr, &lr_val);
>       > +
>       > +        /*
>       > +         * TODO: Possible optimization to avoid reading LRs:
>       > +         * Read the ELRSR to find out which of our LRs have been cleared
>       > +         * by the guest. We just need to know the IRQ number for those, which
>       > +         * we could save in an array when populating the LRs.
>       > +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
>       > +         * but requires some more code to save the IRQ number and to handle
>       > +         * those finished IRQs according to the algorithm below.
>       > +         * We need some numbers to justify this: chances are that we don't
>       > +         * have many LRs in use most of the time, so we might not save much.
>       > +         */
>       > +        gic_hw_ops->clear_lr(lr);
>       > +
>       > +        intid = lr_val.virq;
>       > +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
>       > +
>       > +        local_irq_save(flags);
> 
>       Shouldn't we disable interrupts earlier, maybe at the beginning of the
>       function? Is it not a problem if we take an interrupt a couple of lines
>       above with the read_lr and clear_lr that we do?
> 
> 
>       > +        spin_lock(&irq->irq_lock);
>       > +
>       > +        /* The locking order forces us to drop and re-take the locks here. */
>       > +        if ( irq->hw )
>       > +        {
>       > +            spin_unlock(&irq->irq_lock);
>       > +
>       > +            desc = irq_to_desc(irq->hwintid);
>       > +            spin_lock(&desc->lock);
>       > +            spin_lock(&irq->irq_lock);
>       > +
>       > +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
>       > +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>       > +
>       > +            have_desc_lock = true;
>       > +        }
> 
>       I agree with Julien that this looks very fragile. Instead, I think it
>       would be best to always take the desc lock (if irq->hw) before the
>       irq_lock earlier in this function. That way, we don't have to deal with
>       this business of unlocking and relocking. Do you see any problems with
>       it? We don't change irq->hw at run time, so it looks OK to me.
> 
> 
>       > +        /*
>       > +         * If a hardware mapped IRQ has been handled for good, we need to
>       > +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
>       > +         *
>       > +         * TODO: This is probably racy, but is so already in the existing
>       > +         * VGIC. A fix does not seem to be trivial.
>       > +         */
>       > +        if ( irq->hw && !lr_val.active && !lr_val.pending )
>       > +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> 
>       I'll reply here to Julien's comment:
> 
>       > I realize the current vGIC is doing exactly the same thing. But this is racy.
>       >
>       > Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS
>       before this
>       > is cleared here.
> 
>       The assumption in the old vgic was that this scenario was not possible.
>       vgic_migrate_irq would avoid changing physical interrupt affinity if a
>       virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
>       Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
>       time of clearing the LR we would change the physical irq affinity (see
>       xen/arch/arm/gic-vgic.c:L240).
> 
> 
>       I think we would need a similar mechanism here to protect ourselves from
>       races. Is there something equivalent in the new vgic?
> 
> 
> That would be a mechanism we know is already very racy on the old vGIC. You also have to take into account that a write to ITARGETSR/IROUTER only takes effect in finite time: an interrupt
> may still become pending on the old pCPU.
> 
> To be honest, we should migrate the interrupt on the guest MMIO access and find a way to handle desc->state correctly.
> 
> One of the solutions is to avoid relying on desc->state when releasing an IRQ, so we would not need to care about a potential race.

I think I might have a simple suggestion to fix this, a suggestion that
doesn't require anything like the mechanism we had in the old vgic. I
hope I didn't miss anything :-)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-26 23:22   ` Stefano Stabellini
  2018-03-27  0:16     ` Julien Grall
  2018-03-27  0:21     ` Stefano Stabellini
@ 2018-03-27 15:33     ` Andre Przywara
  2018-03-27 19:41       ` Stefano Stabellini
  2 siblings, 1 reply; 20+ messages in thread
From: Andre Przywara @ 2018-03-27 15:33 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Julien Grall

Hi,

On 27/03/18 00:22, Stefano Stabellini wrote:
> On Thu, 22 Mar 2018, Andre Przywara wrote:
>> Processing maintenance interrupts and accessing the list registers
>> are dependent on the host's GIC version.
>> Introduce vgic-v2.c to contain GICv2 specific functions.
>> Implement the GICv2 specific code for syncing the emulation state
>> into the VGIC registers.
>> This also adds the hook to let Xen setup the host GIC addresses.
>>
>> This is based on Linux commit 140b086dd197, written by Marc Zyngier.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
>> ---
>> Changelog v3 ... v3a:
>> - take hardware IRQ lock in vgic_v2_fold_lr_state()
>> - fix last remaining u32 usage
>> - print message when using new VGIC
>> - add TODO about racy _IRQ_INPROGRESS setting
>>
>> Changelog v2 ... v3:
>> - remove no longer needed asm/io.h header
>> - replace 0/1 with false/true for bool's
>> - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
>> - fix indentation and w/s issues
>>
>> Changelog v1 ... v2:
>> - remove v2 specific underflow function (now generic)
>> - re-add Linux code to properly handle acked level IRQs
>>
>>  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
>>  xen/arch/arm/vgic/vgic.c    |   6 +
>>  xen/arch/arm/vgic/vgic.h    |   9 ++
>>  3 files changed, 274 insertions(+)
>>  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
>>
>> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
>> new file mode 100644
>> index 0000000000..1773503cfb
>> --- /dev/null
>> +++ b/xen/arch/arm/vgic/vgic-v2.c
>> @@ -0,0 +1,259 @@
>> +/*
>> + * Copyright (C) 2015, 2016 ARM Ltd.
>> + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <asm/new_vgic.h>
>> +#include <asm/bug.h>
>> +#include <asm/gic.h>
>> +#include <xen/sched.h>
>> +#include <xen/sizes.h>
>> +
>> +#include "vgic.h"
>> +
>> +static struct {
>> +    bool enabled;
>> +    paddr_t dbase;          /* Distributor interface address */
>> +    paddr_t cbase;          /* CPU interface address & size */
>> +    paddr_t csize;
>> +    paddr_t vbase;          /* Virtual CPU interface address */
>> +
>> +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
>> +    uint32_t aliased_offset;
>> +} gic_v2_hw_data;
>> +
>> +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
>> +                      paddr_t vbase, uint32_t aliased_offset)
>> +{
>> +    gic_v2_hw_data.enabled = true;
>> +    gic_v2_hw_data.dbase = dbase;
>> +    gic_v2_hw_data.cbase = cbase;
>> +    gic_v2_hw_data.csize = csize;
>> +    gic_v2_hw_data.vbase = vbase;
>> +    gic_v2_hw_data.aliased_offset = aliased_offset;
>> +
>> +    printk("Using the new VGIC implementation.\n");
>> +}
>> +
>> +/*
>> + * transfer the content of the LRs back into the corresponding ap_list:
>> + * - active bit is transferred as is
>> + * - pending bit is
>> + *   - transferred as is in case of edge sensitive IRQs
>> + *   - set to the line-level (resample time) for level sensitive IRQs
>> + */
>> +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
>> +{
>> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
>> +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
>> +    unsigned long flags;
>> +    unsigned int lr;
>> +
>> +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
>> +        return;
>> +
>> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
>> +
>> +    for ( lr = 0; lr < used_lrs; lr++ )
>> +    {
>> +        struct gic_lr lr_val;
>> +        uint32_t intid;
>> +        struct vgic_irq *irq;
>> +        struct irq_desc *desc = NULL;
>> +        bool have_desc_lock = false;
>> +
>> +        gic_hw_ops->read_lr(lr, &lr_val);
>> +
>> +        /*
>> +         * TODO: Possible optimization to avoid reading LRs:
>> +         * Read the ELRSR to find out which of our LRs have been cleared
>> +         * by the guest. We just need to know the IRQ number for those, which
>> +         * we could save in an array when populating the LRs.
>> +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
>> +         * but requires some more code to save the IRQ number and to handle
>> +         * those finished IRQs according to the algorithm below.
>> +         * We need some numbers to justify this: chances are that we don't
>> +         * have many LRs in use most of the time, so we might not save much.
>> +         */
>> +        gic_hw_ops->clear_lr(lr);
>> +
>> +        intid = lr_val.virq;
>> +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
>> +
>> +        local_irq_save(flags);
> 
> Shouldn't we disable interrupts earlier, maybe at the beginning of the
> function? Is it not a problem if we take an interrupt a couple of lines
> above with the read_lr and clear_lr that we do?

In contrast to the existing VGIC we only touch the LRs when entering or
leaving the hypervisor, not in-between. So if a hardware IRQ fires
in-between, its handler will not touch any LRs, and I don't see any
problem with leaving interrupts enabled.

>> +        spin_lock(&irq->irq_lock);
>> +
>> +        /* The locking order forces us to drop and re-take the locks here. */
>> +        if ( irq->hw )
>> +        {
>> +            spin_unlock(&irq->irq_lock);
>> +
>> +            desc = irq_to_desc(irq->hwintid);
>> +            spin_lock(&desc->lock);
>> +            spin_lock(&irq->irq_lock);
>> +
>> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>> +
>> +            have_desc_lock = true;
>> +        }
> 
> I agree with Julien that this looks very fragile. Instead, I think it
> would be best to always take the desc lock (if irq->hw) before the
> irq_lock earlier in this function.

Well, how is this going to work in a race-free manner? To get the
corresponding hardware interrupt, we have to look up irq->hw and
irq->hwintid, which is racy when done without holding the lock.

> That way, we don't have to deal with
> this business of unlocking and relocking. Do you see any problems with
> it? We don't change irq->hw at run time, so it looks OK to me.

Yeah, I see the point that irq->hw and irq->hwintid are somewhat
"write-once" members. But that is a rather fragile assumption; I expect
this to actually change over time, and then it will be hard to chase
down all the places where we relied on this assumption. So I'd rather
code this in a sane way, so that we don't have to worry about it.
Keep in mind that taking uncontended locks is rather cheap, and those
locks here are probably very much uncontended.
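
To illustrate what the suggested ordering would mean (sketch only, not code
from the series): the desc lock can only be taken "first" by reading irq->hw
and irq->hwintid before irq->irq_lock is held, and that unlocked lookup is
exactly what I consider racy:

/* Sketch: take the locks in desc -> irq order up front. */
static struct irq_desc *sketch_lock_desc_then_irq(struct vgic_irq *irq)
{
    struct irq_desc *desc = NULL;

    if ( irq->hw )                        /* unlocked read of irq->hw */
        desc = irq_to_desc(irq->hwintid); /* unlocked read of irq->hwintid */

    if ( desc )
        spin_lock(&desc->lock);
    spin_lock(&irq->irq_lock);

    return desc;    /* caller drops the locks in reverse order */
}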

>> +        /*
>> +         * If a hardware mapped IRQ has been handled for good, we need to
>> +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
>> +         *
>> +         * TODO: This is probably racy, but is so already in the existing
>> +         * VGIC. A fix does not seem to be trivial.
>> +         */
>> +        if ( irq->hw && !lr_val.active && !lr_val.pending )
>> +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> 
> I'll reply here to Julien's comment:
> 
>> I realize the current vGIC is doing exactly the same thing. But this is racy.
>>
>> Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
>> is cleared here.
> 
> The assumption in the old vgic was that this scenario was not possible.
> vgic_migrate_irq would avoid changing physical interrupt affinity if a
> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> time of clearing the LR we would change the physical irq affinity (see
> xen/arch/arm/gic-vgic.c:L240).
> 
> I think we would need a similar mechanism here to protect ourselves from
> races. Is there something equivalent in the new vgic?

I am not sure this exactly covers your concerns, but I think we are
pretty good with our "two vCPU approach" (irq->vcpu and
irq->target_vcpu). The affinity can change at will at any point; it
won't affect the current interrupt. We handle migration explicitly in
vgic_prune_ap_list().

My gut feeling is that mirroring the physical active state in the
_IRQ_INPROGRESS bit is a bad idea, as it duplicates state and is racy
by its very nature.
The only purpose of this bit seems to be that once an IRQ is no longer
connected to a guest - either because the domain is going to die or
because the IRQ is explicitly disconnected (which doesn't happen
anymore?) - we possibly need to deactivate the hardware side of it,
right?
I wonder if that can be achieved by probing the actual active state in
the distributor instead? That should be the authoritative state anyway,
and this is done very rarely, so we don't care about the performance, do we?
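
As a sketch of that direction (purely illustrative: a gic_read_active_state()
helper does not exist today and would have to be added next to the accessors
introduced in patch 03/39):

/*
 * When an IRQ gets disconnected from a guest (e.g. on domain destruction),
 * ask the distributor for the authoritative active state instead of
 * mirroring it in _IRQ_INPROGRESS.
 */
static void sketch_release_hw_irq(struct irq_desc *desc)
{
    unsigned long flags;

    spin_lock_irqsave(&desc->lock, flags);

    if ( gic_read_active_state(desc) )     /* hypothetical helper */
        gic_set_active_state(desc, false); /* deactivate on the guest's behalf */

    spin_unlock_irqrestore(&desc->lock, flags);
}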

Cheers,
Andre.

>> +        /* Always preserve the active bit */
>> +        irq->active = lr_val.active;
>> +
>> +        /* Edge is the only case where we preserve the pending bit */
>> +        if ( irq->config == VGIC_CONFIG_EDGE && lr_val.pending )
>> +        {
>> +            irq->pending_latch = true;
>> +
>> +            if ( vgic_irq_is_sgi(intid) )
>> +                irq->source |= (1U << lr_val.virt.source);
>> +        }
>> +
>> +        /* Clear soft pending state when level irqs have been acked. */
>> +        if ( irq->config == VGIC_CONFIG_LEVEL && !lr_val.pending )
>> +            irq->pending_latch = false;
>> +
>> +        /*
>> +         * Level-triggered mapped IRQs are special because we only
>> +         * observe rising edges as input to the VGIC.
>> +         *
>> +         * If the guest never acked the interrupt we have to sample
>> +         * the physical line and set the line level, because the
>> +         * device state could have changed or we simply need to
>> +         * process the still pending interrupt later.
>> +         *
>> +         * If this causes us to lower the level, we have to also clear
>> +         * the physical active state, since we will otherwise never be
>> +         * told when the interrupt becomes asserted again.
>> +         */
>> +        if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
>> +        {
>> +            ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
>> +
>> +            irq->line_level = gic_read_pending_state(desc);
>> +
>> +            if ( !irq->line_level )
>> +                gic_set_active_state(desc, false);
>> +        }
>> +
>> +        spin_unlock(&irq->irq_lock);
>> +        if ( have_desc_lock )
>> +            spin_unlock(&desc->lock);
>> +        local_irq_restore(flags);
>> +
>> +        vgic_put_irq(vcpu->domain, irq);
>> +    }
>> +
>> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, false);
>> +    vgic_cpu->used_lrs = 0;
>> +}
>> +
>> +/**
>> + * vgic_v2_populate_lr() - Populates an LR with the state of a given IRQ.
>> + * @vcpu: The VCPU which the given @irq belongs to.
>> + * @irq:  The IRQ to convert into an LR. The irq_lock must be held already.
>> + * @lr:   The LR number to transfer the state into.
>> + *
>> + * This moves a virtual IRQ, represented by its vgic_irq, into a list register.
>> + * Apart from translating the logical state into the LR bitfields, it also
>> + * changes some state in the vgic_irq.
>> + * For an edge sensitive IRQ the pending state is cleared in struct vgic_irq,
>> + * for a level sensitive IRQ the pending state value is unchanged, as it is
>> + * dictated directly by the input line level.
>> + *
>> + * If @irq describes an SGI with multiple sources, we choose the
>> + * lowest-numbered source VCPU and clear that bit in the source bitmap.
>> + *
>> + * The irq_lock must be held by the caller.
>> + */
>> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr)
>> +{
>> +    struct gic_lr lr_val = {0};
>> +
>> +    lr_val.virq = irq->intid;
>> +
>> +    if ( irq_is_pending(irq) )
>> +    {
>> +        lr_val.pending = true;
>> +
>> +        if ( irq->config == VGIC_CONFIG_EDGE )
>> +            irq->pending_latch = false;
>> +
>> +        if ( vgic_irq_is_sgi(irq->intid) )
>> +        {
>> +            uint32_t src = ffs(irq->source);
>> +
>> +            BUG_ON(!src);
>> +            lr_val.virt.source = (src - 1);
>> +            irq->source &= ~(1 << (src - 1));
>> +            if ( irq->source )
>> +                irq->pending_latch = true;
>> +        }
>> +    }
>> +
>> +    lr_val.active = irq->active;
>> +
>> +    if ( irq->hw )
>> +    {
>> +        lr_val.hw_status = true;
>> +        lr_val.hw.pirq = irq->hwintid;
>> +        /*
>> +         * Never set pending+active on a HW interrupt, as the
>> +         * pending state is kept at the physical distributor
>> +         * level.
>> +         */
>> +        if ( irq->active && irq_is_pending(irq) )
>> +            lr_val.pending = false;
>> +    }
>> +    else
>> +    {
>> +        if ( irq->config == VGIC_CONFIG_LEVEL )
>> +            lr_val.virt.eoi = true;
>> +    }
>> +
>> +    /*
>> +     * Level-triggered mapped IRQs are special because we only observe
>> +     * rising edges as input to the VGIC.  We therefore lower the line
>> +     * level here, so that we can take new virtual IRQs.  See
>> +     * vgic_v2_fold_lr_state for more info.
>> +     */
>> +    if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
>> +        irq->line_level = false;
>> +
>> +    /* The GICv2 LR only holds five bits of priority. */
>> +    lr_val.priority = irq->priority >> 3;
>> +
>> +    gic_hw_ops->write_lr(lr, &lr_val);
>> +}
>> +
>> +/*
>> + * Local variables:
>> + * mode: C
>> + * c-file-style: "BSD"
>> + * c-basic-offset: 4
>> + * indent-tabs-mode: nil
>> + * End:
>> + */
>> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
>> index d91ed29d96..214176c14e 100644
>> --- a/xen/arch/arm/vgic/vgic.c
>> +++ b/xen/arch/arm/vgic/vgic.c
>> @@ -520,6 +520,7 @@ retry:
>>  
>>  static void vgic_fold_lr_state(struct vcpu *vcpu)
>>  {
>> +    vgic_v2_fold_lr_state(vcpu);
>>  }
>>  
>>  /* Requires the irq_lock to be held. */
>> @@ -527,6 +528,8 @@ static void vgic_populate_lr(struct vcpu *vcpu,
>>                               struct vgic_irq *irq, int lr)
>>  {
>>      ASSERT(spin_is_locked(&irq->irq_lock));
>> +
>> +    vgic_v2_populate_lr(vcpu, irq, lr);
>>  }
>>  
>>  static void vgic_set_underflow(struct vcpu *vcpu)
>> @@ -640,7 +643,10 @@ void vgic_sync_to_lrs(void)
>>      spin_lock(&current->arch.vgic.ap_list_lock);
>>      vgic_flush_lr_state(current);
>>      spin_unlock(&current->arch.vgic.ap_list_lock);
>> +
>> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
>>  }
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/xen/arch/arm/vgic/vgic.h b/xen/arch/arm/vgic/vgic.h
>> index 1547478518..e2b6d51e47 100644
>> --- a/xen/arch/arm/vgic/vgic.h
>> +++ b/xen/arch/arm/vgic/vgic.h
>> @@ -27,6 +27,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
>>          return irq->pending_latch || irq->line_level;
>>  }
>>  
>> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
>> +{
>> +    return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
>> +}
>> +
>>  struct vgic_irq *vgic_get_irq(struct domain *d, struct vcpu *vcpu,
>>                                uint32_t intid);
>>  void vgic_put_irq(struct domain *d, struct vgic_irq *irq);
>> @@ -41,6 +46,10 @@ static inline void vgic_get_irq_kref(struct vgic_irq *irq)
>>      atomic_inc(&irq->refcount);
>>  }
>>  
>> +void vgic_v2_fold_lr_state(struct vcpu *vcpu);
>> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr);
>> +void vgic_v2_set_underflow(struct vcpu *vcpu);
>> +
>>  #endif
>>  
>>  /*
>> -- 
>> 2.14.1
>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-27 15:33     ` Andre Przywara
@ 2018-03-27 19:41       ` Stefano Stabellini
  2018-03-27 22:38         ` André Przywara
  0 siblings, 1 reply; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-27 19:41 UTC (permalink / raw)
  To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini

On Tue, 27 Mar 2018, Andre Przywara wrote:
> Hi,
> 
> On 27/03/18 00:22, Stefano Stabellini wrote:
> > On Thu, 22 Mar 2018, Andre Przywara wrote:
> >> Processing maintenance interrupts and accessing the list registers
> >> are dependent on the host's GIC version.
> >> Introduce vgic-v2.c to contain GICv2 specific functions.
> >> Implement the GICv2 specific code for syncing the emulation state
> >> into the VGIC registers.
> >> This also adds the hook to let Xen setup the host GIC addresses.
> >>
> >> This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> >>
> >> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> >> ---
> >> Changelog v3 ... v3a:
> >> - take hardware IRQ lock in vgic_v2_fold_lr_state()
> >> - fix last remaining u32 usage
> >> - print message when using new VGIC
> >> - add TODO about racy _IRQ_INPROGRESS setting
> >>
> >> Changelog v2 ... v3:
> >> - remove no longer needed asm/io.h header
> >> - replace 0/1 with false/true for bool's
> >> - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> >> - fix indentation and w/s issues
> >>
> >> Changelog v1 ... v2:
> >> - remove v2 specific underflow function (now generic)
> >> - re-add Linux code to properly handle acked level IRQs
> >>
> >>  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
> >>  xen/arch/arm/vgic/vgic.c    |   6 +
> >>  xen/arch/arm/vgic/vgic.h    |   9 ++
> >>  3 files changed, 274 insertions(+)
> >>  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> >>
> >> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> >> new file mode 100644
> >> index 0000000000..1773503cfb
> >> --- /dev/null
> >> +++ b/xen/arch/arm/vgic/vgic-v2.c
> >> @@ -0,0 +1,259 @@
> >> +/*
> >> + * Copyright (C) 2015, 2016 ARM Ltd.
> >> + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >> + */
> >> +
> >> +#include <asm/new_vgic.h>
> >> +#include <asm/bug.h>
> >> +#include <asm/gic.h>
> >> +#include <xen/sched.h>
> >> +#include <xen/sizes.h>
> >> +
> >> +#include "vgic.h"
> >> +
> >> +static struct {
> >> +    bool enabled;
> >> +    paddr_t dbase;          /* Distributor interface address */
> >> +    paddr_t cbase;          /* CPU interface address & size */
> >> +    paddr_t csize;
> >> +    paddr_t vbase;          /* Virtual CPU interface address */
> >> +
> >> +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
> >> +    uint32_t aliased_offset;
> >> +} gic_v2_hw_data;
> >> +
> >> +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> >> +                      paddr_t vbase, uint32_t aliased_offset)
> >> +{
> >> +    gic_v2_hw_data.enabled = true;
> >> +    gic_v2_hw_data.dbase = dbase;
> >> +    gic_v2_hw_data.cbase = cbase;
> >> +    gic_v2_hw_data.csize = csize;
> >> +    gic_v2_hw_data.vbase = vbase;
> >> +    gic_v2_hw_data.aliased_offset = aliased_offset;
> >> +
> >> +    printk("Using the new VGIC implementation.\n");
> >> +}
> >> +
> >> +/*
> >> + * transfer the content of the LRs back into the corresponding ap_list:
> >> + * - active bit is transferred as is
> >> + * - pending bit is
> >> + *   - transferred as is in case of edge sensitive IRQs
> >> + *   - set to the line-level (resample time) for level sensitive IRQs
> >> + */
> >> +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> >> +{
> >> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> >> +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> >> +    unsigned long flags;
> >> +    unsigned int lr;
> >> +
> >> +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> >> +        return;
> >> +
> >> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> >> +
> >> +    for ( lr = 0; lr < used_lrs; lr++ )
> >> +    {
> >> +        struct gic_lr lr_val;
> >> +        uint32_t intid;
> >> +        struct vgic_irq *irq;
> >> +        struct irq_desc *desc = NULL;
> >> +        bool have_desc_lock = false;
> >> +
> >> +        gic_hw_ops->read_lr(lr, &lr_val);
> >> +
> >> +        /*
> >> +         * TODO: Possible optimization to avoid reading LRs:
> >> +         * Read the ELRSR to find out which of our LRs have been cleared
> >> +         * by the guest. We just need to know the IRQ number for those, which
> >> +         * we could save in an array when populating the LRs.
> >> +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
> >> +         * but requires some more code to save the IRQ number and to handle
> >> +         * those finished IRQs according to the algorithm below.
> >> +         * We need some numbers to justify this: chances are that we don't
> >> +         * have many LRs in use most of the time, so we might not save much.
> >> +         */
> >> +        gic_hw_ops->clear_lr(lr);
> >> +
> >> +        intid = lr_val.virq;
> >> +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> >> +
> >> +        local_irq_save(flags);
> > 
> > Shouldn't we disable interrupts earlier, maybe at the beginning of the
> > function? Is it not a problem if we take an interrupt a couple of lines
> > above with the read_lr and clear_lr that we do?
> 
> In contrast to the existing VGIC we only touch the LRs when entering or
> leaving the hypervisor, not in-between. So if a hardware IRQ fires
> in-between, the handler will not touch any LRs. So I don't see any
> problem with leaving interrupts enabled.

Nice! Now that you wrote the series and you know exactly how the code
works, I would love to see an update on the design doc to write down
stuff like this. (You don't have to do it as part of this series, as a
follow up would be fine.)


> >> +        spin_lock(&irq->irq_lock);
> >> +
> >> +        /* The locking order forces us to drop and re-take the locks here. */
> >> +        if ( irq->hw )
> >> +        {
> >> +            spin_unlock(&irq->irq_lock);
> >> +
> >> +            desc = irq_to_desc(irq->hwintid);
> >> +            spin_lock(&desc->lock);
> >> +            spin_lock(&irq->irq_lock);
> >> +
> >> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> >> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> >> +
> >> +            have_desc_lock = true;
> >> +        }
> > 
> > I agree with Julien that this looks very fragile. Instead, I think it
> > would be best to always take the desc lock (if irq->hw) before the
> > irq_lock earlier in this function.
> 
> Well, how is this going to work in a race-free manner? To get the
> corresponding hardware interrupt, we have to look up irq->hw and
> irq->hwintid, which is racy when done without holding the lock.
> 
> > That way, we don't have to deal with
> > this business of unlocking and relocking. Do you see any problems with
> > it? We don't change irq->hw at run time, so it looks OK to me.
> 
> Yeah, I see the point that irq->hw and irq->hwintid are somewhat
> "write-once" members. But that is a bit of a fragile assumption, I expect
> this to actually change over time. And then it will be hard to chase
> down all places where we relied on this assumption.

Yeah, we already make this assumption in other places. I would add a
single-line TODO comment on top so that we can easily grep for it.


> So I'd rather code
> this in a sane way, so that we don't have to worry about it.
> Keep in mind, taking uncontended locks is rather cheap, and those locks
> here probably are very much so.

Yeah but the code looks alien :-)  I would prefer to take the desc->lock
earlier with a simple TODO comment. Otherwise, I would also be happy to
see other ways to solve this issue.
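
To make that concrete, here is a rough sketch of the loop body I have in
mind - untested, and it leans on the assumption that irq->hw and
irq->hwintid never change at run time (desc stays NULL-initialised at the
top of the loop, as in your patch):

    intid = lr_val.virq;
    irq = vgic_get_irq(vcpu->domain, vcpu, intid);

    local_irq_save(flags);

    /* Take the desc lock first, so the locking order always holds. */
    if ( irq->hw )
    {
        /* TODO: relies on irq->hw/irq->hwintid being write-once. */
        desc = irq_to_desc(irq->hwintid);
        spin_lock(&desc->lock);
    }

    spin_lock(&irq->irq_lock);

    /* ... fold the LR state back into the vgic_irq as before ... */

    spin_unlock(&irq->irq_lock);
    if ( desc )
        spin_unlock(&desc->lock);
    local_irq_restore(flags);

That would get rid of the drop-and-retake dance completely.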


> >> +        /*
> >> +         * If a hardware mapped IRQ has been handled for good, we need to
> >> +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> >> +         *
> >> +         * TODO: This is probably racy, but is so already in the existing
> >> +         * VGIC. A fix does not seem to be trivial.
> >> +         */
> >> +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> >> +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> > 
> > I'll reply here to Julien's comment:
> > 
> >> I realize the current vGIC is doing exactly the same thing. But this is racy.
> >>
> >> Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
> >> is cleared here.
> > 
> > The assumption in the old vgic was that this scenario was not possible.
> > vgic_migrate_irq would avoid changing physical interrupt affinity if a
> > virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> > Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> > time of clearing the LR we would change the physical irq affinity (see
> > xen/arch/arm/gic-vgic.c:L240).
> > 
> > I think we would need a similar mechanism here to protect ourselves from
> > races. Is there something equivalent in the new vgic?
> 
> I am not sure this is exactly covering your concerns, but I think we are
> pretty good with our "two vCPU approach" (irq->vcpu and
> irq->target_vcpu). So the affinity can change at will at any point; it
> won't affect the current interrupt. We handle migration explicitly in
> vgic_prune_ap_list().

Yeah, I like the new approach; it is well done. Kudos to Marc and
Christoffer and to you for porting it to Xen so well. I don't think we
need any extra infrastructure for dealing with the _IRQ_INPROGRESS
issue.


> My gut feeling is that mirroring the physical active state in the
> _IRQ_INPROGRESS bit is a bad idea, as it's duplicating state and is
> racy, by its very nature.
> The only purpose of this bit seems to be that once an IRQ is no longer
> connected to a guest - either because the domain is going to die or because
> the IRQ is being explicitly disconnected (which doesn't happen anymore?), we
> need to possibly deactivate the hardware side of that, right?
> I wonder if that can be achieved by probing the actual active state in
> the distributor instead? This should be the authoritative state anyway.
> And this is done very rarely, so we don't care about the performance, do we?

Today, the purpose of _IRQ_INPROGRESS is to have a common way to deal
with physical interrupts targeting Xen and targeting guests. It is
common across architectures. I agree it is not very useful for guest
interrupts, but it is useful for hypervisor interrupts.

We could consider avoiding _IRQ_INPROGRESS altogether for guest
interrupts on ARM and using it only for hypervisor interrupts (do not
set _IRQ_INPROGRESS for guest interrupts at all). I cannot see a problem
with that right now, please double check I am not missing anything.

Otherwise, I think it would make sense to just make sure that when we
clear irq->vcpu, we also clear _IRQ_INPROGRESS consistently. Like we do
today.

Either way, it shouldn't be too hard to fix this issue.
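
To illustrate option (a), this is roughly what I have in mind (sketch
only; the exact place where the bit gets set for guest IRQs - I believe
do_IRQ() in xen/arch/arm/irq.c - needs double-checking):

    /* Only track interrupts targeting Xen itself with _IRQ_INPROGRESS. */
    if ( !test_bit(_IRQ_GUEST, &desc->status) )
        set_bit(_IRQ_INPROGRESS, &desc->status);

The new VGIC would then not need to touch the bit at all when folding the
LRs back.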


> >> +        /* Always preserve the active bit */
> >> +        irq->active = lr_val.active;
> >> +
> >> +        /* Edge is the only case where we preserve the pending bit */
> >> +        if ( irq->config == VGIC_CONFIG_EDGE && lr_val.pending )
> >> +        {
> >> +            irq->pending_latch = true;
> >> +
> >> +            if ( vgic_irq_is_sgi(intid) )
> >> +                irq->source |= (1U << lr_val.virt.source);
> >> +        }
> >> +
> >> +        /* Clear soft pending state when level irqs have been acked. */
> >> +        if ( irq->config == VGIC_CONFIG_LEVEL && !lr_val.pending )
> >> +            irq->pending_latch = false;
> >> +
> >> +        /*
> >> +         * Level-triggered mapped IRQs are special because we only
> >> +         * observe rising edges as input to the VGIC.
> >> +         *
> >> +         * If the guest never acked the interrupt we have to sample
> >> +         * the physical line and set the line level, because the
> >> +         * device state could have changed or we simply need to
> >> +         * process the still pending interrupt later.
> >> +         *
> >> +         * If this causes us to lower the level, we have to also clear
> >> +         * the physical active state, since we will otherwise never be
> >> +         * told when the interrupt becomes asserted again.
> >> +         */
> >> +        if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> >> +        {
> >> +            ASSERT(irq->hwintid >= VGIC_NR_PRIVATE_IRQS);
> >> +
> >> +            irq->line_level = gic_read_pending_state(desc);
> >> +
> >> +            if ( !irq->line_level )
> >> +                gic_set_active_state(desc, false);
> >> +        }
> >> +
> >> +        spin_unlock(&irq->irq_lock);
> >> +        if ( have_desc_lock )
> >> +            spin_unlock(&desc->lock);
> >> +        local_irq_restore(flags);
> >> +
> >> +        vgic_put_irq(vcpu->domain, irq);
> >> +    }
> >> +
> >> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, false);
> >> +    vgic_cpu->used_lrs = 0;
> >> +}
> >> +
> >> +/**
> >> + * vgic_v2_populate_lr() - Populates an LR with the state of a given IRQ.
> >> + * @vcpu: The VCPU which the given @irq belongs to.
> >> + * @irq:  The IRQ to convert into an LR. The irq_lock must be held already.
> >> + * @lr:   The LR number to transfer the state into.
> >> + *
> >> + * This moves a virtual IRQ, represented by its vgic_irq, into a list register.
> >> + * Apart from translating the logical state into the LR bitfields, it also
> >> + * changes some state in the vgic_irq.
> >> + * For an edge sensitive IRQ the pending state is cleared in struct vgic_irq,
> >> + * for a level sensitive IRQ the pending state value is unchanged, as it is
> >> + * dictated directly by the input line level.
> >> + *
> >> + * If @irq describes an SGI with multiple sources, we choose the
> >> + * lowest-numbered source VCPU and clear that bit in the source bitmap.
> >> + *
> >> + * The irq_lock must be held by the caller.
> >> + */
> >> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr)
> >> +{
> >> +    struct gic_lr lr_val = {0};
> >> +
> >> +    lr_val.virq = irq->intid;
> >> +
> >> +    if ( irq_is_pending(irq) )
> >> +    {
> >> +        lr_val.pending = true;
> >> +
> >> +        if ( irq->config == VGIC_CONFIG_EDGE )
> >> +            irq->pending_latch = false;
> >> +
> >> +        if ( vgic_irq_is_sgi(irq->intid) )
> >> +        {
> >> +            uint32_t src = ffs(irq->source);
> >> +
> >> +            BUG_ON(!src);
> >> +            lr_val.virt.source = (src - 1);
> >> +            irq->source &= ~(1 << (src - 1));
> >> +            if ( irq->source )
> >> +                irq->pending_latch = true;
> >> +        }
> >> +    }
> >> +
> >> +    lr_val.active = irq->active;
> >> +
> >> +    if ( irq->hw )
> >> +    {
> >> +        lr_val.hw_status = true;
> >> +        lr_val.hw.pirq = irq->hwintid;
> >> +        /*
> >> +         * Never set pending+active on a HW interrupt, as the
> >> +         * pending state is kept at the physical distributor
> >> +         * level.
> >> +         */
> >> +        if ( irq->active && irq_is_pending(irq) )
> >> +            lr_val.pending = false;
> >> +    }
> >> +    else
> >> +    {
> >> +        if ( irq->config == VGIC_CONFIG_LEVEL )
> >> +            lr_val.virt.eoi = true;
> >> +    }
> >> +
> >> +    /*
> >> +     * Level-triggered mapped IRQs are special because we only observe
> >> +     * rising edges as input to the VGIC.  We therefore lower the line
> >> +     * level here, so that we can take new virtual IRQs.  See
> >> +     * vgic_v2_fold_lr_state for more info.
> >> +     */
> >> +    if ( vgic_irq_is_mapped_level(irq) && lr_val.pending )
> >> +        irq->line_level = false;
> >> +
> >> +    /* The GICv2 LR only holds five bits of priority. */
> >> +    lr_val.priority = irq->priority >> 3;
> >> +
> >> +    gic_hw_ops->write_lr(lr, &lr_val);
> >> +}
> >> +
> >> +/*
> >> + * Local variables:
> >> + * mode: C
> >> + * c-file-style: "BSD"
> >> + * c-basic-offset: 4
> >> + * indent-tabs-mode: nil
> >> + * End:
> >> + */
> >> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
> >> index d91ed29d96..214176c14e 100644
> >> --- a/xen/arch/arm/vgic/vgic.c
> >> +++ b/xen/arch/arm/vgic/vgic.c
> >> @@ -520,6 +520,7 @@ retry:
> >>  
> >>  static void vgic_fold_lr_state(struct vcpu *vcpu)
> >>  {
> >> +    vgic_v2_fold_lr_state(vcpu);
> >>  }
> >>  
> >>  /* Requires the irq_lock to be held. */
> >> @@ -527,6 +528,8 @@ static void vgic_populate_lr(struct vcpu *vcpu,
> >>                               struct vgic_irq *irq, int lr)
> >>  {
> >>      ASSERT(spin_is_locked(&irq->irq_lock));
> >> +
> >> +    vgic_v2_populate_lr(vcpu, irq, lr);
> >>  }
> >>  
> >>  static void vgic_set_underflow(struct vcpu *vcpu)
> >> @@ -640,7 +643,10 @@ void vgic_sync_to_lrs(void)
> >>      spin_lock(&current->arch.vgic.ap_list_lock);
> >>      vgic_flush_lr_state(current);
> >>      spin_unlock(&current->arch.vgic.ap_list_lock);
> >> +
> >> +    gic_hw_ops->update_hcr_status(GICH_HCR_EN, 1);
> >>  }
> >> +
> >>  /*
> >>   * Local variables:
> >>   * mode: C
> >> diff --git a/xen/arch/arm/vgic/vgic.h b/xen/arch/arm/vgic/vgic.h
> >> index 1547478518..e2b6d51e47 100644
> >> --- a/xen/arch/arm/vgic/vgic.h
> >> +++ b/xen/arch/arm/vgic/vgic.h
> >> @@ -27,6 +27,11 @@ static inline bool irq_is_pending(struct vgic_irq *irq)
> >>          return irq->pending_latch || irq->line_level;
> >>  }
> >>  
> >> +static inline bool vgic_irq_is_mapped_level(struct vgic_irq *irq)
> >> +{
> >> +    return irq->config == VGIC_CONFIG_LEVEL && irq->hw;
> >> +}
> >> +
> >>  struct vgic_irq *vgic_get_irq(struct domain *d, struct vcpu *vcpu,
> >>                                uint32_t intid);
> >>  void vgic_put_irq(struct domain *d, struct vgic_irq *irq);
> >> @@ -41,6 +46,10 @@ static inline void vgic_get_irq_kref(struct vgic_irq *irq)
> >>      atomic_inc(&irq->refcount);
> >>  }
> >>  
> >> +void vgic_v2_fold_lr_state(struct vcpu *vcpu);
> >> +void vgic_v2_populate_lr(struct vcpu *vcpu, struct vgic_irq *irq, int lr);
> >> +void vgic_v2_set_underflow(struct vcpu *vcpu);
> >> +
> >>  #endif
> >>  
> >>  /*


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-27 19:41       ` Stefano Stabellini
@ 2018-03-27 22:38         ` André Przywara
  2018-03-27 23:39           ` Stefano Stabellini
  0 siblings, 1 reply; 20+ messages in thread
From: André Przywara @ 2018-03-27 22:38 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, Julien Grall

On 27/03/18 20:41, Stefano Stabellini wrote:
> On Tue, 27 Mar 2018, Andre Przywara wrote:
>> Hi,
>>
>> On 27/03/18 00:22, Stefano Stabellini wrote:
>>> On Thu, 22 Mar 2018, Andre Przywara wrote:
>>>> Processing maintenance interrupts and accessing the list registers
>>>> are dependent on the host's GIC version.
>>>> Introduce vgic-v2.c to contain GICv2 specific functions.
>>>> Implement the GICv2 specific code for syncing the emulation state
>>>> into the VGIC registers.
>>>> This also adds the hook to let Xen setup the host GIC addresses.
>>>>
>>>> This is based on Linux commit 140b086dd197, written by Marc Zyngier.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
>>>> ---
>>>> Changelog v3 ... v3a:
>>>> - take hardware IRQ lock in vgic_v2_fold_lr_state()
>>>> - fix last remaining u32 usage
>>>> - print message when using new VGIC
>>>> - add TODO about racy _IRQ_INPROGRESS setting
>>>>
>>>> Changelog v2 ... v3:
>>>> - remove no longer needed asm/io.h header
>>>> - replace 0/1 with false/true for bool's
>>>> - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
>>>> - fix indentation and w/s issues
>>>>
>>>> Changelog v1 ... v2:
>>>> - remove v2 specific underflow function (now generic)
>>>> - re-add Linux code to properly handle acked level IRQs
>>>>
>>>>  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  xen/arch/arm/vgic/vgic.c    |   6 +
>>>>  xen/arch/arm/vgic/vgic.h    |   9 ++
>>>>  3 files changed, 274 insertions(+)
>>>>  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
>>>>
>>>> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
>>>> new file mode 100644
>>>> index 0000000000..1773503cfb
>>>> --- /dev/null
>>>> +++ b/xen/arch/arm/vgic/vgic-v2.c
>>>> @@ -0,0 +1,259 @@
>>>> +/*
>>>> + * Copyright (C) 2015, 2016 ARM Ltd.
>>>> + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#include <asm/new_vgic.h>
>>>> +#include <asm/bug.h>
>>>> +#include <asm/gic.h>
>>>> +#include <xen/sched.h>
>>>> +#include <xen/sizes.h>
>>>> +
>>>> +#include "vgic.h"
>>>> +
>>>> +static struct {
>>>> +    bool enabled;
>>>> +    paddr_t dbase;          /* Distributor interface address */
>>>> +    paddr_t cbase;          /* CPU interface address & size */
>>>> +    paddr_t csize;
>>>> +    paddr_t vbase;          /* Virtual CPU interface address */
>>>> +
>>>> +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
>>>> +    uint32_t aliased_offset;
>>>> +} gic_v2_hw_data;
>>>> +
>>>> +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
>>>> +                      paddr_t vbase, uint32_t aliased_offset)
>>>> +{
>>>> +    gic_v2_hw_data.enabled = true;
>>>> +    gic_v2_hw_data.dbase = dbase;
>>>> +    gic_v2_hw_data.cbase = cbase;
>>>> +    gic_v2_hw_data.csize = csize;
>>>> +    gic_v2_hw_data.vbase = vbase;
>>>> +    gic_v2_hw_data.aliased_offset = aliased_offset;
>>>> +
>>>> +    printk("Using the new VGIC implementation.\n");
>>>> +}
>>>> +
>>>> +/*
>>>> + * transfer the content of the LRs back into the corresponding ap_list:
>>>> + * - active bit is transferred as is
>>>> + * - pending bit is
>>>> + *   - transferred as is in case of edge sensitive IRQs
>>>> + *   - set to the line-level (resample time) for level sensitive IRQs
>>>> + */
>>>> +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
>>>> +{
>>>> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
>>>> +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
>>>> +    unsigned long flags;
>>>> +    unsigned int lr;
>>>> +
>>>> +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
>>>> +        return;
>>>> +
>>>> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
>>>> +
>>>> +    for ( lr = 0; lr < used_lrs; lr++ )
>>>> +    {
>>>> +        struct gic_lr lr_val;
>>>> +        uint32_t intid;
>>>> +        struct vgic_irq *irq;
>>>> +        struct irq_desc *desc = NULL;
>>>> +        bool have_desc_lock = false;
>>>> +
>>>> +        gic_hw_ops->read_lr(lr, &lr_val);
>>>> +
>>>> +        /*
>>>> +         * TODO: Possible optimization to avoid reading LRs:
>>>> +         * Read the ELRSR to find out which of our LRs have been cleared
>>>> +         * by the guest. We just need to know the IRQ number for those, which
>>>> +         * we could save in an array when populating the LRs.
>>>> +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
>>>> +         * but requires some more code to save the IRQ number and to handle
>>>> +         * those finished IRQs according to the algorithm below.
>>>> +         * We need some numbers to justify this: chances are that we don't
>>>> +         * have many LRs in use most of the time, so we might not save much.
>>>> +         */
>>>> +        gic_hw_ops->clear_lr(lr);
>>>> +
>>>> +        intid = lr_val.virq;
>>>> +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
>>>> +
>>>> +        local_irq_save(flags);
>>>
>>> Shouldn't we disable interrupts earlier, maybe at the beginning of the
>>> function? Is it not a problem if we take an interrupt a couple of lines
>>> above with the read_lr and clear_lr that we do?
>>
>> In contrast to the existing VGIC we only touch the LRs when entering or
>> leaving the hypervisor, not in-between. So if a hardware IRQ fires
>> in-between, the handler will not touch any LRs. So I don't see any
>> problem with leaving interrupts enabled.
> 
> Nice! Now that you wrote the series and you know exactly how the code
> works, I would love to see an update on the design doc to write down
> stuff like this. (You don't have to do it as part of this series, as a
> follow up would be fine.)

Yes, that was my plan anyway.

>>>> +        spin_lock(&irq->irq_lock);
>>>> +
>>>> +        /* The locking order forces us to drop and re-take the locks here. */
>>>> +        if ( irq->hw )
>>>> +        {
>>>> +            spin_unlock(&irq->irq_lock);
>>>> +
>>>> +            desc = irq_to_desc(irq->hwintid);
>>>> +            spin_lock(&desc->lock);
>>>> +            spin_lock(&irq->irq_lock);
>>>> +
>>>> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
>>>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
>>>> +
>>>> +            have_desc_lock = true;
>>>> +        }
>>>
>>> I agree with Julien that this looks very fragile. Instead, I think it
>>> would be best to always take the desc lock (if irq->hw) before the
>>> irq_lock earlier in this function.
>>
>> Well, how is this going to work in a race-free manner? To get the
>> corresponding hardware interrupt, we have to look up irq->hw and
>> irq->hwintid, which is racy when done without holding the lock.
>>
>>> That way, we don't have to deal with
>>> this business of unlocking and relocking. Do you see any problems with
>>> it? We don't change irq->hw at run time, so it looks OK to me.
>>
>> Yeah, I see the point that irq->hw and irq->hwintid are somewhat
>> "write-once" members. But that is a bit of a fragile assumption, I expect
>> this to actually change over time. And then it will be hard to chase
>> down all places where we relied on this assumption.
> 
> Yeah, we already make this assumption in other places. I would add a
> single-line TODO comment on top so that we can easily grep for it.

OK.

>> So I'd rather code
>> this in a sane way, so that we don't have to worry about it.
>> Keep in mind, taking uncontended locks is rather cheap, and those locks
>> here probably are very much so.
> 
> Yeah but the code looks alien :-)  I would prefer to take the desc->lock
> earlier with a simple TODO comment. Otherwise, I would also be happy to
> see other ways to solve this issue.
> 
> 
>>>> +        /*
>>>> +         * If a hardware mapped IRQ has been handled for good, we need to
>>>> +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
>>>> +         *
>>>> +         * TODO: This is probably racy, but is so already in the existing
>>>> +         * VGIC. A fix does not seem to be trivial.
>>>> +         */
>>>> +        if ( irq->hw && !lr_val.active && !lr_val.pending )
>>>> +            clear_bit(_IRQ_INPROGRESS, &desc->status);
>>>
>>> I'll reply here to Julien's comment:
>>>
>>>> I realize the current vGIC is doing exactly the same thing. But this is racy.
>>>>
>>>> Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
>>>> is cleared here.
>>>
>>> The assumption in the old vgic was that this scenario was not possible.
>>> vgic_migrate_irq would avoid changing physical interrupt affinity if a
>>> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
>>> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
>>> time of clearing the LR we would change the physical irq affinity (see
>>> xen/arch/arm/gic-vgic.c:L240).
>>>
>>> I think we would need a similar mechanism here to protect ourselves from
>>> races. Is there something equivalent in the new vgic?
>>
>> I am not sure this is exactly covering your concerns, but I think we are
>> pretty good with our "two vCPU approach" (irq->vcpu and
>> irq->target_vcpu). So the affinity can change at will at any point; it
>> won't affect the current interrupt. We handle migration explicitly in
>> vgic_prune_ap_list().
> 
> Yeah, I like the new approach; it is well done. Kudos to Marc and
> Christoffer and to you for porting it to Xen so well. I don't think we
> need any extra infrastructure for dealing with the _IRQ_INPROGRESS
> issue.
> 
> 
>> My gut feeling is that mirroring the physical active state in the
>> _IRQ_INPROGRESS bit is a bad idea, as it's duplicating state and is
>> racy, by its very nature.
>> The only purpose of this bit seems to be that once an IRQ is no longer
>> connected to a guest - either because the domain is going to die or because
>> the IRQ is being explicitly disconnected (which doesn't happen anymore?), we
>> need to possibly deactivate the hardware side of that, right?
>> I wonder if that can be achieved by probing the actual active state in
>> the distributor instead? This should be the authoritative state anyway.
>> And this is done very rarely, so we don't care about the performance, do we?
> 
> Today, the purpose of _IRQ_INPROGRESS is to have a common way to deal
> with physical interrupts targeting Xen and targeting guests. It is
> common across architectures.

Ah, true, so it might not be a good idea to get rid of it, then.

> I agree it is not very useful for guest
> interrupts, but it is useful for hypervisor interrupts.
> 
> We could consider avoiding _IRQ_INPROGRESS altogether for guest
> interrupts on ARM and using it only for hypervisor interrupts (do not
> set _IRQ_INPROGRESS for guest interrupts at all). I cannot see a problem
> with that right now, please double check I am not missing anything.

Mmh, but wouldn't that kill a hardware-mapped IRQ when the domain dies
while the IRQ is still handled by the guest? No one will ever
deactivate it on the host side then, so new IRQs will be masked
forever? I thought this was one of the main use cases for this flag.

> Otherwise, I think it would make sense to just make sure that when we
> clear irq->vcpu, we also clear _IRQ_INPROGRESS consistently. Like we do
> today.
> 
> Either way, it shouldn't be too hard to fix this issue.

Alright, will try to come up with something tomorrow.

Cheers,
Andre



* Re: [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system
  2018-03-22 11:56 ` [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system Andre Przywara
  2018-03-22 13:59   ` Julien Grall
@ 2018-03-27 22:53   ` Stefano Stabellini
  1 sibling, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-27 22:53 UTC (permalink / raw)
  To: Andre Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini

On Thu, 22 Mar 2018, Andre Przywara wrote:
> Now that we have both the old VGIC prepared to cope with a sibling and
> the code for the new VGIC in place, let's add a Kconfig option to enable
> the new code and wire it into the Xen build system.
> This will add a compile time option to use either the "old" or the "new"
> VGIC.
> At the moment this is restricted to a vGIC-v2. To make the build system
> happy, we provide a temporary dummy implementation of
> vgic_v3_setup_hw() to allow building for now.
> 
> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> Changelog v3 ... v3a:
> - print panic when trying to run on GICv3 hardware
> 
> Changelog v2 ... v3:
> - fix indentation of Kconfig entry
> - select NEEDS_LIST_SORT
> - drop unconditional list_sort.o inclusion
> 
> Changelog v1 ... v2:
> - add Kconfig help text
> - use separate Makefile in vgic/ directory
> - protect compilation without GICV3 support
> - always include list_sort() in build
> 
>  xen/arch/arm/Kconfig       | 18 +++++++++++++++++-
>  xen/arch/arm/Makefile      |  5 ++++-
>  xen/arch/arm/vgic/Makefile |  5 +++++
>  xen/arch/arm/vgic/vgic.c   | 11 +++++++++++
>  4 files changed, 37 insertions(+), 2 deletions(-)
>  create mode 100644 xen/arch/arm/vgic/Makefile
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 2782ee6589..8174c0c635 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -48,7 +48,23 @@ config HAS_GICV3
>  config HAS_ITS
>          bool
>          prompt "GICv3 ITS MSI controller support" if EXPERT = "y"
> -        depends on HAS_GICV3
> +        depends on HAS_GICV3 && !NEW_VGIC
> +
> +config NEW_VGIC
> +	bool
> +	prompt "Use new VGIC implementation"
> +	select NEEDS_LIST_SORT
> +	---help---
> +
> +	This is an alternative implementation of the ARM GIC interrupt
> +	controller emulation, based on the Linux/KVM VGIC. It has a better
> +	design and fixes many shortcomings of the existing GIC emulation in
> +	Xen. It will eventually replace the existing/old VGIC.
> +	However at the moment it lacks support for Dom0 using the ITS for
> +	using MSIs.
> +	Say Y if you want to help testing this new code or if you experience
> +	problems with the standard emulation.
> +	At the moment this implementation is not security supported.
>  
>  config SBSA_VUART_CONSOLE
>  	bool "Emulated SBSA UART console support"
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 41d7366527..a9533b107e 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -16,7 +16,6 @@ obj-y += domain_build.o
>  obj-y += domctl.o
>  obj-$(EARLY_PRINTK) += early_printk.o
>  obj-y += gic.o
> -obj-y += gic-vgic.o
>  obj-y += gic-v2.o
>  obj-$(CONFIG_HAS_GICV3) += gic-v3.o
>  obj-$(CONFIG_HAS_ITS) += gic-v3-its.o
> @@ -47,10 +46,14 @@ obj-y += sysctl.o
>  obj-y += time.o
>  obj-y += traps.o
>  obj-y += vcpreg.o
> +subdir-$(CONFIG_NEW_VGIC) += vgic
> +ifneq ($(CONFIG_NEW_VGIC),y)
> +obj-y += gic-vgic.o
>  obj-y += vgic.o
>  obj-y += vgic-v2.o
>  obj-$(CONFIG_HAS_GICV3) += vgic-v3.o
>  obj-$(CONFIG_HAS_ITS) += vgic-v3-its.o
> +endif
>  obj-y += vm_event.o
>  obj-y += vtimer.o
>  obj-$(CONFIG_SBSA_VUART_CONSOLE) += vpl011.o
> diff --git a/xen/arch/arm/vgic/Makefile b/xen/arch/arm/vgic/Makefile
> new file mode 100644
> index 0000000000..806826948e
> --- /dev/null
> +++ b/xen/arch/arm/vgic/Makefile
> @@ -0,0 +1,5 @@
> +obj-y += vgic.o
> +obj-y += vgic-v2.o
> +obj-y += vgic-mmio.o
> +obj-y += vgic-mmio-v2.o
> +obj-y += vgic-init.o
> diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
> index f9a5088285..ac18cab6f3 100644
> --- a/xen/arch/arm/vgic/vgic.c
> +++ b/xen/arch/arm/vgic/vgic.c
> @@ -981,6 +981,17 @@ unsigned int vgic_max_vcpus(const struct domain *d)
>      return min_t(unsigned int, MAX_VIRT_CPUS, vgic_vcpu_limit);
>  }
>  
> +#ifdef CONFIG_HAS_GICV3
> +/* Dummy implementation to allow building without actual vGICv3 support. */
> +void vgic_v3_setup_hw(paddr_t dbase,
> +                      unsigned int nr_rdist_regions,
> +                      const struct rdist_region *regions,
> +                      unsigned int intid_bits)
> +{
> +    panic("New VGIC implementation does not yet support GICv3.");
> +}
> +#endif
> +
>  /*
>   * Local variables:
>   * mode: C
> -- 
> 2.14.1
> 


* Re: [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend
  2018-03-27 22:38         ` André Przywara
@ 2018-03-27 23:39           ` Stefano Stabellini
  0 siblings, 0 replies; 20+ messages in thread
From: Stefano Stabellini @ 2018-03-27 23:39 UTC (permalink / raw)
  To: André Przywara; +Cc: xen-devel, Julien Grall, Stefano Stabellini

On Tue, 27 Mar 2018, André Przywara wrote:
> On 27/03/18 20:41, Stefano Stabellini wrote:
> > On Tue, 27 Mar 2018, Andre Przywara wrote:
> >> Hi,
> >>
> >> On 27/03/18 00:22, Stefano Stabellini wrote:
> >>> On Thu, 22 Mar 2018, Andre Przywara wrote:
> >>>> Processing maintenance interrupts and accessing the list registers
> >>>> are dependent on the host's GIC version.
> >>>> Introduce vgic-v2.c to contain GICv2 specific functions.
> >>>> Implement the GICv2 specific code for syncing the emulation state
> >>>> into the VGIC registers.
> >>>> This also adds the hook to let Xen setup the host GIC addresses.
> >>>>
> >>>> This is based on Linux commit 140b086dd197, written by Marc Zyngier.
> >>>>
> >>>> Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
> >>>> ---
> >>>> Changelog v3 ... v3a:
> >>>> - take hardware IRQ lock in vgic_v2_fold_lr_state()
> >>>> - fix last remaining u32 usage
> >>>> - print message when using new VGIC
> >>>> - add TODO about racy _IRQ_INPROGRESS setting
> >>>>
> >>>> Changelog v2 ... v3:
> >>>> - remove no longer needed asm/io.h header
> >>>> - replace 0/1 with false/true for bool's
> >>>> - clear _IRQ_INPROGRESS bit when retiring hardware mapped IRQ
> >>>> - fix indentation and w/s issues
> >>>>
> >>>> Changelog v1 ... v2:
> >>>> - remove v2 specific underflow function (now generic)
> >>>> - re-add Linux code to properly handle acked level IRQs
> >>>>
> >>>>  xen/arch/arm/vgic/vgic-v2.c | 259 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>  xen/arch/arm/vgic/vgic.c    |   6 +
> >>>>  xen/arch/arm/vgic/vgic.h    |   9 ++
> >>>>  3 files changed, 274 insertions(+)
> >>>>  create mode 100644 xen/arch/arm/vgic/vgic-v2.c
> >>>>
> >>>> diff --git a/xen/arch/arm/vgic/vgic-v2.c b/xen/arch/arm/vgic/vgic-v2.c
> >>>> new file mode 100644
> >>>> index 0000000000..1773503cfb
> >>>> --- /dev/null
> >>>> +++ b/xen/arch/arm/vgic/vgic-v2.c
> >>>> @@ -0,0 +1,259 @@
> >>>> +/*
> >>>> + * Copyright (C) 2015, 2016 ARM Ltd.
> >>>> + * Imported from Linux ("new" KVM VGIC) and heavily adapted to Xen.
> >>>> + *
> >>>> + * This program is free software; you can redistribute it and/or modify
> >>>> + * it under the terms of the GNU General Public License version 2 as
> >>>> + * published by the Free Software Foundation.
> >>>> + *
> >>>> + * This program is distributed in the hope that it will be useful,
> >>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >>>> + * GNU General Public License for more details.
> >>>> + *
> >>>> + * You should have received a copy of the GNU General Public License
> >>>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> >>>> + */
> >>>> +
> >>>> +#include <asm/new_vgic.h>
> >>>> +#include <asm/bug.h>
> >>>> +#include <asm/gic.h>
> >>>> +#include <xen/sched.h>
> >>>> +#include <xen/sizes.h>
> >>>> +
> >>>> +#include "vgic.h"
> >>>> +
> >>>> +static struct {
> >>>> +    bool enabled;
> >>>> +    paddr_t dbase;          /* Distributor interface address */
> >>>> +    paddr_t cbase;          /* CPU interface address & size */
> >>>> +    paddr_t csize;
> >>>> +    paddr_t vbase;          /* Virtual CPU interface address */
> >>>> +
> >>>> +    /* Offset to add to get an 8kB contiguous region if GIC is aliased */
> >>>> +    uint32_t aliased_offset;
> >>>> +} gic_v2_hw_data;
> >>>> +
> >>>> +void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
> >>>> +                      paddr_t vbase, uint32_t aliased_offset)
> >>>> +{
> >>>> +    gic_v2_hw_data.enabled = true;
> >>>> +    gic_v2_hw_data.dbase = dbase;
> >>>> +    gic_v2_hw_data.cbase = cbase;
> >>>> +    gic_v2_hw_data.csize = csize;
> >>>> +    gic_v2_hw_data.vbase = vbase;
> >>>> +    gic_v2_hw_data.aliased_offset = aliased_offset;
> >>>> +
> >>>> +    printk("Using the new VGIC implementation.\n");
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * transfer the content of the LRs back into the corresponding ap_list:
> >>>> + * - active bit is transferred as is
> >>>> + * - pending bit is
> >>>> + *   - transferred as is in case of edge sensitive IRQs
> >>>> + *   - set to the line-level (resample time) for level sensitive IRQs
> >>>> + */
> >>>> +void vgic_v2_fold_lr_state(struct vcpu *vcpu)
> >>>> +{
> >>>> +    struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic;
> >>>> +    unsigned int used_lrs = vcpu->arch.vgic.used_lrs;
> >>>> +    unsigned long flags;
> >>>> +    unsigned int lr;
> >>>> +
> >>>> +    if ( !used_lrs )    /* No LRs used, so nothing to sync back here. */
> >>>> +        return;
> >>>> +
> >>>> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
> >>>> +
> >>>> +    for ( lr = 0; lr < used_lrs; lr++ )
> >>>> +    {
> >>>> +        struct gic_lr lr_val;
> >>>> +        uint32_t intid;
> >>>> +        struct vgic_irq *irq;
> >>>> +        struct irq_desc *desc = NULL;
> >>>> +        bool have_desc_lock = false;
> >>>> +
> >>>> +        gic_hw_ops->read_lr(lr, &lr_val);
> >>>> +
> >>>> +        /*
> >>>> +         * TODO: Possible optimization to avoid reading LRs:
> >>>> +         * Read the ELRSR to find out which of our LRs have been cleared
> >>>> +         * by the guest. We just need to know the IRQ number for those, which
> >>>> +         * we could save in an array when populating the LRs.
> >>>> +         * This trades one MMIO access (ELRSR) for possibly more than one (LRs),
> >>>> +         * but requires some more code to save the IRQ number and to handle
> >>>> +         * those finished IRQs according to the algorithm below.
> >>>> +         * We need some numbers to justify this: chances are that we don't
> >>>> +         * have many LRs in use most of the time, so we might not save much.
> >>>> +         */
> >>>> +        gic_hw_ops->clear_lr(lr);
> >>>> +
> >>>> +        intid = lr_val.virq;
> >>>> +        irq = vgic_get_irq(vcpu->domain, vcpu, intid);
> >>>> +
> >>>> +        local_irq_save(flags);
> >>>
> >>> Shouldn't we disable interrupts earlier, maybe at the beginning of the
> >>> function? Is it not a problem if we take an interrupt a couple of lines
> >>> above with the read_lr and clear_lr that we do?
> >>
> >> In contrast to the existing VGIC we only touch the LRs when entering or
> >> leaving the hypervisor, not in-between. So if a hardware IRQ fires
> >> in-between, the handler will not touch any LRs. So I don't see any
> >> problem with leaving interrupts enabled.
> > 
> > Nice! Now that you wrote the series and you know exactly how the code
> > works, I would love to see an update on the design doc to write down
> > stuff like this. (You don't have to do it as part of this series, as a
> > follow up would be fine.)
> 
> Yes, that was my plan anyway.
> 
> >>>> +        spin_lock(&irq->irq_lock);
> >>>> +
> >>>> +        /* The locking order forces us to drop and re-take the locks here. */
> >>>> +        if ( irq->hw )
> >>>> +        {
> >>>> +            spin_unlock(&irq->irq_lock);
> >>>> +
> >>>> +            desc = irq_to_desc(irq->hwintid);
> >>>> +            spin_lock(&desc->lock);
> >>>> +            spin_lock(&irq->irq_lock);
> >>>> +
> >>>> +            /* This h/w IRQ should still be assigned to the virtual IRQ. */
> >>>> +            ASSERT(irq->hw && desc->irq == irq->hwintid);
> >>>> +
> >>>> +            have_desc_lock = true;
> >>>> +        }
> >>>
> >>> I agree with Julien that this looks very fragile. Instead, I think it
> >>> would be best to always take the desc lock (if irq->hw) before the
> >>> irq_lock earlier in this function.
> >>
> >> Well, how is this going to work in a race-free manner? To get the
> >> corresponding hardware interrupt, we have to look up irq->hw and
> >> irq->hwintid, which is racy when done without holding the lock.
> >>
> >>> That way, we don't have to deal with
> >>> this business of unlocking and relocking. Do you see any problems with
> >>> it? We don't change irq->hw at run time, so it looks OK to me.
> >>
> >> Yeah, I see the point that irq->hw and irq->hwintid are somewhat
> >> "write-once" members. But that is a bit of a fragile assumption, I expect
> >> this to actually change over time. And then it will be hard to chase
> >> down all places where we relied on this assumption.
> > 
> > Yeah, we already make this assumption in other places. I would add a
> > single-line TODO comment on top so that we can easily grep for it.
> 
> OK.
> 
> >> So I'd rather code
> >> this in a sane way, so that we don't have to worry about it.
> >> Keep in mind, taking uncontended locks is rather cheap, and those locks
> >> here probably are very much so.
> > 
> > Yeah but the code looks alien :-)  I would prefer to take the desc->lock
> > earlier with a simple TODO comment. Otherwise, I would also be happy to
> > see other ways to solve this issue.
> > 
> > 
> >>>> +        /*
> >>>> +         * If a hardware mapped IRQ has been handled for good, we need to
> >>>> +         * clear the _IRQ_INPROGRESS bit to allow handling of new IRQs.
> >>>> +         *
> >>>> +         * TODO: This is probably racy, but is so already in the existing
> >>>> +         * VGIC. A fix does not seem to be trivial.
> >>>> +         */
> >>>> +        if ( irq->hw && !lr_val.active && !lr_val.pending )
> >>>> +            clear_bit(_IRQ_INPROGRESS, &desc->status);
> >>>
> >>> I'll reply here to Julien's comment:
> >>>
> >>>> I realize the current vGIC is doing exactly the same thing. But this is racy.
> >>>>
> >>>> Imagine the interrupt is firing on another pCPU (I wasn't able to rule out this even when the interrupt is following the vCPU), that pCPU may set _IRQ_INPROGRESS before this
> >>>> is cleared here.
> >>>
> >>> The assumption in the old vgic was that this scenario was not possible.
> >>> vgic_migrate_irq would avoid changing physical interrupt affinity if a
> >>> virtual interrupt was currently in an LR (see xen/arch/arm/vgic.c:L298).
> >>> Instead, it would set the irq as GIC_IRQ_GUEST_MIGRATING, then at the
> >>> time of clearing the LR we would change the physical irq affinity (see
> >>> xen/arch/arm/gic-vgic.c:L240).
> >>>
> >>> I think we would need a similar mechanism here to protect ourselves from
> >>> races. Is there something equivalent in the new vgic?
> >>
> >> I am not sure this is exactly covering your concerns, but I think we are
> >> pretty good with our "two vCPU approach" (irq->vcpu and
> >> irq->target_vcpu). So the affinity can change at will at any point; it
> >> won't affect the current interrupt. We handle migration explicitly in
> >> vgic_prune_ap_list().
> > 
> > Yeah, I like the new approach; it is well done. Kudos to Marc and
> > Christoffer and to you for porting it to Xen so well. I don't think we
> > need any extra infrastructure for dealing with the _IRQ_INPROGRESS
> > issue.
> > 
> > 
> >> My gut feeling is that mirroring the physical active state in the
> >> _IRQ_INPROGRESS bit is a bad idea, as it's duplicating state and is
> >> racy, by its very nature.
> >> The only purpose of this bit seems to be that once an IRQ is no longer
> >> connected to a guest - either because the domain is going to die or because
> >> the IRQ is being explicitly disconnected (which doesn't happen anymore?), we
> >> need to possibly deactivate the hardware side of that, right?
> >> I wonder if that can be achieved by probing the actual active state in
> >> the distributor instead? This should be the authoritative state anyway.
> >> And this is done very rarely, so we don't care about the performance, do we?
> > 
> > Today, the purpose of _IRQ_INPROGRESS is to have a common way to deal
> > with physical interrupts targeting Xen and targeting guests. It is
> > common across architectures.
> 
> Ah, true, so it might not be a good idea to get rid of it, then.
> 
> > I agree it is not very useful for guest
> > interrupts, but it is useful for hypervisor interrupts.
> > 
> > We could consider avoiding _IRQ_INPROGRESS altogether for guest
> > interrupts on ARM and using it only for hypervisor interrupts (do not
> > set _IRQ_INPROGRESS for guest interrupts at all). I cannot see a problem
> > with that right now, please double check I am not missing anything.
> 
> Mmh, but wouldn't that kill a hardware-mapped IRQ when the domain dies
> while the IRQ is still handled by the guest? No one will ever
> deactivate it on the host side then, so new IRQs will be masked
> forever? I thought this was one of the main use cases for this flag.

Yes, gic_remove_irq_from_guest needs to be changed to read the state of
the irq from the distributor.
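
Roughly along these lines (sketch only: gic_read_active_state() does not
exist yet and would be a read counterpart to the setters added by patch
03/39; gic_set_active_state() is the one from that patch):

    /*
     * In gic_remove_irq_from_guest(): ask the distributor, not
     * _IRQ_INPROGRESS, whether the IRQ still needs to be retired.
     */
    if ( gic_read_active_state(desc) )
        gic_set_active_state(desc, false);

That keeps the dying-domain case working without relying on the racy
mirroring of the active state.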


> > Otherwise, I think it would make sense to just make sure that when we
> > clear irq->vcpu, we also clear _IRQ_INPROGRESS consistently. Like we do
> > today.
> > 
> > Either way, it shouldn't be too hard to fix this issue.
> 
> Alright, will try to come up with something tomorrow.


end of thread

Thread overview: 20+ messages
2018-03-22 11:56 [PATCH v3a 00/39] (0/3) Fixups for the new VGIC(-v2) implementation Andre Przywara
2018-03-22 11:56 ` [PATCH v3a 03/39] ARM: GIC: Allow tweaking the active and pending state of an IRQ Andre Przywara
2018-03-22 13:58   ` Julien Grall
2018-03-26 20:09   ` Stefano Stabellini
2018-03-22 11:56 ` [PATCH v3a 14/39] ARM: new VGIC: Add GICv2 world switch backend Andre Przywara
2018-03-22 14:06   ` Julien Grall
2018-03-22 15:12     ` Andre Przywara
2018-03-23  0:49       ` Julien Grall
2018-03-22 16:26     ` Andre Przywara
2018-03-26 23:22   ` Stefano Stabellini
2018-03-27  0:16     ` Julien Grall
2018-03-27  0:23       ` Stefano Stabellini
2018-03-27  0:21     ` Stefano Stabellini
2018-03-27 15:33     ` Andre Przywara
2018-03-27 19:41       ` Stefano Stabellini
2018-03-27 22:38         ` André Przywara
2018-03-27 23:39           ` Stefano Stabellini
2018-03-22 11:56 ` [PATCH v3a 39/39] ARM: VGIC: wire new VGIC(-v2) files into Xen build system Andre Przywara
2018-03-22 13:59   ` Julien Grall
2018-03-27 22:53   ` Stefano Stabellini
