[PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers
@ 2015-11-18 16:42 Julien Grall
  2015-11-18 16:42 ` [PATCH v6 1/6] xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only Julien Grall
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

Hi all,

This series aims to fix the 32-bit access on 64-bit registers. Some guest
OS such as FreeBSD and Linux (ITS and recently 32-bit guest using GICv3)
use 32-bit access and will crash at boot time.

For all the changes see in each patch.

Sincerely yours,

Julien Grall (6):
  xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7
    read-only
  xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR
  xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0
  xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC
    register ...
  xen/arm: vgic-v3: Support 32-bit access for 64-bit registers

 xen/arch/arm/vgic-v2.c     | 282 +++++++++++++++++++++++++++------------------
 xen/arch/arm/vgic-v3.c     | 264 ++++++++++++++++++++++++++----------------
 xen/arch/arm/vgic.c        |  45 ++++++--
 xen/include/asm-arm/gic.h  |   2 +
 xen/include/asm-arm/vgic.h | 129 +++++++++++++++++----
 5 files changed, 480 insertions(+), 242 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v6 1/6] xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-18 16:42 ` [PATCH v6 2/6] xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR Julien Grall
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

Each ITARGETSR register are 4-byte wide and the offset is in byte.

The current implementation is computing the end of the range wrongly
resulting to emulate only ITARGETSR{0,1} read-only. The rest will be
treated as read-write.

As 8 registers should be read-only, the end of the range should be
ITARGETSR + (4 * 8) - 1.

For convenience introduce ITARGETSR7 and ITARGETSR8.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---
    This would be a good candidate to backport. Without it a guest could
    modify ITARGETSR{0-7} and redirect the interrupt to the wrong vCPU.

    Spotted while testing to boot FreeBSD guest with this series.
    FreeBSD is writing in ITARGETSR{0 - 7} and will therefore crash xen
    due to the valid ASSERT in vgic_store_itargetsr.

    Note that the emulation is not properly emulated the last register
    of each range. I'm planning to fix it in a follow-up series.

    Changes in v6:
        - Add Stefano's reviewed-by

    Changes in v5:
        - Patch added
---
 xen/arch/arm/vgic-v2.c    | 4 ++--
 xen/include/asm-arm/gic.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index f7d784b..041291c 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -338,11 +338,11 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
                v, r, gicd_reg - GICD_ICACTIVER);
         return 0;
 
-    case GICD_ITARGETSR ... GICD_ITARGETSR + 7:
+    case GICD_ITARGETSR ... GICD_ITARGETSR7:
         /* SGI/PPI target is read only */
         goto write_ignore_32;
 
-    case GICD_ITARGETSR + 8 ... GICD_ITARGETSRN:
+    case GICD_ITARGETSR8 ... GICD_ITARGETSRN:
     {
         /* unsigned long needed for find_next_bit */
         unsigned long target;
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 0116481..3064d1c 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -42,6 +42,8 @@
 #define GICD_IPRIORITYR (0x400)
 #define GICD_IPRIORITYRN (0x7F8)
 #define GICD_ITARGETSR  (0x800)
+#define GICD_ITARGETSR7 (0x81C)
+#define GICD_ITARGETSR8 (0x820)
 #define GICD_ITARGETSRN (0xBF8)
 #define GICD_ICFGR      (0xC00)
 #define GICD_ICFGRN     (0xCFC)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 2/6] xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
  2015-11-18 16:42 ` [PATCH v6 1/6] xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-18 16:42 ` [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0 Julien Grall
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

During a store, the byte is always in the low part of the register (i.e
[0:7]).

We are incorrectly masking the register by using a shift of the byte
offset in the ITARGETSR while the byte is alwasy in r[0:7]. This will
result in a target list equal to 0 which is ignored by the emulation.

Because of that the guest will only be able to modify the first byte in
each ITARGETSR.

Furthermore, the body of the loop is retrieving the old target list
using the index of the byte.

To avoid modifying too much the loop, shift the byte stored to the correct
offset.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

----
    This change used to be embedded in "xen/arm: vgic: Optimize the way
    to store the target vCPU in the rank". It has been moved out to
    avoid having too much functional changes in a single patch.

    This patch is a good candidate to backport to Xen 4.6 and Xen 4.5.
    Without it a guest won't be able migrate an IRQ from one vCPU to
    another if it's using byte access to write in ITARGETSR.

    Note that if we backport this patch alone, the resulting code in
    earlier version of Xen will be complex to read. As the last patch
    of this serie should also be backported, I'm planning to request
    backport for the whole series.

    Changes in v6:
        - Add Stefano's acked-by

    Changes in v5:
        - Update commit message based on Ian's suggestion

    Changes in v4:
        - Patch added
---
 xen/arch/arm/vgic-v2.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 041291c..486e497 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -353,11 +353,11 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         /* 8-bit vcpu mask for this domain */
         BUG_ON(v->domain->max_vcpus > 8);
         target = (1 << v->domain->max_vcpus) - 1;
-        if ( dabt.size == 2 )
-            target = target | (target << 8) | (target << 16) | (target << 24);
+        target = target | (target << 8) | (target << 16) | (target << 24);
+        if ( dabt.size == DABT_WORD )
+            target &= r;
         else
-            target = (target << (8 * (gicd_reg & 0x3)));
-        target &= r;
+            target &= (r << (8 * (gicd_reg & 0x3)));
         /* ignore zero writes */
         if ( !target )
             goto write_ignore;
@@ -381,7 +381,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 
             if ( new_target != old_target )
             {
-                irq = gicd_reg - GICD_ITARGETSR + (i / 8);
+                irq = (gicd_reg & ~0x3) - GICD_ITARGETSR + (i / 8);
                 v_target = v->domain->vcpu[new_target];
                 v_old = v->domain->vcpu[old_target];
                 vgic_migrate_irq(v_old, v_target, irq);
@@ -393,7 +393,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
                                              DABT_WORD)] = target;
         else
             vgic_byte_write(&rank->v2.itargets[REG_RANK_INDEX(8,
-                      gicd_reg - GICD_ITARGETSR, DABT_WORD)], target, gicd_reg);
+                      gicd_reg - GICD_ITARGETSR, DABT_WORD)], r, gicd_reg);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
  2015-11-18 16:42 ` [PATCH v6 1/6] xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only Julien Grall
  2015-11-18 16:42 ` [PATCH v6 2/6] xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-25 11:34   ` Ian Campbell
  2015-11-18 16:42 ` [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank Julien Grall
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

The current implementation ignores the whole write if one of the field is
0. Although, based on the spec (4.3.12 IHI 0048B.b), 0 is a valid value
when:
    - The interrupt is not wired in the distributor. From the Xen
    point of view, it means that the corresponding bit is not set in
    d->arch.vgic.allocated_irqs.
    - The user wants to disable the IRQ forwarding in the distributor.
    I.e the IRQ stays pending in the distributor and never received by
    the guest.

Implementing the later will require more work in Xen because we always
assume the interrupt is forwarded to a valid vCPU. So for now, ignore
any field where the value is 0.

The emulation of the write access of ITARGETSR has been reworked and
moved to a new function because it would have been difficult to
implement properly the behavior with the current code.

The new implementation is breaking the register in 4 distinct bytes. For
each byte, it will check the validity of the target list, find the new
target, migrate the interrupt and store the value if necessary.

In the new implementation there is nearly no distinction of the access
size to avoid having too many different path which is harder to test.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
    This change used to be embedded in "xen/arm: vgic: Optimize the way
    to store the target vCPU in the rank". It has been moved out to
    avoid having too much functional changes in a single patch.

    I'm not sure if this patch should be backported to Xen 4.6. Without
    it any guest writing 0 in one the field won't be able to migrate
    other interrupts. Although, in all the use case I've seen, the guest
    is read ITARGETSR first and write-back the value with the
    corresponding byte changed.

    Note that NR_BITS_PER_TARGET has been kept as a define because it
    will be use in another function in the next patch and I wanted to
    avoid having to duplicate const var = ...

    Also I didn't bother to move the incrementation of the offset on the
    next patch as it's currently harmless.

    Changes in v5:
        - s/NR_TARGET_PER_REG/NR_TARGETS_PER_ITARGETSR/
        - s/NR_BIT_PER_TARGET/NR_BITS_PER_TARGET/
        - Compute NR_BITS_PER_TARGET based on NR_TARGETS_PER_ITARGETSR
        - Typoes in the comments
        - Compute the shift for itargetsr on every loop
        - Use gprintk(XENLOG_WARNING,...)
        - Clarify commit message

    Changes in v4:
        - Patch added
---
 xen/arch/arm/vgic-v2.c | 145 ++++++++++++++++++++++++++++++++++---------------
 1 file changed, 102 insertions(+), 43 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 486e497..ad2ea0a 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -57,6 +57,98 @@ void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
     vgic_v2_hw.aliased_offset = aliased_offset;
 }
 
+#define NR_TARGETS_PER_ITARGETSR    4U
+#define NR_BITS_PER_TARGET  (32U / NR_TARGETS_PER_ITARGETSR)
+
+/*
+ * Store an ITARGETSR register. This function only deals with ITARGETSR8
+ * and onwards.
+ *
+ * Note the offset will be aligned to the appropriate boundary.
+ */
+static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
+                                 unsigned int offset, uint32_t itargetsr)
+{
+    unsigned int i;
+    unsigned int regidx = REG_RANK_INDEX(8, offset, DABT_WORD);
+    unsigned int virq;
+
+    ASSERT(spin_is_locked(&rank->lock));
+
+    /*
+     * The ITARGETSR0-7, used for SGIs/PPIs, are implemented RO in the
+     * emulation and should never call this function.
+     *
+     * They all live in the first rank.
+     */
+    BUILD_BUG_ON(NR_INTERRUPT_PER_RANK != 32);
+    ASSERT(rank->index >= 1);
+
+    offset &= INTERRUPT_RANK_MASK;
+    offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
+
+    virq = rank->index * NR_INTERRUPT_PER_RANK + offset;
+
+    for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++, virq++ )
+    {
+        unsigned int new_target, old_target;
+        uint8_t new_mask, old_mask;
+
+        /*
+         * Don't need to mask as we rely on new_mask to fit for only one
+         * target.
+         */
+        BUILD_BUG_ON((sizeof (new_mask) * 8) != NR_BITS_PER_TARGET);
+
+        new_mask = itargetsr >> (i * NR_BITS_PER_TARGET);
+        old_mask = vgic_byte_read(rank->v2.itargets[regidx], i);
+
+        /*
+         * SPIs are using the 1-N model (see 1.4.3 in ARM IHI 0048B).
+         * While the interrupt could be set pending to all the vCPUs in
+         * target list, it's not guaranteed by the spec.
+         * For simplicity, always route the vIRQ to the first interrupt
+         * in the target list
+         */
+        new_target = ffs(new_mask);
+        old_target = ffs(old_mask);
+
+        /* The current target should always be valid */
+        ASSERT(old_target && (old_target <= d->max_vcpus));
+
+        /*
+         * Ignore the write request for this interrupt if the new target
+         * is invalid.
+         * XXX: From the spec, if the target list is not valid, the
+         * interrupt should be ignored (i.e not forwarded to the
+         * guest).
+         */
+        if ( !new_target || (new_target > d->max_vcpus) )
+        {
+            gprintk(XENLOG_WARNING,
+                   "No valid vCPU found for vIRQ%u in the target list (%#x). Skip it\n",
+                   virq, new_mask);
+            continue;
+        }
+
+        /* The vCPU ID always starts from 0 */
+        new_target--;
+        old_target--;
+
+        /* Only migrate the vIRQ if the target vCPU has changed */
+        if ( new_target != old_target )
+        {
+            vgic_migrate_irq(d->vcpu[old_target],
+                             d->vcpu[new_target],
+                             virq);
+        }
+
+        /* Bit corresponding to unimplemented CPU is write-ignore. */
+        new_mask &= (1 << d->max_vcpus) - 1;
+        vgic_byte_write(&rank->v2.itargets[regidx], new_mask, i);
+    }
+}
+
 static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
                                    register_t *r, void *priv)
 {
@@ -344,56 +436,23 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 
     case GICD_ITARGETSR8 ... GICD_ITARGETSRN:
     {
-        /* unsigned long needed for find_next_bit */
-        unsigned long target;
-        int i;
+        uint32_t itargetsr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
         if ( rank == NULL) goto write_ignore;
-        /* 8-bit vcpu mask for this domain */
-        BUG_ON(v->domain->max_vcpus > 8);
-        target = (1 << v->domain->max_vcpus) - 1;
-        target = target | (target << 8) | (target << 16) | (target << 24);
+        vgic_lock_rank(v, rank, flags);
         if ( dabt.size == DABT_WORD )
-            target &= r;
+            itargetsr = r;
         else
-            target &= (r << (8 * (gicd_reg & 0x3)));
-        /* ignore zero writes */
-        if ( !target )
-            goto write_ignore;
-        /* For word reads ignore writes where any single byte is zero */
-        if ( dabt.size == 2 &&
-            !((target & 0xff) && (target & (0xff << 8)) &&
-             (target & (0xff << 16)) && (target & (0xff << 24))))
-            goto write_ignore;
-        vgic_lock_rank(v, rank, flags);
-        i = 0;
-        while ( (i = find_next_bit(&target, 32, i)) < 32 )
         {
-            unsigned int irq, new_target, old_target;
-            unsigned long old_target_mask;
-            struct vcpu *v_target, *v_old;
-
-            new_target = i % 8;
-            old_target_mask = vgic_byte_read(rank->v2.itargets[REG_RANK_INDEX(8,
-                                             gicd_reg - GICD_ITARGETSR, DABT_WORD)], i/8);
-            old_target = find_first_bit(&old_target_mask, 8);
-
-            if ( new_target != old_target )
-            {
-                irq = (gicd_reg & ~0x3) - GICD_ITARGETSR + (i / 8);
-                v_target = v->domain->vcpu[new_target];
-                v_old = v->domain->vcpu[old_target];
-                vgic_migrate_irq(v_old, v_target, irq);
-            }
-            i += 8 - new_target;
+            itargetsr = rank->v2.itargets[REG_RANK_INDEX(8,
+                                    gicd_reg - GICD_ITARGETSR,
+                                    DABT_WORD)];
+            vgic_byte_write(&itargetsr, r, gicd_reg);
         }
-        if ( dabt.size == DABT_WORD )
-            rank->v2.itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR,
-                                             DABT_WORD)] = target;
-        else
-            vgic_byte_write(&rank->v2.itargets[REG_RANK_INDEX(8,
-                      gicd_reg - GICD_ITARGETSR, DABT_WORD)], r, gicd_reg);
+        vgic_store_itargetsr(v->domain, rank, gicd_reg - GICD_ITARGETSR,
+                             itargetsr);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
                   ` (2 preceding siblings ...)
  2015-11-18 16:42 ` [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0 Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-25 11:37   ` Ian Campbell
  2015-11-18 16:42 ` [PATCH v6 5/6] xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC register Julien Grall
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

Xen is currently directly storing the value of GICD_ITARGETSR register
(for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
emulation of the registers access very simple but makes the code to get
the target vCPU for a given vIRQ more complex.

While the target vCPU of an vIRQ is retrieved every time an vIRQ is
injected to the guest, the access to the register occurs less often.

So the data structure should be optimized for the most common case
rather than the inverse.

This patch introduces the usage of an array to store the target vCPU for
every interrupt in the rank. This will make the code to get the target
very quick. The emulation code will now have to generate the GICD_ITARGETSR
and GICD_IROUTER register for read access and split it to store in a
convenient way.

With the new way to store the target vCPU, the structure vgic_irq_rank
is shrunk down from 320 bytes to 92 bytes. This is saving about 228
bytes of memory allocated separately per vCPU.

Note that with these changes, any read to those register will list only
the target vCPU used by Xen. As the spec is not clear whether this is a
valid choice or not, OSes which have a different interpretation of the
spec (i.e OSes which perform read-modify-write operations on these
registers) may not boot anymore on Xen. Although, I think this is fair
trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
with no SPIs) and a strict interpretation of the spec (though all the
cases are not clearly defined).

Furthermore, the implementation of the callback get_target_vcpu is now
exactly the same. Consolidate the implementation in the common vGIC code
and drop the callback.

Finally take the opportunity to fix coding style and replace "irq" by
"virq" to make clear that we are dealing with virtual IRQ in section we
are modifying.

Signed-off-by: Julien Grall <julien.grall@citrix.com>

---
    The real reason is I'd like to drop vgic_byte_* helpers in favor of more
    generic access helpers. Although it would make the code to retrieve the
    priority more complex. So reworking the code to get the target vCPU
    was the best solution.

    Changes in v5:
        - NR_TARGET_PER_REG has been renamed to NR_TARGETS_PER_ITARGETSR
        - NR_BIT_PER_TARGET has been renamed to NR_BITS_PER_TARGET
        - Fix comments
        - s/NR_BYTE_PER_IROUTER/NR_BYTES_PER_IROUTER/
        - Remove the duplicate declaration of virq in vgic_store_itargetsr

    Changes in v4:
        - Move the behavior changes in 2 separate patches:
            * patch #1: Handle correctly byte write in ITARGETSR
            * patch #2: Don't ignore a write in ITARGETSR if one field is 0
        - Update commit message

    Changes in v3:
        - Make clear in the commit message that we rely on a corner case
        of the spec.
        - Typoes

    Changes in v2:
        - Remove get_target_vcpu callback
        - s/irq/virq/ to make clear we are using virtual IRQ
        - Update the commit message
        - Rename vgic_generate_* to vgic_fetch
        - Improve code comment
        - Move the introduction of vgic_rank_init in a separate patch
        - Make use of rank->idx

    Changes in v1:
        - Patch added
---
 xen/arch/arm/vgic-v2.c     |  86 +++++++++++++-------------------
 xen/arch/arm/vgic-v3.c     | 119 ++++++++++++++++++++++++---------------------
 xen/arch/arm/vgic.c        |  45 ++++++++++++-----
 xen/include/asm-arm/vgic.h |  18 +++----
 4 files changed, 137 insertions(+), 131 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index ad2ea0a..62159bf 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -61,8 +61,31 @@ void vgic_v2_setup_hw(paddr_t dbase, paddr_t cbase, paddr_t csize,
 #define NR_BITS_PER_TARGET  (32U / NR_TARGETS_PER_ITARGETSR)
 
 /*
- * Store an ITARGETSR register. This function only deals with ITARGETSR8
- * and onwards.
+ * Fetch an ITARGETSR register based on the offset from ITARGETSR0. Only
+ * one vCPU will be listed for a given vIRQ.
+ *
+ * Note the offset will be aligned to the appropriate boundary.
+ */
+static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank,
+                                     unsigned int offset)
+{
+    uint32_t reg = 0;
+    unsigned int i;
+
+    ASSERT(spin_is_locked(&rank->lock));
+
+    offset &= INTERRUPT_RANK_MASK;
+    offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
+
+    for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
+        reg |= (1 << rank->vcpu[offset]) << (i * NR_BITS_PER_TARGET);
+
+    return reg;
+}
+
+/*
+ * Store an ITARGETSR register in a convenient way and migrate the vIRQ
+ * if necessary. This function only deals with ITARGETSR8 and onwards.
  *
  * Note the offset will be aligned to the appropriate boundary.
  */
@@ -70,7 +93,6 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
                                  unsigned int offset, uint32_t itargetsr)
 {
     unsigned int i;
-    unsigned int regidx = REG_RANK_INDEX(8, offset, DABT_WORD);
     unsigned int virq;
 
     ASSERT(spin_is_locked(&rank->lock));
@@ -92,7 +114,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
     for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++, virq++ )
     {
         unsigned int new_target, old_target;
-        uint8_t new_mask, old_mask;
+        uint8_t new_mask;
 
         /*
          * Don't need to mask as we rely on new_mask to fit for only one
@@ -101,7 +123,6 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
         BUILD_BUG_ON((sizeof (new_mask) * 8) != NR_BITS_PER_TARGET);
 
         new_mask = itargetsr >> (i * NR_BITS_PER_TARGET);
-        old_mask = vgic_byte_read(rank->v2.itargets[regidx], i);
 
         /*
          * SPIs are using the 1-N model (see 1.4.3 in ARM IHI 0048B).
@@ -111,10 +132,6 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
          * in the target list
          */
         new_target = ffs(new_mask);
-        old_target = ffs(old_mask);
-
-        /* The current target should always be valid */
-        ASSERT(old_target && (old_target <= d->max_vcpus));
 
         /*
          * Ignore the write request for this interrupt if the new target
@@ -133,7 +150,8 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
 
         /* The vCPU ID always starts from 0 */
         new_target--;
-        old_target--;
+
+        old_target = rank->vcpu[offset];
 
         /* Only migrate the vIRQ if the target vCPU has changed */
         if ( new_target != old_target )
@@ -143,9 +161,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
                              virq);
         }
 
-        /* Bit corresponding to unimplemented CPU is write-ignore. */
-        new_mask &= (1 << d->max_vcpus) - 1;
-        vgic_byte_write(&rank->v2.itargets[regidx], new_mask, i);
+        rank->vcpu[offset] = new_target;
     }
 }
 
@@ -225,8 +241,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
         if ( rank == NULL) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->v2.itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR,
-                                              DABT_WORD)];
+        *r = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
         if ( dabt.size == DABT_BYTE )
             *r = vgic_byte_read(*r, gicd_reg);
         vgic_unlock_rank(v, rank, flags);
@@ -446,9 +461,7 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
             itargetsr = r;
         else
         {
-            itargetsr = rank->v2.itargets[REG_RANK_INDEX(8,
-                                    gicd_reg - GICD_ITARGETSR,
-                                    DABT_WORD)];
+            itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
             vgic_byte_write(&itargetsr, r, gicd_reg);
         }
         vgic_store_itargetsr(v->domain, rank, gicd_reg - GICD_ITARGETSR,
@@ -553,43 +566,16 @@ static const struct mmio_handler_ops vgic_v2_distr_mmio_handler = {
     .write = vgic_v2_distr_mmio_write,
 };
 
-static struct vcpu *vgic_v2_get_target_vcpu(struct vcpu *v, unsigned int irq)
-{
-    unsigned long target;
-    struct vcpu *v_target;
-    struct vgic_irq_rank *rank = vgic_rank_irq(v, irq);
-    ASSERT(spin_is_locked(&rank->lock));
-
-    target = vgic_byte_read(rank->v2.itargets[REG_RANK_INDEX(8,
-                                              irq, DABT_WORD)], irq & 0x3);
-
-    /* 1-N SPI should be delivered as pending to all the vcpus in the
-     * mask, but here we just return the first vcpu for simplicity and
-     * because it would be too slow to do otherwise. */
-    target = find_first_bit(&target, 8);
-    ASSERT(target >= 0 && target < v->domain->max_vcpus);
-    v_target = v->domain->vcpu[target];
-    return v_target;
-}
-
 static int vgic_v2_vcpu_init(struct vcpu *v)
 {
-    int i;
-
-    /* For SGI and PPI the target is always this CPU */
-    for ( i = 0 ; i < 8 ; i++ )
-        v->arch.vgic.private_irqs->v2.itargets[i] =
-              (1<<(v->vcpu_id+0))
-            | (1<<(v->vcpu_id+8))
-            | (1<<(v->vcpu_id+16))
-            | (1<<(v->vcpu_id+24));
+    /* Nothing specific to initialize for this driver */
 
     return 0;
 }
 
 static int vgic_v2_domain_init(struct domain *d)
 {
-    int i, ret;
+    int ret;
     paddr_t cbase, csize;
     paddr_t vbase;
 
@@ -635,11 +621,6 @@ static int vgic_v2_domain_init(struct domain *d)
     if ( ret )
         return ret;
 
-    /* By default deliver to CPU0 */
-    for ( i = 0; i < DOMAIN_NR_RANKS(d); i++ )
-        memset(d->arch.vgic.shared_irqs[i].v2.itargets, 0x1,
-               sizeof(d->arch.vgic.shared_irqs[i].v2.itargets));
-
     register_mmio_handler(d, &vgic_v2_distr_mmio_handler, d->arch.vgic.dbase,
                           PAGE_SIZE, NULL);
 
@@ -649,7 +630,6 @@ static int vgic_v2_domain_init(struct domain *d)
 static const struct vgic_ops vgic_v2_ops = {
     .vcpu_init   = vgic_v2_vcpu_init,
     .domain_init = vgic_v2_domain_init,
-    .get_target_vcpu = vgic_v2_get_target_vcpu,
     .max_vcpus = 8,
 };
 
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index b5249ff..4f976d4 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -89,18 +89,72 @@ static struct vcpu *vgic_v3_irouter_to_vcpu(struct domain *d, uint64_t irouter)
     return d->vcpu[vcpu_id];
 }
 
-static struct vcpu *vgic_v3_get_target_vcpu(struct vcpu *v, unsigned int irq)
-{
-    struct vcpu *v_target;
-    struct vgic_irq_rank *rank = vgic_rank_irq(v, irq);
+#define NR_BYTES_PER_IROUTER 8U
 
+/*
+ * Fetch an IROUTER register based on the offset from IROUTER0. Only one
+ * vCPU will be listed for a given vIRQ.
+ *
+ * Note the offset will be aligned to the appropriate  boundary.
+ */
+static uint64_t vgic_fetch_irouter(struct vgic_irq_rank *rank,
+                                   unsigned int offset)
+{
     ASSERT(spin_is_locked(&rank->lock));
 
-    v_target = vgic_v3_irouter_to_vcpu(v->domain, rank->v3.irouter[irq % 32]);
+    /* There is exactly 1 vIRQ per IROUTER */
+    offset /= NR_BYTES_PER_IROUTER;
 
-    ASSERT(v_target != NULL);
+    /* Get the index in the rank */
+    offset &= INTERRUPT_RANK_MASK;
 
-    return v_target;
+    return vcpuid_to_vaffinity(rank->vcpu[offset]);
+}
+
+/*
+ * Store an IROUTER register in a convenient way and migrate the vIRQ
+ * if necessary. This function only deals with IROUTER32 and onwards.
+ *
+ * Note the offset will be aligned to the appropriate boundary.
+ */
+static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
+                               unsigned int offset, uint64_t irouter)
+{
+    struct vcpu *new_vcpu, *old_vcpu;
+    unsigned int virq;
+
+    /* There is 1 vIRQ per IROUTER */
+    virq = offset / NR_BYTES_PER_IROUTER;
+
+    /*
+     * The IROUTER0-31, used for SGIs/PPIs, are reserved and should
+     * never call this function.
+     */
+    ASSERT(virq >= 32);
+
+    /* Get the index in the rank */
+    offset &= virq & INTERRUPT_RANK_MASK;
+
+    new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
+    old_vcpu = d->vcpu[rank->vcpu[offset]];
+
+    /*
+     * From the spec (see 8.9.13 in IHI 0069A), any write with an
+     * invalid vCPU will lead to the interrupt being ignored.
+     *
+     * But the current code to inject an IRQ is not able to cope with
+     * invalid vCPU. So for now, just ignore the write.
+     *
+     * TODO: Respect the spec
+     */
+    if ( !new_vcpu )
+        return;
+
+    /* Only migrate the IRQ if the target vCPU has changed */
+    if ( new_vcpu != old_vcpu )
+        vgic_migrate_irq(old_vcpu, new_vcpu, virq);
+
+    rank->vcpu[offset] = new_vcpu->vcpu_id;
 }
 
 static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
@@ -726,8 +780,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->v3.irouter[REG_RANK_INDEX(64,
-                              (gicd_reg - GICD_IROUTER), DABT_DOUBLE_WORD)];
+        *r = vgic_fetch_irouter(rank, gicd_reg - GICD_IROUTER);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     case GICD_NSACR ... GICD_NSACRN:
@@ -812,8 +865,6 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
     struct hsr_dabt dabt = info->dabt;
     struct vgic_irq_rank *rank;
     unsigned long flags;
-    uint64_t new_irouter, old_irouter;
-    struct vcpu *old_vcpu, *new_vcpu;
     int gicd_reg = (int)(info->gpa - v->domain->arch.vgic.dbase);
 
     perfc_incr(vgicd_writes);
@@ -881,32 +932,8 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto write_ignore;
-        new_irouter = r;
         vgic_lock_rank(v, rank, flags);
-
-        old_irouter = rank->v3.irouter[REG_RANK_INDEX(64,
-                                       (gicd_reg - GICD_IROUTER),
-                                       DABT_DOUBLE_WORD)];
-        old_vcpu = vgic_v3_irouter_to_vcpu(v->domain, old_irouter);
-        new_vcpu = vgic_v3_irouter_to_vcpu(v->domain, new_irouter);
-
-        if ( !new_vcpu )
-        {
-            printk(XENLOG_G_DEBUG
-                   "%pv: vGICD: wrong irouter at offset %#08x val %#"PRIregister,
-                   v, gicd_reg, r);
-            vgic_unlock_rank(v, rank, flags);
-            /*
-             * TODO: Don't inject a fault to the guest when the MPIDR is
-             * not valid. From the spec, the interrupt should be
-             * ignored.
-             */
-            return 0;
-        }
-        rank->v3.irouter[REG_RANK_INDEX(64, (gicd_reg - GICD_IROUTER),
-                         DABT_DOUBLE_WORD)] = new_irouter;
-        if ( old_vcpu != new_vcpu )
-            vgic_migrate_irq(old_vcpu, new_vcpu, (gicd_reg - GICD_IROUTER)/8);
+        vgic_store_irouter(v->domain, rank, gicd_reg - GICD_IROUTER, r);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     case GICD_NSACR ... GICD_NSACRN:
@@ -1036,7 +1063,6 @@ static const struct mmio_handler_ops vgic_distr_mmio_handler = {
 static int vgic_v3_vcpu_init(struct vcpu *v)
 {
     int i;
-    uint64_t affinity;
     paddr_t rdist_base;
     struct vgic_rdist_region *region;
     unsigned int last_cpu;
@@ -1045,15 +1071,6 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
     struct domain *d = v->domain;
     uint32_t rdist_stride = d->arch.vgic.rdist_stride;
 
-    /* For SGI and PPI the target is always this CPU */
-    affinity = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 32 |
-                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 16 |
-                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 1) << 8  |
-                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 0));
-
-    for ( i = 0 ; i < 32 ; i++ )
-        v->arch.vgic.private_irqs->v3.irouter[i] = affinity;
-
     /*
      * Find the region where the re-distributor lives. For this purpose,
      * we look one region ahead as we have only the first CPU in hand.
@@ -1098,7 +1115,7 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
 
 static int vgic_v3_domain_init(struct domain *d)
 {
-    int i, idx;
+    int i;
 
     /*
      * Domain 0 gets the hardware address.
@@ -1150,13 +1167,6 @@ static int vgic_v3_domain_init(struct domain *d)
         d->arch.vgic.rdist_regions[0].first_cpu = 0;
     }
 
-    /* By default deliver to CPU0 */
-    for ( i = 0; i < DOMAIN_NR_RANKS(d); i++ )
-    {
-        for ( idx = 0; idx < 32; idx++ )
-            d->arch.vgic.shared_irqs[i].v3.irouter[idx] = 0;
-    }
-
     /* Register mmio handle for the Distributor */
     register_mmio_handler(d, &vgic_distr_mmio_handler, d->arch.vgic.dbase,
                           SZ_64K, NULL);
@@ -1182,7 +1192,6 @@ static int vgic_v3_domain_init(struct domain *d)
 static const struct vgic_ops v3_ops = {
     .vcpu_init   = vgic_v3_vcpu_init,
     .domain_init = vgic_v3_domain_init,
-    .get_target_vcpu  = vgic_v3_get_target_vcpu,
     .emulate_sysreg  = vgic_v3_emulate_sysreg,
     /*
      * We use both AFF1 and AFF0 in (v)MPIDR. Thus, the max number of CPU
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 7bb4570..531ce5d 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -68,11 +68,24 @@ static void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
     p->irq = virq;
 }
 
-static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index)
+static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
+                           unsigned int vcpu)
 {
+    unsigned int i;
+
+    /*
+     * Make sure that the type chosen to store the target is able to
+     * store an VCPU ID between 0 and the maximum of virtual CPUs
+     * supported.
+     */
+    BUILD_BUG_ON((1 << (sizeof(rank->vcpu[0]) * 8)) < MAX_VIRT_CPUS);
+
     spin_lock_init(&rank->lock);
 
     rank->index = index;
+
+    for ( i = 0; i < NR_INTERRUPT_PER_RANK; i++ )
+        rank->vcpu[i] = vcpu;
 }
 
 int domain_vgic_init(struct domain *d, unsigned int nr_spis)
@@ -121,8 +134,9 @@ int domain_vgic_init(struct domain *d, unsigned int nr_spis)
     for (i=0; i<d->arch.vgic.nr_spis; i++)
         vgic_init_pending_irq(&d->arch.vgic.pending_irqs[i], i + 32);
 
+    /* SPIs are routed to VCPU0 by default */
     for ( i = 0; i < DOMAIN_NR_RANKS(d); i++ )
-        vgic_rank_init(&d->arch.vgic.shared_irqs[i], i + 1);
+        vgic_rank_init(&d->arch.vgic.shared_irqs[i], i + 1, 0);
 
     ret = d->arch.vgic.handler->domain_init(d);
     if ( ret )
@@ -176,7 +190,8 @@ int vcpu_vgic_init(struct vcpu *v)
     if ( v->arch.vgic.private_irqs == NULL )
       return -ENOMEM;
 
-    vgic_rank_init(v->arch.vgic.private_irqs, 0);
+    /* SGIs/PPIs are always routed to this VCPU */
+    vgic_rank_init(v->arch.vgic.private_irqs, 0, v->vcpu_id);
 
     v->domain->arch.vgic.handler->vcpu_init(v);
 
@@ -197,17 +212,27 @@ int vcpu_vgic_free(struct vcpu *v)
     return 0;
 }
 
+/* The function should be called by rank lock taken. */
+static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
+{
+    struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
+
+    ASSERT(spin_is_locked(&rank->lock));
+
+    return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
+}
+
 /* takes the rank lock */
-struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
+struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
 {
-    struct domain *d = v->domain;
     struct vcpu *v_target;
-    struct vgic_irq_rank *rank = vgic_rank_irq(v, irq);
+    struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
     unsigned long flags;
 
     vgic_lock_rank(v, rank, flags);
-    v_target = d->arch.vgic.handler->get_target_vcpu(v, irq);
+    v_target = __vgic_get_target_vcpu(v, virq);
     vgic_unlock_rank(v, rank, flags);
+
     return v_target;
 }
 
@@ -286,7 +311,6 @@ void arch_move_irqs(struct vcpu *v)
 
 void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 {
-    struct domain *d = v->domain;
     const unsigned long mask = r;
     struct pending_irq *p;
     unsigned int irq;
@@ -296,7 +320,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
-        v_target = d->arch.vgic.handler->get_target_vcpu(v, irq);
+        v_target = __vgic_get_target_vcpu(v, irq);
         p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         gic_remove_from_queues(v_target, irq);
@@ -312,7 +336,6 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 
 void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
 {
-    struct domain *d = v->domain;
     const unsigned long mask = r;
     struct pending_irq *p;
     unsigned int irq;
@@ -322,7 +345,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
-        v_target = d->arch.vgic.handler->get_target_vcpu(v, irq);
+        v_target = __vgic_get_target_vcpu(v, irq);
         p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index cb51a9e..a1ef429 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -104,14 +104,11 @@ struct vgic_irq_rank {
         uint32_t ipriorityr[8];
     };
 
-    union {
-        struct {
-            uint32_t itargets[8];
-        }v2;
-        struct {
-            uint64_t irouter[32];
-        }v3;
-    };
+    /*
+     * It's more convenient to store a target VCPU per vIRQ
+     * than the register ITARGETSR/IROUTER itself
+     */
+    uint8_t vcpu[32];
 };
 
 struct sgi_target {
@@ -130,9 +127,6 @@ struct vgic_ops {
     int (*vcpu_init)(struct vcpu *v);
     /* Domain specific initialization of vGIC */
     int (*domain_init)(struct domain *d);
-    /* Get the target vcpu for a given virq. The rank lock is already taken
-     * when calling this. */
-    struct vcpu *(*get_target_vcpu)(struct vcpu *v, unsigned int irq);
     /* vGIC sysreg emulation */
     int (*emulate_sysreg)(struct cpu_user_regs *regs, union hsr hsr);
     /* Maximum number of vCPU supported */
@@ -205,7 +199,7 @@ enum gic_sgi_mode;
 extern int domain_vgic_init(struct domain *d, unsigned int nr_spis);
 extern void domain_vgic_free(struct domain *d);
 extern int vcpu_vgic_init(struct vcpu *v);
-extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq);
+extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int virq);
 extern void vgic_vcpu_inject_spi(struct domain *d, unsigned int virq);
 extern void vgic_clear_pending_irqs(struct vcpu *v);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 5/6] xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC register ...
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
                   ` (3 preceding siblings ...)
  2015-11-18 16:42 ` [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-18 16:42 ` [PATCH v6 6/6] xen/arm: vgic-v3: Support 32-bit access for 64-bit registers Julien Grall
  2015-11-25 12:29 ` [PATCH v6 0/6] xen/arm: vgic: " Ian Campbell
  6 siblings, 0 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

and use them in the vGIC emulation.

The GIC registers may support different access sizes. Rather than open
coding the access for every registers, provide a set of helpers to access
them.

The caller will have to call vgic_regN_* where N is the size of the
emulated registers.

The new helpers supports any access size and expect the caller to
validate the access size supported by the emulated registers.

Finally, take the opportunity to fix the coding style in section we are
modifying.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
    Ian, I update this patch to convert the emulation of GICD_CTLR in
    vGICv3 to the new API. Although I've kept your ack because I think
    it's not a major change. Let me know it's not fine.

    Changes in v6:
        - Correct match { in the emulation. Somehow GCC didn't complain
        about it

    Changes in v4:
        - Convert GICD_CTLR in vGICv3 to the new API.

    Changes in v3:
        - Add Ian's acked-by

    Changes in v2:
        - Rename read/write helpers to extract/update
        - Add a 's' to clearbit/setbit because we update multiple bits

    Changes in v1:
        - Make use of the new helpers in the vGICv2 code
        - Drop vgic_byte_*
        - Use unsigned long rather than uint64_t to optimize the code
        on ARM. (~about 40% less instruction per helpers).
---
 xen/arch/arm/vgic-v2.c     |  89 ++++++++++++++++++++------------
 xen/arch/arm/vgic-v3.c     | 125 +++++++++++++++++++++++++++++++--------------
 xen/include/asm-arm/vgic.h | 111 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 243 insertions(+), 82 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 62159bf..c94f0f3 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -180,24 +180,31 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
     case GICD_CTLR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
         vgic_lock(v);
-        *r = v->domain->arch.vgic.ctlr;
+        *r = vgic_reg32_extract(v->domain->arch.vgic.ctlr, info);
         vgic_unlock(v);
         return 1;
     case GICD_TYPER:
+    {
+        uint32_t typer;
+
         if ( dabt.size != DABT_WORD ) goto bad_width;
         /* No secure world support for guests. */
         vgic_lock(v);
-        *r = ( ((v->domain->max_vcpus - 1) << GICD_TYPE_CPUS_SHIFT) )
+        typer = ((v->domain->max_vcpus - 1) << GICD_TYPE_CPUS_SHIFT)
             | DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32);
         vgic_unlock(v);
+
+        *r = vgic_reg32_extract(typer, info);
+
         return 1;
+    }
     case GICD_IIDR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
         /*
          * XXX Do we need a JEP106 manufacturer ID?
          * Just use the physical h/w value for now
          */
-        *r = 0x0000043b;
+        *r = vgic_reg32_extract(0x0000043b, info);
         return 1;
 
     /* Implementation defined -- read as zero */
@@ -213,7 +220,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISENABLER, DABT_WORD);
         if ( rank == NULL) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ienable;
+        *r = vgic_reg32_extract(rank->ienable, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
 
@@ -222,7 +229,7 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ICENABLER, DABT_WORD);
         if ( rank == NULL) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ienable;
+        *r = vgic_reg32_extract(rank->ienable, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
 
@@ -237,37 +244,53 @@ static int vgic_v2_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         goto read_as_zero;
 
     case GICD_ITARGETSR ... GICD_ITARGETSRN:
+    {
+        uint32_t itargetsr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
         if ( rank == NULL) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
-        if ( dabt.size == DABT_BYTE )
-            *r = vgic_byte_read(*r, gicd_reg);
+        itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
         vgic_unlock_rank(v, rank, flags);
+        *r = vgic_reg32_extract(itargetsr, info);
+
         return 1;
+    }
 
     case GICD_IPRIORITYR ... GICD_IPRIORITYRN:
+    {
+        uint32_t ipriorityr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_IPRIORITYR, DABT_WORD);
-        if ( rank == NULL) goto read_as_zero;
+        if ( rank == NULL ) goto read_as_zero;
 
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ipriorityr[REG_RANK_INDEX(8, gicd_reg - GICD_IPRIORITYR,
-                                             DABT_WORD)];
-        if ( dabt.size == DABT_BYTE )
-            *r = vgic_byte_read(*r, gicd_reg);
+        ipriorityr = rank->ipriorityr[REG_RANK_INDEX(8,
+                                                     gicd_reg - GICD_IPRIORITYR,
+                                                     DABT_WORD)];
         vgic_unlock_rank(v, rank, flags);
+        *r = vgic_reg32_extract(ipriorityr, info);
+
         return 1;
+    }
 
     case GICD_ICFGR ... GICD_ICFGRN:
+    {
+        uint32_t icfgr;
+
         if ( dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 2, gicd_reg - GICD_ICFGR, DABT_WORD);
         if ( rank == NULL) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR, DABT_WORD)];
+        icfgr = rank->icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR, DABT_WORD)];
         vgic_unlock_rank(v, rank, flags);
+
+        *r = vgic_reg32_extract(icfgr, info);
+
         return 1;
+    }
 
     case GICD_NSACR ... GICD_NSACRN:
         /* We do not implement security extensions for guests, read zero */
@@ -377,7 +400,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         if ( dabt.size != DABT_WORD ) goto bad_width;
         /* Ignore all but the enable bit */
         vgic_lock(v);
-        v->domain->arch.vgic.ctlr = r & GICD_CTL_ENABLE;
+        vgic_reg32_update(&v->domain->arch.vgic.ctlr, r, info);
+        v->domain->arch.vgic.ctlr &= GICD_CTL_ENABLE;
         vgic_unlock(v);
 
         return 1;
@@ -401,8 +425,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
-        rank->ienable |= r;
-        vgic_enable_irqs(v, r & (~tr), rank->index);
+        vgic_reg32_setbits(&rank->ienable, r, info);
+        vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index);
         vgic_unlock_rank(v, rank, flags);
         return 1;
 
@@ -412,8 +436,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
-        rank->ienable &= ~r;
-        vgic_disable_irqs(v, r & tr, rank->index);
+        vgic_reg32_clearbits(&rank->ienable, r, info);
+        vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index);
         vgic_unlock_rank(v, rank, flags);
         return 1;
 
@@ -457,13 +481,8 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        if ( dabt.size == DABT_WORD )
-            itargetsr = r;
-        else
-        {
-            itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
-            vgic_byte_write(&itargetsr, r, gicd_reg);
-        }
+        itargetsr = vgic_fetch_itargetsr(rank, gicd_reg - GICD_ITARGETSR);
+        vgic_reg32_update(&itargetsr, r, info);
         vgic_store_itargetsr(v->domain, rank, gicd_reg - GICD_ITARGETSR,
                              itargetsr);
         vgic_unlock_rank(v, rank, flags);
@@ -471,18 +490,20 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
     }
 
     case GICD_IPRIORITYR ... GICD_IPRIORITYRN:
+    {
+        uint32_t *ipriorityr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_IPRIORITYR, DABT_WORD);
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        if ( dabt.size == DABT_WORD )
-            rank->ipriorityr[REG_RANK_INDEX(8, gicd_reg - GICD_IPRIORITYR,
-                                            DABT_WORD)] = r;
-        else
-            vgic_byte_write(&rank->ipriorityr[REG_RANK_INDEX(8,
-                        gicd_reg - GICD_IPRIORITYR, DABT_WORD)], r, gicd_reg);
+        ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8,
+                                                      gicd_reg - GICD_IPRIORITYR,
+                                                      DABT_WORD)];
+        vgic_reg32_update(ipriorityr, r, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
+    }
 
     case GICD_ICFGR: /* SGIs */
         goto write_ignore_32;
@@ -494,7 +515,9 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         rank = vgic_rank_offset(v, 2, gicd_reg - GICD_ICFGR, DABT_WORD);
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        rank->icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR, DABT_WORD)] = r;
+        vgic_reg32_update(&rank->icfg[REG_RANK_INDEX(2, gicd_reg - GICD_ICFGR,
+                                                     DABT_WORD)],
+                          r, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
 
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 4f976d4..634211b 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -162,7 +162,6 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
                                          register_t *r)
 {
     struct hsr_dabt dabt = info->dabt;
-    uint64_t aff;
 
     switch ( gicr_reg )
     {
@@ -171,21 +170,27 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
         goto read_as_zero_32;
     case GICR_IIDR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICR_IIDR_VAL;
+        *r = vgic_reg32_extract(GICV3_GICR_IIDR_VAL, info);
         return 1;
     case GICR_TYPER:
+    {
+        uint64_t typer, aff;
+
         if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
         /* TBD: Update processor id in [23:8] when ITS support is added */
         aff = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 56 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 48 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 1) << 40 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 0) << 32);
-        *r = aff;
+        typer = aff;
 
         if ( v->arch.vgic.flags & VGIC_V3_RDIST_LAST )
-            *r |= GICR_TYPER_LAST;
+            typer |= GICR_TYPER_LAST;
+
+        *r = vgic_reg64_extract(typer, info);
 
         return 1;
+    }
     case GICR_STATUSR:
         /* Not implemented */
         goto read_as_zero_32;
@@ -214,7 +219,7 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
     case GICR_SYNCR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
         /* RO . But when read it always returns busy bito bit[0] */
-        *r = GICR_SYNCR_NOT_BUSY;
+        *r = vgic_reg32_extract(GICR_SYNCR_NOT_BUSY, info);
         return 1;
     case GICR_MOVLPIR:
         /* WO Read as zero */
@@ -224,22 +229,22 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
         goto read_as_zero_64;
     case GICR_PIDR0:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICR_PIDR0;
+        *r = vgic_reg32_extract(GICV3_GICR_PIDR0, info);
          return 1;
     case GICR_PIDR1:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICR_PIDR1;
+        *r = vgic_reg32_extract(GICV3_GICR_PIDR1, info);
          return 1;
     case GICR_PIDR2:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICR_PIDR2;
+        *r = vgic_reg32_extract(GICV3_GICR_PIDR2, info);
          return 1;
     case GICR_PIDR3:
         /* Manufacture/customer defined */
         goto read_as_zero_32;
     case GICR_PIDR4:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICR_PIDR4;
+        *r = vgic_reg32_extract(GICV3_GICR_PIDR4, info);
          return 1;
     case GICR_PIDR5 ... GICR_PIDR7:
         /* Reserved0 */
@@ -360,7 +365,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
         rank = vgic_rank_offset(v, 1, reg - GICD_ISENABLER, DABT_WORD);
         if ( rank == NULL ) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ienable;
+        *r = vgic_reg32_extract(rank->ienable, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     case GICD_ICENABLER ... GICD_ICENABLERN:
@@ -368,7 +373,7 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
         rank = vgic_rank_offset(v, 1, reg - GICD_ICENABLER, DABT_WORD);
         if ( rank == NULL ) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ienable;
+        *r = vgic_reg32_extract(rank->ienable, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     /* Read the pending status of an IRQ via GICD/GICR is not supported */
@@ -382,25 +387,38 @@ static int __vgic_v3_distr_common_mmio_read(const char *name, struct vcpu *v,
         goto read_as_zero;
 
     case GICD_IPRIORITYR ... GICD_IPRIORITYRN:
+    {
+        uint32_t ipriorityr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, reg - GICD_IPRIORITYR, DABT_WORD);
         if ( rank == NULL ) goto read_as_zero;
 
         vgic_lock_rank(v, rank, flags);
-        *r = rank->ipriorityr[REG_RANK_INDEX(8, reg - GICD_IPRIORITYR,
-                                             DABT_WORD)];
-        if ( dabt.size == DABT_BYTE )
-            *r = vgic_byte_read(*r, reg);
+        ipriorityr = rank->ipriorityr[REG_RANK_INDEX(8, reg - GICD_IPRIORITYR,
+                                                     DABT_WORD)];
         vgic_unlock_rank(v, rank, flags);
+
+        *r = vgic_reg32_extract(ipriorityr, info);
+
         return 1;
+    }
+
     case GICD_ICFGR ... GICD_ICFGRN:
+    {
+        uint32_t icfgr;
+
         if ( dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 2, reg - GICD_ICFGR, DABT_WORD);
         if ( rank == NULL ) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR, DABT_WORD)];
+        icfgr = rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR, DABT_WORD)];
         vgic_unlock_rank(v, rank, flags);
+
+        *r = vgic_reg32_extract(icfgr, info);
+
         return 1;
+    }
     default:
         printk(XENLOG_G_ERR
                "%pv: %s: unhandled read r%d offset %#08x\n",
@@ -439,8 +457,8 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         if ( rank == NULL ) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
-        rank->ienable |= r;
-        vgic_enable_irqs(v, r & (~tr), rank->index);
+        vgic_reg32_setbits(&rank->ienable, r, info);
+        vgic_enable_irqs(v, (rank->ienable) & (~tr), rank->index);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     case GICD_ICENABLER ... GICD_ICENABLERN:
@@ -449,8 +467,8 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         if ( rank == NULL ) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
         tr = rank->ienable;
-        rank->ienable &= ~r;
-        vgic_disable_irqs(v, r & tr, rank->index);
+        vgic_reg32_clearbits(&rank->ienable, r, info);
+        vgic_disable_irqs(v, (~rank->ienable) & tr, rank->index);
         vgic_unlock_rank(v, rank, flags);
         return 1;
     case GICD_ISPENDR ... GICD_ISPENDRN:
@@ -482,18 +500,19 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         return 0;
 
     case GICD_IPRIORITYR ... GICD_IPRIORITYRN:
+    {
+        uint32_t *ipriorityr;
+
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, reg - GICD_IPRIORITYR, DABT_WORD);
         if ( rank == NULL ) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        if ( dabt.size == DABT_WORD )
-            rank->ipriorityr[REG_RANK_INDEX(8, reg - GICD_IPRIORITYR,
-                                            DABT_WORD)] = r;
-        else
-            vgic_byte_write(&rank->ipriorityr[REG_RANK_INDEX(8,
-                       reg - GICD_IPRIORITYR, DABT_WORD)], r, reg);
+        ipriorityr = &rank->ipriorityr[REG_RANK_INDEX(8, reg - GICD_IPRIORITYR,
+                                                      DABT_WORD)];
+        vgic_reg32_update(ipriorityr, r, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
+    }
     case GICD_ICFGR: /* Restricted to configure SGIs */
         goto write_ignore_32;
     case GICD_ICFGR + 4 ... GICD_ICFGRN: /* PPI + SPIs */
@@ -503,9 +522,12 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
         rank = vgic_rank_offset(v, 2, reg - GICD_ICFGR, DABT_WORD);
         if ( rank == NULL ) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR, DABT_WORD)] = r;
+        vgic_reg32_update(&rank->icfg[REG_RANK_INDEX(2, reg - GICD_ICFGR,
+                                                     DABT_WORD)],
+                          r, info);
         vgic_unlock_rank(v, rank, flags);
         return 1;
+
     default:
         printk(XENLOG_G_ERR
                "%pv: %s: unhandled write r%d=%"PRIregister" offset %#08x\n",
@@ -719,7 +741,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
     case GICD_CTLR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
         vgic_lock(v);
-        *r = v->domain->arch.vgic.ctlr;
+        *r = vgic_reg32_extract(v->domain->arch.vgic.ctlr, info);
         vgic_unlock(v);
         return 1;
     case GICD_TYPER:
@@ -734,13 +756,16 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
          * bit is zero. The maximum is 8.
          */
         unsigned int ncpus = min_t(unsigned int, v->domain->max_vcpus, 8);
+        uint32_t typer;
 
         if ( dabt.size != DABT_WORD ) goto bad_width;
         /* No secure world support for guests. */
-        *r = ((ncpus - 1) << GICD_TYPE_CPUS_SHIFT |
-              DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32));
+        typer = ((ncpus - 1) << GICD_TYPE_CPUS_SHIFT |
+                 DIV_ROUND_UP(v->domain->arch.vgic.nr_spis, 32));
+
+        typer |= (irq_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
 
-        *r |= (irq_bits - 1) << GICD_TYPE_ID_BITS_SHIFT;
+        *r = vgic_reg32_extract(typer, info);
 
         return 1;
     }
@@ -752,7 +777,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         goto read_as_zero_32;
     case GICD_IIDR:
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICD_IIDR_VAL;
+        *r = vgic_reg32_extract(GICV3_GICD_IIDR_VAL, info);
         return 1;
     case 0x020 ... 0x03c:
     case 0xc000 ... 0xffcc:
@@ -775,14 +800,22 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
         /* SGI/PPI is RES0 */
         goto read_as_zero_64;
     case GICD_IROUTER32 ... GICD_IROUTERN:
+    {
+        uint64_t irouter;
+
         if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto read_as_zero;
         vgic_lock_rank(v, rank, flags);
-        *r = vgic_fetch_irouter(rank, gicd_reg - GICD_IROUTER);
+        irouter = vgic_fetch_irouter(rank, gicd_reg - GICD_IROUTER);
         vgic_unlock_rank(v, rank, flags);
+
+        *r = vgic_reg64_extract(irouter, info);
+
         return 1;
+    }
+
     case GICD_NSACR ... GICD_NSACRN:
         /* We do not implement security extensions for guests, read zero */
         goto read_as_zero_32;
@@ -798,17 +831,17 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
     case GICD_PIDR0:
         /* GICv3 identification value */
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICD_PIDR0;
+        *r = vgic_reg32_extract(GICV3_GICD_PIDR0, info);
         return 1;
     case GICD_PIDR1:
         /* GICv3 identification value */
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICD_PIDR1;
+        *r = vgic_reg32_extract(GICV3_GICD_PIDR1, info);
         return 1;
     case GICD_PIDR2:
         /* GICv3 identification value */
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICD_PIDR2;
+        *r = vgic_reg32_extract(GICV3_GICD_PIDR2, info);
         return 1;
     case GICD_PIDR3:
         /* GICv3 identification value. Manufacturer/Customer defined */
@@ -816,7 +849,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
     case GICD_PIDR4:
         /* GICv3 identification value */
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        *r = GICV3_GICD_PIDR4;
+        *r = vgic_reg32_extract(GICV3_GICD_PIDR4, info);
         return 1;
     case GICD_PIDR5 ... GICD_PIDR7:
         /* Reserved0 */
@@ -872,17 +905,24 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
     switch ( gicd_reg )
     {
     case GICD_CTLR:
+    {
+        uint32_t ctlr = 0;
+
         if ( dabt.size != DABT_WORD ) goto bad_width;
 
         vgic_lock(v);
+
+        vgic_reg32_update(&ctlr, r, info);
+
         /* Only EnableGrp1A can be changed */
-        if ( r & GICD_CTLR_ENABLE_G1A )
+        if ( ctlr & GICD_CTLR_ENABLE_G1A )
             v->domain->arch.vgic.ctlr |= GICD_CTLR_ENABLE_G1A;
         else
             v->domain->arch.vgic.ctlr &= ~GICD_CTLR_ENABLE_G1A;
         vgic_unlock(v);
 
         return 1;
+    }
     case GICD_TYPER:
         /* RO -- write ignored */
         goto write_ignore_32;
@@ -928,14 +968,21 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
         /* SGI/PPI is RES0 */
         goto write_ignore_64;
     case GICD_IROUTER32 ... GICD_IROUTERN:
+    {
+        uint64_t irouter;
+
         if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto write_ignore;
         vgic_lock_rank(v, rank, flags);
-        vgic_store_irouter(v->domain, rank, gicd_reg - GICD_IROUTER, r);
+        irouter = vgic_fetch_irouter(rank, gicd_reg - GICD_IROUTER);
+        vgic_reg64_update(&irouter, r, info);
+        vgic_store_irouter(v->domain, rank, gicd_reg - GICD_IROUTER, irouter);
         vgic_unlock_rank(v, rank, flags);
         return 1;
+    }
+
     case GICD_NSACR ... GICD_NSACRN:
         /* We do not implement security extensions for guests, write ignore */
         goto write_ignore_32;
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index a1ef429..005f822 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -19,6 +19,7 @@
 #define __ASM_ARM_VGIC_H__
 
 #include <xen/bitops.h>
+#include <asm/mmio.h>
 
 struct pending_irq
 {
@@ -166,26 +167,116 @@ static inline int REG_RANK_NR(int b, uint32_t n)
     }
 }
 
-static inline uint32_t vgic_byte_read(uint32_t val, int offset)
+#define VGIC_REG_MASK(size) ((~0UL) >> (BITS_PER_LONG - ((1 << (size)) * 8)))
+
+/*
+ * The check on the size supported by the register has to be done by
+ * the caller of vgic_regN_*.
+ *
+ * vgic_reg_* should never be called directly. Instead use the vgic_regN_*
+ * according to size of the emulated register
+ *
+ * Note that the alignment fault will always be taken in the guest
+ * (see B3.12.7 DDI0406.b).
+ */
+static inline register_t vgic_reg_extract(unsigned long reg,
+                                          unsigned int offset,
+                                          enum dabt_size size)
+{
+    reg >>= 8 * offset;
+    reg &= VGIC_REG_MASK(size);
+
+    return reg;
+}
+
+static inline void vgic_reg_update(unsigned long *reg, register_t val,
+                                   unsigned int offset,
+                                   enum dabt_size size)
 {
-    int byte = offset & 0x3;
+    unsigned long mask = VGIC_REG_MASK(size);
+    int shift = offset * 8;
 
-    val = val >> (8*byte);
-    val &= 0x000000ff;
+    *reg &= ~(mask << shift);
+    *reg |= ((unsigned long)val & mask) << shift;
+}
+
+static inline void vgic_reg_setbits(unsigned long *reg, register_t bits,
+                                    unsigned int offset,
+                                    enum dabt_size size)
+{
+    unsigned long mask = VGIC_REG_MASK(size);
+    int shift = offset * 8;
 
-    return val;
+    *reg |= ((unsigned long)bits & mask) << shift;
 }
 
-static inline void vgic_byte_write(uint32_t *reg, uint32_t var, int offset)
+static inline void vgic_reg_clearbits(unsigned long *reg, register_t bits,
+                                      unsigned int offset,
+                                      enum dabt_size size)
 {
-    int byte = offset & 0x3;
+    unsigned long mask = VGIC_REG_MASK(size);
+    int shift = offset * 8;
 
-    var &= 0xff;
+    *reg &= ~(((unsigned long)bits & mask) << shift);
+}
 
-    *reg &= ~(0xff << (8*byte));
-    *reg |= (var << (8*byte));
+/* N-bit register helpers */
+#define VGIC_REG_HELPERS(sz, offmask)                                   \
+static inline register_t vgic_reg##sz##_extract(uint##sz##_t reg,       \
+                                                const mmio_info_t *info)\
+{                                                                       \
+    return vgic_reg_extract(reg, info->gpa & offmask,                   \
+                            info->dabt.size);                           \
+}                                                                       \
+                                                                        \
+static inline void vgic_reg##sz##_update(uint##sz##_t *reg,             \
+                                         register_t val,                \
+                                         const mmio_info_t *info)       \
+{                                                                       \
+    unsigned long tmp = *reg;                                           \
+                                                                        \
+    vgic_reg_update(&tmp, val, info->gpa & offmask,                     \
+                    info->dabt.size);                                   \
+                                                                        \
+    *reg = tmp;                                                         \
+}                                                                       \
+                                                                        \
+static inline void vgic_reg##sz##_setbits(uint##sz##_t *reg,            \
+                                          register_t bits,              \
+                                          const mmio_info_t *info)      \
+{                                                                       \
+    unsigned long tmp = *reg;                                           \
+                                                                        \
+    vgic_reg_setbits(&tmp, bits, info->gpa & offmask,                   \
+                     info->dabt.size);                                  \
+                                                                        \
+    *reg = tmp;                                                         \
+}                                                                       \
+                                                                        \
+static inline void vgic_reg##sz##_clearbits(uint##sz##_t *reg,          \
+                                            register_t bits,            \
+                                            const mmio_info_t *info)    \
+{                                                                       \
+    unsigned long tmp = *reg;                                           \
+                                                                        \
+    vgic_reg_clearbits(&tmp, bits, info->gpa & offmask,                 \
+                       info->dabt.size);                                \
+                                                                        \
+    *reg = tmp;                                                         \
 }
 
+/*
+ * 64 bits registers are only supported on platform with 64-bit long.
+ * This is also allow us to optimize the 32 bit case by using
+ * unsigned long rather than uint64_t
+ */
+#if BITS_PER_LONG == 64
+VGIC_REG_HELPERS(64, 0x7);
+#endif
+VGIC_REG_HELPERS(32, 0x3);
+
+#undef VGIC_REG_HELPERS
+
 enum gic_sgi_mode;
 
 /*
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v6 6/6] xen/arm: vgic-v3: Support 32-bit access for 64-bit registers
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
                   ` (4 preceding siblings ...)
  2015-11-18 16:42 ` [PATCH v6 5/6] xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC register Julien Grall
@ 2015-11-18 16:42 ` Julien Grall
  2015-11-25 12:29 ` [PATCH v6 0/6] xen/arm: vgic: " Ian Campbell
  6 siblings, 0 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-18 16:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Julien Grall, ian.campbell, stefano.stabellini

Based on 8.1.3 (IHI 0069A), unless stated otherwise, the 64-bit registers
supports both 32-bit and 64-bits access.

All the registers we properly emulate (i.e not RAZ/WI) supports 32-bit access.

For RAZ/WI, it's also seems to be the case but I'm not 100% sure. Anyway,
emulating 32-bit access for them doesn't hurt. Note that we would need
some extra care when they will be implemented (for instance GICR_PROPBASER).

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

---
    This is technically fixing boot of FreeBSD ARM64 guest with GICv3.

    AFAICT, Linux is not using 32-bit access in the GICv3 code expect
    for the ITS (which we don't support yet).

    So this patch is a good candidate for Xen 4.6.1. Although this patch is
    heavily depend on previous patches. It may be possible to shuffle
    and move the "opmitization" patches towards the end. I haven't yet
    done that because I feel this series makes more sense in the current
    order.

    Also, I haven't move vgic_reg64_check_access in vgic.h because there
    is no usage in this series outside of vgic-v3.c and the helpers is
    GICv3 oriented.

    Changes in v2:
        - Add Ian's acked-by

    Changes in v1:
        - Support 32bit access on the most significant word of
        GICR_TYPER
---
 xen/arch/arm/vgic-v3.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 634211b..3a7b86f 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -157,6 +157,15 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
     rank->vcpu[offset] = new_vcpu->vcpu_id;
 }
 
+static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
+{
+    /*
+     * 64 bits registers can be accessible using 32-bit and 64-bit unless
+     * stated otherwise (See 8.1.3 ARM IHI 0069A).
+     */
+    return ( dabt.size == DABT_DOUBLE_WORD || dabt.size == DABT_WORD );
+}
+
 static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
                                          uint32_t gicr_reg,
                                          register_t *r)
@@ -173,10 +182,11 @@ static int __vgic_v3_rdistr_rd_mmio_read(struct vcpu *v, mmio_info_t *info,
         *r = vgic_reg32_extract(GICV3_GICR_IIDR_VAL, info);
         return 1;
     case GICR_TYPER:
+    case GICR_TYPER + 4:
     {
         uint64_t typer, aff;
 
-        if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
         /* TBD: Update processor id in [23:8] when ITS support is added */
         aff = (MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 3) << 56 |
                MPIDR_AFFINITY_LEVEL(v->arch.vmpidr, 2) << 48 |
@@ -262,7 +272,7 @@ bad_width:
     return 0;
 
 read_as_zero_64:
-    if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+    if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
     *r = 0;
     return 1;
 
@@ -338,7 +348,7 @@ bad_width:
     return 0;
 
 write_ignore_64:
-    if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+    if ( vgic_reg64_check_access(dabt) ) goto bad_width;
     return 1;
 
 write_ignore_32:
@@ -803,7 +813,7 @@ static int vgic_v3_distr_mmio_read(struct vcpu *v, mmio_info_t *info,
     {
         uint64_t irouter;
 
-        if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
         rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto read_as_zero;
@@ -878,7 +888,7 @@ bad_width:
     return 0;
 
 read_as_zero_64:
-    if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+    if ( vgic_reg64_check_access(dabt) ) goto bad_width;
     *r = 0;
     return 1;
 
@@ -971,7 +981,7 @@ static int vgic_v3_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
     {
         uint64_t irouter;
 
-        if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+        if ( !vgic_reg64_check_access(dabt) ) goto bad_width;
         rank = vgic_rank_offset(v, 64, gicd_reg - GICD_IROUTER,
                                 DABT_DOUBLE_WORD);
         if ( rank == NULL ) goto write_ignore;
@@ -1030,7 +1040,7 @@ write_ignore_32:
     return 1;
 
 write_ignore_64:
-    if ( dabt.size != DABT_DOUBLE_WORD ) goto bad_width;
+    if ( vgic_reg64_check_access(dabt) ) goto bad_width;
     return 1;
 
 write_ignore:
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0
  2015-11-18 16:42 ` [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0 Julien Grall
@ 2015-11-25 11:34   ` Ian Campbell
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Campbell @ 2015-11-25 11:34 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: stefano.stabellini

On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
> The current implementation ignores the whole write if one of the field is
> 0. Although, based on the spec (4.3.12 IHI 0048B.b), 0 is a valid value
> when:
>     - The interrupt is not wired in the distributor. From the Xen
>     point of view, it means that the corresponding bit is not set in
>     d->arch.vgic.allocated_irqs.
>     - The user wants to disable the IRQ forwarding in the distributor.
>     I.e the IRQ stays pending in the distributor and never received by
>     the guest.
> 
> Implementing the later will require more work in Xen because we always
> assume the interrupt is forwarded to a valid vCPU. So for now, ignore
> any field where the value is 0.
> 
> The emulation of the write access of ITARGETSR has been reworked and
> moved to a new function because it would have been difficult to
> implement properly the behavior with the current code.
> 
> The new implementation is breaking the register in 4 distinct bytes. For
> each byte, it will check the validity of the target list, find the new
> target, migrate the interrupt and store the value if necessary.
> 
> In the new implementation there is nearly no distinction of the access
> size to avoid having too many different path which is harder to test.
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>     This change used to be embedded in "xen/arm: vgic: Optimize the way
>     to store the target vCPU in the rank". It has been moved out to
>     avoid having too much functional changes in a single patch.
> 
>     I'm not sure if this patch should be backported to Xen 4.6. Without
>     it any guest writing 0 in one the field won't be able to migrate
>     other interrupts. Although, in all the use case I've seen, the guest
>     is read ITARGETSR first and write-back the value with the
>     corresponding byte changed.

Lets park it until someone reports they are tripping over this then.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  2015-11-18 16:42 ` [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank Julien Grall
@ 2015-11-25 11:37   ` Ian Campbell
  2015-11-30 13:32     ` Julien Grall
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Campbell @ 2015-11-25 11:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: stefano.stabellini

On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
> Xen is currently directly storing the value of GICD_ITARGETSR register
> (for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
> emulation of the registers access very simple but makes the code to get
> the target vCPU for a given vIRQ more complex.
> 
> While the target vCPU of an vIRQ is retrieved every time an vIRQ is
> injected to the guest, the access to the register occurs less often.
> 
> So the data structure should be optimized for the most common case
> rather than the inverse.
> 
> This patch introduces the usage of an array to store the target vCPU for
> every interrupt in the rank. This will make the code to get the target
> very quick. The emulation code will now have to generate the
> GICD_ITARGETSR
> and GICD_IROUTER register for read access and split it to store in a
> convenient way.
> 
> With the new way to store the target vCPU, the structure vgic_irq_rank
> is shrunk down from 320 bytes to 92 bytes. This is saving about 228
> bytes of memory allocated separately per vCPU.
> 
> Note that with these changes, any read to those register will list only
> the target vCPU used by Xen. As the spec is not clear whether this is a
> valid choice or not, OSes which have a different interpretation of the
> spec (i.e OSes which perform read-modify-write operations on these
> registers) may not boot anymore on Xen. Although, I think this is fair
> trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
> with no SPIs) and a strict interpretation of the spec (though all the
> cases are not clearly defined).
> 
> Furthermore, the implementation of the callback get_target_vcpu is now
> exactly the same. Consolidate the implementation in the common vGIC code
> and drop the callback.
> 
> Finally take the opportunity to fix coding style and replace "irq" by
> "virq" to make clear that we are dealing with virtual IRQ in section we
> are modifying.
> 
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

I have one clarifying question, which may or may not be worth a followup:

> + * Fetch an ITARGETSR register based on the offset from ITARGETSR0.

Is the offset here in terms of bytes or in terms of entire ITARGETSR<n>
registers (i.e. 4 bytes)?

Might be worth clarifying the comment?

Ian.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers
  2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
                   ` (5 preceding siblings ...)
  2015-11-18 16:42 ` [PATCH v6 6/6] xen/arm: vgic-v3: Support 32-bit access for 64-bit registers Julien Grall
@ 2015-11-25 12:29 ` Ian Campbell
  6 siblings, 0 replies; 13+ messages in thread
From: Ian Campbell @ 2015-11-25 12:29 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: stefano.stabellini

On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
> Hi all,
> 
> This series aims to fix the 32-bit access on 64-bit registers. Some guest
> OS such as FreeBSD and Linux (ITS and recently 32-bit guest using GICv3)
> use 32-bit access and will crash at boot time.
> 
> For all the changes see in each patch.
> 
> Sincerely yours,
> 
> Julien Grall (6):
>   xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7
>     read-only
>   xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR
>   xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0
>   xen/arm: vgic: Optimize the way to store the target vCPU in the rank
>   xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC
>     register ...
>   xen/arm: vgic-v3: Support 32-bit access for 64-bit registers

All applied, thanks.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  2015-11-25 11:37   ` Ian Campbell
@ 2015-11-30 13:32     ` Julien Grall
  2015-11-30 13:55       ` Ian Campbell
  0 siblings, 1 reply; 13+ messages in thread
From: Julien Grall @ 2015-11-30 13:32 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: stefano.stabellini

Hi Ian,

On 25/11/15 11:37, Ian Campbell wrote:
> On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
>> Xen is currently directly storing the value of GICD_ITARGETSR register
>> (for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
>> emulation of the registers access very simple but makes the code to get
>> the target vCPU for a given vIRQ more complex.
>>
>> While the target vCPU of an vIRQ is retrieved every time an vIRQ is
>> injected to the guest, the access to the register occurs less often.
>>
>> So the data structure should be optimized for the most common case
>> rather than the inverse.
>>
>> This patch introduces the usage of an array to store the target vCPU for
>> every interrupt in the rank. This will make the code to get the target
>> very quick. The emulation code will now have to generate the
>> GICD_ITARGETSR
>> and GICD_IROUTER register for read access and split it to store in a
>> convenient way.
>>
>> With the new way to store the target vCPU, the structure vgic_irq_rank
>> is shrunk down from 320 bytes to 92 bytes. This is saving about 228
>> bytes of memory allocated separately per vCPU.
>>
>> Note that with these changes, any read to those register will list only
>> the target vCPU used by Xen. As the spec is not clear whether this is a
>> valid choice or not, OSes which have a different interpretation of the
>> spec (i.e OSes which perform read-modify-write operations on these
>> registers) may not boot anymore on Xen. Although, I think this is fair
>> trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
>> with no SPIs) and a strict interpretation of the spec (though all the
>> cases are not clearly defined).
>>
>> Furthermore, the implementation of the callback get_target_vcpu is now
>> exactly the same. Consolidate the implementation in the common vGIC code
>> and drop the callback.
>>
>> Finally take the opportunity to fix coding style and replace "irq" by
>> "virq" to make clear that we are dealing with virtual IRQ in section we
>> are modifying.
>>
>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> I have one clarifying question, which may or may not be worth a followup:
> 
>> + * Fetch an ITARGETSR register based on the offset from ITARGETSR0.
> 
> Is the offset here in terms of bytes or in terms of entire ITARGETSR<n>
> registers (i.e. 4 bytes)?

The offset is in term of bytes.

> Might be worth clarifying the comment?

I'm not sure, I think it's implicit with the following sentence in the
comment:

"Note the offset will be aligned to the appropriate boundary."

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  2015-11-30 13:32     ` Julien Grall
@ 2015-11-30 13:55       ` Ian Campbell
  2015-11-30 14:02         ` Julien Grall
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Campbell @ 2015-11-30 13:55 UTC (permalink / raw)
  To: Julien Grall, xen-devel; +Cc: stefano.stabellini

On Mon, 2015-11-30 at 13:32 +0000, Julien Grall wrote:
> Hi Ian,
> 
> On 25/11/15 11:37, Ian Campbell wrote:
> > On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
> > > Xen is currently directly storing the value of GICD_ITARGETSR
> > > register
> > > (for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
> > > emulation of the registers access very simple but makes the code to
> > > get
> > > the target vCPU for a given vIRQ more complex.
> > > 
> > > While the target vCPU of an vIRQ is retrieved every time an vIRQ is
> > > injected to the guest, the access to the register occurs less often.
> > > 
> > > So the data structure should be optimized for the most common case
> > > rather than the inverse.
> > > 
> > > This patch introduces the usage of an array to store the target vCPU
> > > for
> > > every interrupt in the rank. This will make the code to get the
> > > target
> > > very quick. The emulation code will now have to generate the
> > > GICD_ITARGETSR
> > > and GICD_IROUTER register for read access and split it to store in a
> > > convenient way.
> > > 
> > > With the new way to store the target vCPU, the structure
> > > vgic_irq_rank
> > > is shrunk down from 320 bytes to 92 bytes. This is saving about 228
> > > bytes of memory allocated separately per vCPU.
> > > 
> > > Note that with these changes, any read to those register will list
> > > only
> > > the target vCPU used by Xen. As the spec is not clear whether this is
> > > a
> > > valid choice or not, OSes which have a different interpretation of
> > > the
> > > spec (i.e OSes which perform read-modify-write operations on these
> > > registers) may not boot anymore on Xen. Although, I think this is
> > > fair
> > > trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
> > > with no SPIs) and a strict interpretation of the spec (though all the
> > > cases are not clearly defined).
> > > 
> > > Furthermore, the implementation of the callback get_target_vcpu is
> > > now
> > > exactly the same. Consolidate the implementation in the common vGIC
> > > code
> > > and drop the callback.
> > > 
> > > Finally take the opportunity to fix coding style and replace "irq" by
> > > "virq" to make clear that we are dealing with virtual IRQ in section
> > > we
> > > are modifying.
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@citrix.com>
> > 
> > Acked-by: Ian Campbell <ian.campbell@citrix.com>
> > 
> > I have one clarifying question, which may or may not be worth a
> > followup:
> > 
> > > + * Fetch an ITARGETSR register based on the offset from ITARGETSR0.
> > 
> > Is the offset here in terms of bytes or in terms of entire ITARGETSR<n>
> > registers (i.e. 4 bytes)?
> 
> The offset is in term of bytes.
> 
> > Might be worth clarifying the comment?
> 
> I'm not sure, I think it's implicit with the following sentence in the
> comment:
> 
> "Note the offset will be aligned to the appropriate boundary."

It's very implicit, since without knowing the answer it's not clear what an
appropriate boundary is. 

How about: "Note the byte offset will be aligned to an ITARGETSR<n>"
boundary" ?


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank
  2015-11-30 13:55       ` Ian Campbell
@ 2015-11-30 14:02         ` Julien Grall
  0 siblings, 0 replies; 13+ messages in thread
From: Julien Grall @ 2015-11-30 14:02 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: stefano.stabellini

On 30/11/15 13:55, Ian Campbell wrote:
> On Mon, 2015-11-30 at 13:32 +0000, Julien Grall wrote:
>> Hi Ian,
>>
>> On 25/11/15 11:37, Ian Campbell wrote:
>>> On Wed, 2015-11-18 at 16:42 +0000, Julien Grall wrote:
>>>> Xen is currently directly storing the value of GICD_ITARGETSR
>>>> register
>>>> (for GICv2) and GICD_IROUTER (for GICv3) in the rank. This makes the
>>>> emulation of the registers access very simple but makes the code to
>>>> get
>>>> the target vCPU for a given vIRQ more complex.
>>>>
>>>> While the target vCPU of an vIRQ is retrieved every time an vIRQ is
>>>> injected to the guest, the access to the register occurs less often.
>>>>
>>>> So the data structure should be optimized for the most common case
>>>> rather than the inverse.
>>>>
>>>> This patch introduces the usage of an array to store the target vCPU
>>>> for
>>>> every interrupt in the rank. This will make the code to get the
>>>> target
>>>> very quick. The emulation code will now have to generate the
>>>> GICD_ITARGETSR
>>>> and GICD_IROUTER register for read access and split it to store in a
>>>> convenient way.
>>>>
>>>> With the new way to store the target vCPU, the structure
>>>> vgic_irq_rank
>>>> is shrunk down from 320 bytes to 92 bytes. This is saving about 228
>>>> bytes of memory allocated separately per vCPU.
>>>>
>>>> Note that with these changes, any read to those register will list
>>>> only
>>>> the target vCPU used by Xen. As the spec is not clear whether this is
>>>> a
>>>> valid choice or not, OSes which have a different interpretation of
>>>> the
>>>> spec (i.e OSes which perform read-modify-write operations on these
>>>> registers) may not boot anymore on Xen. Although, I think this is
>>>> fair
>>>> trade between memory usage in Xen (1KB less on a domain using 4 vCPUs
>>>> with no SPIs) and a strict interpretation of the spec (though all the
>>>> cases are not clearly defined).
>>>>
>>>> Furthermore, the implementation of the callback get_target_vcpu is
>>>> now
>>>> exactly the same. Consolidate the implementation in the common vGIC
>>>> code
>>>> and drop the callback.
>>>>
>>>> Finally take the opportunity to fix coding style and replace "irq" by
>>>> "virq" to make clear that we are dealing with virtual IRQ in section
>>>> we
>>>> are modifying.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>>>
>>> Acked-by: Ian Campbell <ian.campbell@citrix.com>
>>>
>>> I have one clarifying question, which may or may not be worth a
>>> followup:
>>>
>>>> + * Fetch an ITARGETSR register based on the offset from ITARGETSR0.
>>>
>>> Is the offset here in terms of bytes or in terms of entire ITARGETSR<n>
>>> registers (i.e. 4 bytes)?
>>
>> The offset is in term of bytes.
>>
>>> Might be worth clarifying the comment?
>>
>> I'm not sure, I think it's implicit with the following sentence in the
>> comment:
>>
>> "Note the offset will be aligned to the appropriate boundary."
> 
> It's very implicit, since without knowing the answer it's not clear what an
> appropriate boundary is. 
> 
> How about: "Note the byte offset will be aligned to an ITARGETSR<n>"
> boundary" ?

Ok. I will do the same for the comment on top of vgic_*_irouter.

Regards,

-- 
Julien Grall

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-11-30 14:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-18 16:42 [PATCH v6 0/6] xen/arm: vgic: Support 32-bit access for 64-bit registers Julien Grall
2015-11-18 16:42 ` [PATCH v6 1/6] xen/arm: vgic-v2: Implement correctly ITARGETSR0 - ITARGETSR7 read-only Julien Grall
2015-11-18 16:42 ` [PATCH v6 2/6] xen/arm: vgic-v2: Handle correctly byte write in ITARGETSR Julien Grall
2015-11-18 16:42 ` [PATCH v6 3/6] xen/arm: vgic-v2: Don't ignore a write in ITARGETSR if one field is 0 Julien Grall
2015-11-25 11:34   ` Ian Campbell
2015-11-18 16:42 ` [PATCH v6 4/6] xen/arm: vgic: Optimize the way to store the target vCPU in the rank Julien Grall
2015-11-25 11:37   ` Ian Campbell
2015-11-30 13:32     ` Julien Grall
2015-11-30 13:55       ` Ian Campbell
2015-11-30 14:02         ` Julien Grall
2015-11-18 16:42 ` [PATCH v6 5/6] xen/arm: vgic: Introduce helpers to extract/update/clear/set vGIC register Julien Grall
2015-11-18 16:42 ` [PATCH v6 6/6] xen/arm: vgic-v3: Support 32-bit access for 64-bit registers Julien Grall
2015-11-25 12:29 ` [PATCH v6 0/6] xen/arm: vgic: " Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.