All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
@ 2018-12-26 11:20 Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly Andrii Anisov
                   ` (17 more replies)
  0 siblings, 18 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

This patch series is an attempt to reduce IRQ latency with the
old GIC implementation (gic-vgic). These patches originally based
on XEN 4.10 release. The motivation was to improve benchmark
results of a system given to a customer for evaluation.
This patch series is tailored for GICv2 on RCAR H3. Several
of the most controversial patches (i.e. LRs shadowing) were
not shared to the customer, and here are for comments and discussion.
I hope several patches from here could be upstreamed. Some as is,
others with modifications.

There are several simple ideas behind these changes:
    - reduce an excessive code (condition checks)
    - drop an excessive peripheral register accesses
    - if not reduce, then move an excessive code out of spinlocks
    - if not drop, then move an excessive register
      accesses out of spinlocks

This is a v2 of the original RFC series [1]. From that series, patches
[2] and [3] have already reached mainline. Here few patches are reworked
with addressing some comments or separating them into more clear pieces,
more patches are taken from the RFC v1 as is.

The main intention of this version of RFC series is to reveal
patch-by-patch IRQ latency impact.
The measurement is performed with TBM [4], so the use-case is trivial -
passing a single IRQ twice in a second. Thus no lock contentions nor
even passing more than one interrupt to a guest at the time use-cases
are hit.

The series is based on the current xenbits/staging, commit 7f28661f6a7.
XEN is build with no DEBUG and no GICv3 support for the staging HEAD and
each commit. Four runtime configurations are evaluated for each commit:
    - sched=credit2 vwfi=trap
    - sched=credit2 vwfi=native
    - sched=credit vwfi=trap
    - sched=credit vwfi=native

Each commit is incrementally cherry-picked for the latency evaluation in
an order they appear in the table. The table also can be found shared
as a Google spreadsheet here [5].


	sched=credit2 vwfi=trap                 	sched=credit2 vwfi=native               	sched=credit vwfi=trap                  	sched=credit vwfi=native

7f28661f6a7ce3d82f881b9afedfebca7f2cf116
	max=9480 warm_max=7200 min=6600 avg=6743	max=4680 warm_max=3240 min=3000 avg=3007	max=9480 warm_max=7920 min=6720 avg=7009	max=4560 warm_max=3000 min=2880 avg=2979

gic:gic-vgic: separate GIV3 code more thoroughly

	max=9720 warm_max=6960 min=6600 avg=6617	max=5040 warm_max=3840 min=2880 avg=2905	max=9480 warm_max=7200 min=6600 avg=6871	max=4560 warm_max=3000 min=2880 avg=2887

gic-vgic:vgic: avoid excessive conversions

	max=9360 warm_max=6720 min=6480 avg=6578	max=4800 warm_max=3120 min=2880 avg=2895	max=9480 warm_max=7080 min=6600 avg=6804	max=4800 warm_max=3120 min=2880 avg=2887

gic:vgic:gic-vgic: introduce non-atomic bitops

	max=9120 warm_max=6600 min=6480 avg=6546	max=4920 warm_max=3000 min=2760 avg=2872	max=9120 warm_max=6720 min=6480 avg=6574	max=4200 warm_max=3120 min=2760 avg=2798

gic: drop interrupts enabling on interrupts processing
	max=9240 warm_max=7080 min=6360 avg=6492	max=5040 warm_max=3240 min=2760 avg=2767	max=9240 warm_max=6720 min=6480 avg=6491	max=4440 warm_max=3000 min=2760 avg=2809

gic-vgic: skip irqs locking in gic_restore_pending_irqs()
	max=9000 warm_max=6720 min=6360 avg=6430	max=4320 warm_max=3120 min=2640 avg=2671	max=9240 warm_max=6720 min=6360 avg=6459	max=4440 warm_max=2880 min=2640 avg=2668

vgic: move pause_flags check out of vgic spinlock
	max=9240 warm_max=6720 min=6360 avg=6431	max=4800 warm_max=2880 min=2640 avg=2675	max=9360 warm_max=6600 min=6360 avg=6435	max=4440 warm_max=2760 min=2640 avg=2647

vgic: move irq_to_pending out of lock
	max=8520 warm_max=7440 min=6360 avg=6444	max=4680 warm_max=3000 min=2640 avg=2753	max=9480 warm_max=6720 min=6360 avg=6445	max=4200 warm_max=3000 min=2640 avg=2667

gic-vgic:vgic: do not keep disabled IRQs in any of queues
	max=9120 warm_max=7920 min=6360 avg=6447	max=4440 warm_max=2760 min=2760 avg=2767	max=10440 warm_max=7560 min=6360 avg=6459	max=4440 warm_max=3840 min=2640 avg=2669

xen/arm: Re-enable interrupt later in the trap path
	max=9720 warm_max=9120 min=6360 avg=6441	max=4440 warm_max=2880 min=2760 avg=2767	max=9360 warm_max=6960 min=6360 avg=6451	max=4680 warm_max=2880 min=2640 avg=2675

gic-vgic: skip irqs locking in vgic_sync_from_lrs
	max=9240 warm_max=7080 min=6360 avg=6431	max=4920 warm_max=3120 min=2640 avg=2678	max=9480 warm_max=6960 min=6360 avg=6443	max=4680 warm_max=2880 min=2640 avg=2667

gic-v2: Write HCR only on change
	max=9840 warm_max=6600 min=6360 avg=6459	max=4440 warm_max=2760 min=2520 avg=2527	max=9480 warm_max=7920 min=6360 avg=6445	max=4320 warm_max=2760 min=2520 avg=2527

gic-v2: avoid HCR reading for GICv2
	max=9480 warm_max=7680 min=6360 avg=6443	max=4320 warm_max=2880 min=2520 avg=2527	max=9360 warm_max=7080 min=6720 avg=6750	max=3960 warm_max=2640 min=2400 avg=2487

hack: arm/domain: simplify context restore from idle vcpu
	max=9360 warm_max=6720 min=6000 avg=6214	max=5040 warm_max=2640 min=2520 avg=2527	max=9480 warm_max=7080 min=6240 avg=6367	max=4080 warm_max=2880 min=2400 avg=2527

hack: move gicv2 LRs reads and writes out of spinlocks
	max=9480 warm_max=6840 min=6600 avg=6612	max=4800 warm_max=2760 min=2640 avg=2739	max=9000 warm_max=7200 min=6600 avg=6636	max=4560 warm_max=2760 min=2520 avg=2619

gic: vgic: align frequently accessed data by cache line size
	max=9840 warm_max=6600 min=6240 avg=6288	max=4440 warm_max=2880 min=2640 avg=2682	max=8280 warm_max=6720 min=6360 avg=6488	max=4080 warm_max=2880 min=2640 avg=2678

gic: separate ppi processing
	NOT SUITABLE FOR EVALUATION WITH TBM

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html
[5] https://docs.google.com/spreadsheets/d/1J_u9UDowaonnaKtgiugXqtIFT-c2E4Ss2vxgL6NnbNo/edit?usp=sharing

Andrii Anisov (15):
  gic:gic-vgic: separate GIV3 code more thoroughly
  gic-vgic:vgic: avoid excessive conversions
  gic:vgic:gic-vgic: introduce non-atomic bitops
  gic: drop interrupts enabling on interrupts processing
  gic-vgic: skip irqs locking in gic_restore_pending_irqs()
  vgic: move pause_flags check out of vgic spinlock
  vgic: move irq_to_pending out of lock
  gic-vgic:vgic: do not keep disabled IRQs in any of queues
  gic-vgic: skip irqs locking in vgic_sync_from_lrs
  gic-v2: Write HCR only on change
  gic-v2: avoid HCR reading for GICv2
  hack: arm/domain: simplify context restore from idle vcpu
  hack: move gicv2 LRs reads and writes out of spinlocks
  gic: vgic: align frequently accessed data by cache line size
  gic: separate ppi processing

Julien Grall (1):
  xen/arm: Re-enable interrupt later in the trap path

[11] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html

 xen/arch/arm/arm64/entry.S   | 11 +++---
 xen/arch/arm/domain.c        | 25 +++++++-----
 xen/arch/arm/gic-v2.c        | 82 +++++++++++++++++++++++++-------------
 xen/arch/arm/gic-v3-its.c    |  2 +
 xen/arch/arm/gic-v3-lpi.c    |  2 +
 xen/arch/arm/gic-v3.c        |  4 +-
 xen/arch/arm/gic-vgic.c      | 87 +++++++++++++++++++++++++----------------
 xen/arch/arm/gic.c           | 32 +++++++++++++--
 xen/arch/arm/irq.c           | 32 +++++++++++++++
 xen/arch/arm/traps.c         |  6 +++
 xen/arch/arm/vgic-v3-its.c   |  2 +-
 xen/arch/arm/vgic.c          | 93 +++++++++++++++++++++++++++++++++++++-------
 xen/arch/arm/vgic/vgic.c     |  2 +
 xen/include/asm-arm/config.h |  2 +-
 xen/include/asm-arm/gic.h    | 10 ++---
 xen/include/asm-arm/irq.h    |  3 ++
 xen/include/asm-arm/vgic.h   | 24 ++++++++----
 xen/include/xen/sched.h      |  1 +
 18 files changed, 310 insertions(+), 110 deletions(-)

-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions Andrii Anisov
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

This reduces some code and conditions in an IRQ processing path,
reducing IRQ latency for a build supported GICV2 only.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

Its a patch [1] with a better message and removed build errors.
Comments from Andre are not addressed yet.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html

---
 xen/arch/arm/gic-vgic.c    | 13 ++++++++++++-
 xen/arch/arm/gic.c         |  6 ++++++
 xen/arch/arm/vgic.c        | 20 ++++++++++++++++++++
 xen/arch/arm/vgic/vgic.c   |  2 ++
 xen/include/asm-arm/irq.h  |  2 ++
 xen/include/asm-arm/vgic.h | 17 +++++++++++------
 6 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 48922f5..74b5fae 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -36,7 +36,9 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
 {
     ASSERT(!local_irq_is_enabled());
 
+#ifdef CONFIG_GICV3
     clear_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status);
+#endif
 
     gic_hw_ops->update_lr(lr, p->irq, p->priority,
                           p->desc ? p->desc->irq : INVALID_IRQ, state);
@@ -77,9 +79,11 @@ void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
 {
     struct pending_irq *n = irq_to_pending(v, virtual_irq);
 
+#ifdef CONFIG_GICV3
     /* If an LPI has been removed meanwhile, there is nothing left to raise. */
     if ( unlikely(!n) )
         return;
+#endif
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
@@ -112,13 +116,14 @@ static unsigned int gic_find_unused_lr(struct vcpu *v,
 {
     unsigned int nr_lrs = gic_get_nr_lrs();
     unsigned long *lr_mask = (unsigned long *) &this_cpu(lr_mask);
-    struct gic_lr lr_val;
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
+#ifdef CONFIG_GICV3
     if ( unlikely(test_bit(GIC_IRQ_GUEST_PRISTINE_LPI, &p->status)) )
     {
         unsigned int used_lr;
+        struct gic_lr lr_val;
 
         for_each_set_bit(used_lr, lr_mask, nr_lrs)
         {
@@ -127,6 +132,7 @@ static unsigned int gic_find_unused_lr(struct vcpu *v,
                 return used_lr;
         }
     }
+#endif
 
     lr = find_next_zero_bit(lr_mask, nr_lrs, lr);
 
@@ -142,9 +148,11 @@ void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
+#ifdef CONFIG_GICV3
     if ( unlikely(!p) )
         /* An unmapped LPI does not need to be raised. */
         return;
+#endif
 
     if ( v == current && list_empty(&v->arch.vgic.lr_pending) )
     {
@@ -172,6 +180,8 @@ static void gic_update_one_lr(struct vcpu *v, int i)
     gic_hw_ops->read_lr(i, &lr_val);
     irq = lr_val.virq;
     p = irq_to_pending(v, irq);
+
+#ifdef CONFIG_GICV3
     /*
      * An LPI might have been unmapped, in which case we just clean up here.
      * If that LPI is marked as PRISTINE, the information in the LR is bogus,
@@ -188,6 +198,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
 
         return;
     }
+#endif
 
     if ( lr_val.active )
     {
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 6cc7dec..77fc06f 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -136,7 +136,9 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int virq,
     /* Caller has already checked that the IRQ is an SPI */
     ASSERT(virq >= 32);
     ASSERT(virq < vgic_num_irqs(d));
+#ifdef CONFIG_GICV3
     ASSERT(!is_lpi(virq));
+#endif
 
     /*
      * When routing an IRQ to guest, the virtual state is not synced
@@ -168,7 +170,9 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
 
     ASSERT(spin_is_locked(&desc->lock));
     ASSERT(test_bit(_IRQ_GUEST, &desc->status));
+#ifdef CONFIG_GICV3
     ASSERT(!is_lpi(virq));
+#endif
 
     /*
      * Removing an interrupt while the domain is running may have
@@ -391,6 +395,7 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
             do_IRQ(regs, irq, is_fiq);
             local_irq_disable();
         }
+#ifdef CONFIG_GICV3
         else if ( is_lpi(irq) )
         {
             local_irq_enable();
@@ -398,6 +403,7 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
             gic_hw_ops->do_LPI(irq);
             local_irq_disable();
         }
+#endif
         else if ( unlikely(irq < 16) )
         {
             do_sgi(regs, irq);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 255210c..dd35695 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -64,14 +64,18 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq)
 
 void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
 {
+#ifdef CONFIG_GICV3
     /* The lpi_vcpu_id field must be big enough to hold a VCPU ID. */
     BUILD_BUG_ON(BIT(sizeof(p->lpi_vcpu_id) * 8) < MAX_VIRT_CPUS);
+#endif
 
     memset(p, 0, sizeof(*p));
     INIT_LIST_HEAD(&p->inflight);
     INIT_LIST_HEAD(&p->lr_queue);
     p->irq = virq;
+#ifdef CONFIG_GICV3
     p->lpi_vcpu_id = INVALID_VCPU_ID;
+#endif
 }
 
 static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
@@ -246,9 +250,11 @@ static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
 {
     struct vgic_irq_rank *rank;
 
+#ifdef CONFIG_GICV3
     /* LPIs don't have a rank, also store their priority separately. */
     if ( is_lpi(virq) )
         return v->domain->arch.vgic.handler->lpi_get_priority(v->domain, virq);
+#endif
 
     rank = vgic_rank_irq(v, virq);
     return ACCESS_ONCE(rank->priority[virq & INTERRUPT_RANK_MASK]);
@@ -259,8 +265,10 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     unsigned long flags;
     struct pending_irq *p;
 
+#ifdef CONFIG_GICV3
     /* This will never be called for an LPI, as we don't migrate them. */
     ASSERT(!is_lpi(irq));
+#endif
 
     spin_lock_irqsave(&old->arch.vgic.lock, flags);
 
@@ -315,6 +323,7 @@ void arch_move_irqs(struct vcpu *v)
     struct vcpu *v_target;
     int i;
 
+#ifdef CONFIG_GICV3
     /*
      * We don't migrate LPIs at the moment.
      * If we ever do, we must make sure that the struct pending_irq does
@@ -325,6 +334,7 @@ void arch_move_irqs(struct vcpu *v)
      * don't participate.
      */
     ASSERT(!is_lpi(vgic_num_irqs(d) - 1));
+#endif
 
     for ( i = 32; i < vgic_num_irqs(d); i++ )
     {
@@ -346,8 +356,10 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
     int i = 0;
     struct vcpu *v_target;
 
+#ifdef CONFIG_GICV3
     /* LPIs will never be disabled via this function. */
     ASSERT(!is_lpi(32 * n + 31));
+#endif
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
@@ -396,8 +408,10 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
     struct vcpu *v_target;
     struct domain *d = v->domain;
 
+#ifdef CONFIG_GICV3
     /* LPIs will never be enabled via this function. */
     ASSERT(!is_lpi(32 * n + 31));
+#endif
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
@@ -493,8 +507,10 @@ struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq)
      * are used for SPIs; the rests are used for per cpu irqs */
     if ( irq < 32 )
         n = &v->arch.vgic.pending_irqs[irq];
+#ifdef CONFIG_GICV3
     else if ( is_lpi(irq) )
         n = v->domain->arch.vgic.handler->lpi_to_pending(v->domain, irq);
+#endif
     else
         n = &v->domain->arch.vgic.pending_irqs[irq - 32];
     return n;
@@ -553,12 +569,14 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     n = irq_to_pending(v, virq);
+#ifdef CONFIG_GICV3
     /* If an LPI has been removed, there is nothing to inject here. */
     if ( unlikely(!n) )
     {
         spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
         return;
     }
+#endif
 
     /* vcpu offline */
     if ( test_bit(_VPF_down, &v->pause_flags) )
@@ -610,8 +628,10 @@ bool vgic_evtchn_irq_pending(struct vcpu *v)
     struct pending_irq *p;
 
     p = irq_to_pending(v, v->domain->arch.evtchn_irq);
+#ifdef CONFIG_GICV3
     /* Does not work for LPIs. */
     ASSERT(!is_lpi(v->domain->arch.evtchn_irq));
+#endif
 
     return list_empty(&p->inflight);
 }
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index e2844dc..b8dbdaf 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -703,8 +703,10 @@ bool vgic_evtchn_irq_pending(struct vcpu *v)
     unsigned long flags;
     bool pending;
 
+#ifdef CONFIG_GICV3
     /* Does not work for LPIs. */
     ASSERT(!is_lpi(v->domain->arch.evtchn_irq));
+#endif
 
     irq = vgic_get_irq(v->domain, v, v->domain->arch.evtchn_irq);
     spin_lock_irqsave(&irq->irq_lock, flags);
diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h
index e45d574..4f1ef3c 100644
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -63,10 +63,12 @@ struct irq_desc *__irq_to_desc(int irq);
 
 void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq);
 
+#ifdef CONFIG_GICV3
 static inline bool is_lpi(unsigned int irq)
 {
     return irq >= LPI_OFFSET;
 }
+#endif
 
 #define domain_pirq_to_irq(d, pirq) (pirq)
 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 447d24e..c5cb63f 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -63,27 +63,32 @@ struct pending_irq
      * GIC_IRQ_GUEST_MIGRATING: the irq is being migrated to a different
      * vcpu while it is still inflight and on an GICH_LR register on the
      * old vcpu.
-     *
-     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
-     * has never been in an LR before. This means that any trace of an
-     * LPI with the same number in an LR must be from an older LPI, which
-     * has been unmapped before.
-     *
      */
 #define GIC_IRQ_GUEST_QUEUED   0
 #define GIC_IRQ_GUEST_ACTIVE   1
 #define GIC_IRQ_GUEST_VISIBLE  2
 #define GIC_IRQ_GUEST_ENABLED  3
 #define GIC_IRQ_GUEST_MIGRATING   4
+#ifdef CONFIG_GICV3
+    /*
+     * GIC_IRQ_GUEST_PRISTINE_LPI: the IRQ is a newly mapped LPI, which
+     * has never been in an LR before. This means that any trace of an
+     * LPI with the same number in an LR must be from an older LPI, which
+     * has been unmapped before.
+     * Valid for GICV3 only.
+     */
 #define GIC_IRQ_GUEST_PRISTINE_LPI  5
+#endif
     unsigned long status;
     struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
     unsigned int irq;
 #define GIC_INVALID_LR         (uint8_t)~0
     uint8_t lr;
     uint8_t priority;
+#ifdef CONFIG_GICV3
     uint8_t lpi_priority;       /* Caches the priority if this is an LPI. */
     uint8_t lpi_vcpu_id;        /* The VCPU for an LPI. */
+#endif
     /* inflight is used to append instances of pending_irq to
      * vgic.inflight_irqs */
     struct list_head inflight;
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 03/16] gic:vgic:gic-vgic: introduce non-atomic bitops Andrii Anisov
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Avoid excessive conversions between `pending_irq` and irq number and
priority. This is a tiny but pure optimization.

Also it aligns a `gic_raise_guest_irq()` function interface to its
current implementation in the meaning that it is removed a clearly
ignored priority argument. Seeing a call to this function from a
GICv3 code with an `p->lpi_priority` passed, it is assumed that LPI
priority should be used for prioritization in an `lr_pending` queue.
But the `lr_pending` queue is only sorted by `p->priority`, so it is
supposed that `p->lpi_priority` should be casted to `p->priority`
by GICv3 code to achieve expected behavior. On other hand, for LPIs,
`p->lpi_priority` might be still honored in the `gic_raise_guest_irq()`
functions, because it is still in the `pending_irq`.
But that fix should be made aside this patch.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

Its a patch [1] with an improved message which also reflects what was
discussed in the thread here [2].

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03289.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg02031.html

---
 xen/arch/arm/gic-vgic.c    | 10 +++-------
 xen/arch/arm/vgic-v3-its.c |  2 +-
 xen/arch/arm/vgic.c        |  6 +++---
 xen/include/asm-arm/gic.h  |  3 ---
 xen/include/asm-arm/vgic.h |  2 ++
 5 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 74b5fae..471e2d0 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -75,10 +75,8 @@ void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p)
     list_del_init(&p->lr_queue);
 }
 
-void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq)
+void gic_raise_inflight_irq(struct vcpu *v, struct pending_irq *n)
 {
-    struct pending_irq *n = irq_to_pending(v, virtual_irq);
-
 #ifdef CONFIG_GICV3
     /* If an LPI has been removed meanwhile, there is nothing left to raise. */
     if ( unlikely(!n) )
@@ -139,12 +137,10 @@ static unsigned int gic_find_unused_lr(struct vcpu *v,
     return lr;
 }
 
-void gic_raise_guest_irq(struct vcpu *v, unsigned int virtual_irq,
-        unsigned int priority)
+void gic_raise_guest_irq(struct vcpu *v, struct pending_irq *p)
 {
     int i;
     unsigned int nr_lrs = gic_get_nr_lrs();
-    struct pending_irq *p = irq_to_pending(v, virtual_irq);
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
@@ -240,7 +236,7 @@ static void gic_update_one_lr(struct vcpu *v, int i)
         if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
              test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
              !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
-            gic_raise_guest_irq(v, irq, p->priority);
+            gic_raise_guest_irq(v, p);
         else {
             list_del_init(&p->inflight);
             /*
diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
index 5b73c4e..193a28f 100644
--- a/xen/arch/arm/vgic-v3-its.c
+++ b/xen/arch/arm/vgic-v3-its.c
@@ -447,7 +447,7 @@ static void update_lpi_vgic_status(struct vcpu *v, struct pending_irq *p)
     {
         if ( !list_empty(&p->inflight) &&
              !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
-            gic_raise_guest_irq(v, p->irq, p->lpi_priority);
+            gic_raise_guest_irq(v, p);
     }
     else
         gic_remove_from_lr_pending(v, p);
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index dd35695..458b2a8 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -420,7 +420,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
         p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
-            gic_raise_guest_irq(v_target, irq, p->priority);
+            gic_raise_guest_irq(v_target, p);
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
         if ( p->desc != NULL )
         {
@@ -589,7 +589,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
     if ( !list_empty(&n->inflight) )
     {
-        gic_raise_inflight_irq(v, virq);
+        gic_raise_inflight_irq(v, n);
         goto out;
     }
 
@@ -598,7 +598,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
     /* the irq is enabled */
     if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
-        gic_raise_guest_irq(v, virq, priority);
+        gic_raise_guest_irq(v, n);
 
     list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
     {
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index fab02f1..3d7394d 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -252,9 +252,6 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
 extern void gic_clear_pending_irqs(struct vcpu *v);
 
 extern void init_maintenance_interrupt(void);
-extern void gic_raise_guest_irq(struct vcpu *v, unsigned int irq,
-        unsigned int priority);
-extern void gic_raise_inflight_irq(struct vcpu *v, unsigned int virtual_irq);
 
 /* Accept an interrupt from the GIC and dispatch its handler */
 extern void gic_interrupt(struct cpu_user_regs *regs, int is_fiq);
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index c5cb63f..0b1f519 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -286,6 +286,8 @@ enum gic_sgi_mode;
 extern struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq);
 extern void vgic_remove_irq_from_queues(struct vcpu *v, struct pending_irq *p);
 extern void gic_remove_from_lr_pending(struct vcpu *v, struct pending_irq *p);
+extern void gic_raise_guest_irq(struct vcpu *v, struct pending_irq *p);
+extern void gic_raise_inflight_irq(struct vcpu *v, struct pending_irq *n);
 extern void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq);
 extern struct pending_irq *irq_to_pending(struct vcpu *v, unsigned int irq);
 extern struct pending_irq *spi_to_pending(struct domain *d, unsigned int irq);
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 03/16] gic:vgic:gic-vgic: introduce non-atomic bitops
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 04/16] gic: drop interrupts enabling on interrupts processing Andrii Anisov
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

All bit operations for gic, vgic and gic-vgic are performed under
spinlocks, so there is no need for atomic bit ops here, they only
introduce excessive call to functions used more expensive exclusive
ARM instructions.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

This patch was not changed. Its only here to list what is really
evaluated for IRQ latency impact. So ugly non-atomic bitops are still
here.
Moreover, taking in cosideration this [1], a closer look is needed
into bitops in XEN on ARM.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03425.html

---
 xen/arch/arm/gic-vgic.c | 16 ++++++++++++++++
 xen/arch/arm/gic.c      | 16 ++++++++++++++++
 xen/arch/arm/vgic.c     | 16 ++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 471e2d0..0f8f8d5 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -25,6 +25,22 @@
 #include <asm/gic.h>
 #include <asm/vgic.h>
 
+#undef set_bit
+#define set_bit(nr, addr) (*(addr) |= (1<<nr))
+
+#undef clear_bit
+#define clear_bit(nr, addr) (*(addr) &= ~(1<<nr))
+
+#undef test_bit
+#define test_bit(nr,addr) (*(addr) & (1<<nr))
+
+#undef test_and_clear_bit
+#define test_and_clear_bit(nr,addr) ({                    \
+    bool _x; \
+    _x = (*(addr) & (1<<nr));                        \
+    (*(addr) &= ~(1<<nr));                                \
+    (_x);})
+
 #define lr_all_full() (this_cpu(lr_mask) == ((1 << gic_get_nr_lrs()) - 1))
 
 #undef GIC_DEBUG
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 77fc06f..f576b66 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -40,6 +40,22 @@
 
 DEFINE_PER_CPU(uint64_t, lr_mask);
 
+#undef set_bit
+#define set_bit(nr, addr) (*(addr) |= (1<<nr))
+
+#undef clear_bit
+#define clear_bit(nr, addr) (*(addr) &= ~(1<<nr))
+
+#undef test_bit
+#define test_bit(nr,addr) (*(addr) & (1<<nr))
+
+#undef test_and_clear_bit
+#define test_and_clear_bit(nr,addr) ({                    \
+    bool _x; \
+    _x = (*(addr) & (1<<nr));                        \
+    (*(addr) &= ~(1<<nr));                                \
+    (_x);})
+
 #undef GIC_DEBUG
 
 const struct gic_hw_operations *gic_hw_ops;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 458b2a8..057a721 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -33,6 +33,22 @@
 #include <asm/gic.h>
 #include <asm/vgic.h>
 
+#undef set_bit
+#define set_bit(nr, addr) (*(addr) |= (1<<nr))
+
+#undef clear_bit
+#define clear_bit(nr, addr) (*(addr) &= ~(1<<nr))
+
+#undef test_bit
+#define test_bit(nr,addr) (*(addr) & (1<<nr))
+
+#undef test_and_clear_bit
+#define test_and_clear_bit(nr,addr) ({                    \
+    bool _x = (*(addr) & (1<<nr));                        \
+    (*(addr) &= ~(1<<nr));                                \
+    return (_x);                                                 \
+})
+
 static inline struct vgic_irq_rank *vgic_get_rank(struct vcpu *v, int rank)
 {
     if ( rank == 0 )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 04/16] gic: drop interrupts enabling on interrupts processing
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (2 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 03/16] gic:vgic:gic-vgic: introduce non-atomic bitops Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 05/16] gic-vgic: skip irqs locking in gic_restore_pending_irqs() Andrii Anisov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

This reduces the number of context switches in case we have coming guest
interrupts from different sources at a high rate. What is likely for
multimedia use-cases.
Having irqs unlocked here makes us go through trap path again in case we
have a new guest interrupt arrived (even with the same priority, after
`desc->handler->end(desc)`), what is just a processor cycles wasting. We
will catch them all in the `gic_interrupt() function loop anyway. And
the guest irqs arrival prioritization is meaningless here, it is only
effective at guest's level.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

The patch message is improved.

---
 xen/arch/arm/gic.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index f576b66..ecaa3d6 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -406,10 +406,8 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
 
         if ( likely(irq >= 16 && irq < 1020) )
         {
-            local_irq_enable();
             isb();
             do_IRQ(regs, irq, is_fiq);
-            local_irq_disable();
         }
 #ifdef CONFIG_GICV3
         else if ( is_lpi(irq) )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 05/16] gic-vgic: skip irqs locking in gic_restore_pending_irqs()
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (3 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 04/16] gic: drop interrupts enabling on interrupts processing Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 06/16] vgic: move pause_flags check out of vgic spinlock Andrii Anisov
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

This function is called under IRQs disabled already, so drop additional
flags save and restore.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

This patch is a part of [1] and already sent to the list [2].

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03293.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg02188.html

---
 xen/arch/arm/gic-vgic.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 0f8f8d5..74ab357 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -302,11 +302,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     int lr = 0;
     struct pending_irq *p, *t, *p_r;
     struct list_head *inflight_r;
-    unsigned long flags;
     unsigned int nr_lrs = gic_get_nr_lrs();
     int lrs = nr_lrs;
 
-    spin_lock_irqsave(&v->arch.vgic.lock, flags);
+    ASSERT(!local_irq_is_enabled());
+
+    spin_lock(&v->arch.vgic.lock);
 
     if ( list_empty(&v->arch.vgic.lr_pending) )
         goto out;
@@ -350,7 +351,7 @@ found:
     }
 
 out:
-    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+    spin_unlock(&v->arch.vgic.lock);
 }
 
 void gic_clear_pending_irqs(struct vcpu *v)
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 06/16] vgic: move pause_flags check out of vgic spinlock
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (4 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 05/16] gic-vgic: skip irqs locking in gic_restore_pending_irqs() Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 07/16] hack:vgic: move irq_to_pending out of lock Andrii Anisov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Pause_flags is not related to vgic spinlock, so reduce code
under lock.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---

Comments from Julien [1] are not addressed yet.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03306.html

---
 xen/arch/arm/vgic.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 057a721..a795b6f 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -582,6 +582,12 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
         v = vgic_get_target_vcpu(d->vcpu[0], virq);
     };
 
+    /* vcpu offline */
+    if ( unlikely(test_bit(_VPF_down, &v->pause_flags)) )
+    {
+        return;
+    }
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     n = irq_to_pending(v, virq);
@@ -594,13 +600,6 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     }
 #endif
 
-    /* vcpu offline */
-    if ( test_bit(_VPF_down, &v->pause_flags) )
-    {
-        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
-        return;
-    }
-
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
 
     if ( !list_empty(&n->inflight) )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 07/16] hack:vgic: move irq_to_pending out of lock
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (5 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 06/16] vgic: move pause_flags check out of vgic spinlock Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 08/16] gic-vgic:vgic: do not keep disabled IRQs in any of queues Andrii Anisov
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

For GICV2 pending_irq allocation is not concurrent, so reduce
some code under lock.

This code is not supposed to work with GICv3

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-v3-its.c | 2 ++
 xen/arch/arm/gic-v3-lpi.c | 2 ++
 xen/arch/arm/gic-v3.c     | 2 ++
 xen/arch/arm/vgic.c       | 9 ++++-----
 4 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index ba4bc00..b1b890a 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -35,6 +35,8 @@
 
 #define ITS_CMD_QUEUE_SZ                SZ_1M
 
+#error "The current gic/vgic/domain code does not support GICv3"
+
 /*
  * No lock here, as this list gets only populated upon boot while scanning
  * firmware tables for all host ITSes, and only gets iterated afterwards.
diff --git a/xen/arch/arm/gic-v3-lpi.c b/xen/arch/arm/gic-v3-lpi.c
index e8c6e15..be64e17 100644
--- a/xen/arch/arm/gic-v3-lpi.c
+++ b/xen/arch/arm/gic-v3-lpi.c
@@ -32,6 +32,8 @@
 #include <asm/page.h>
 #include <asm/sysregs.h>
 
+#error "The current gic/vgic/domain code does not support GICv3"
+
 /*
  * There could be a lot of LPIs on the host side, and they always go to
  * a guest. So having a struct irq_desc for each of them would be wasteful
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 6fbc106..8e835b5 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -44,6 +44,8 @@
 #include <asm/io.h>
 #include <asm/sysregs.h>
 
+#error "The current gic/vgic/domain code does not support GICv3"
+
 /* Global state */
 static struct {
     void __iomem *map_dbase;  /* Mapped address of distributor registers */
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index a795b6f..371576f 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -286,17 +286,16 @@ bool vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
     ASSERT(!is_lpi(irq));
 #endif
 
-    spin_lock_irqsave(&old->arch.vgic.lock, flags);
-
     p = irq_to_pending(old, irq);
 
     /* nothing to do for virtual interrupts */
     if ( p->desc == NULL )
     {
-        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
         return true;
     }
 
+    spin_lock_irqsave(&old->arch.vgic.lock, flags);
+
     /* migration already in progress, no need to do anything */
     if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
     {
@@ -588,8 +587,6 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
         return;
     }
 
-    spin_lock_irqsave(&v->arch.vgic.lock, flags);
-
     n = irq_to_pending(v, virq);
 #ifdef CONFIG_GICV3
     /* If an LPI has been removed, there is nothing to inject here. */
@@ -600,6 +597,8 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     }
 #endif
 
+    spin_lock_irqsave(&v->arch.vgic.lock, flags);
+
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
 
     if ( !list_empty(&n->inflight) )
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 08/16] gic-vgic:vgic: do not keep disabled IRQs in any of queues
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (6 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 07/16] hack:vgic: move irq_to_pending out of lock Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 09/16] xen/arm: Re-enable interrupt later in the trap path Andrii Anisov
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Do not put already disabled IRQs into any of queues, and remove
an IRQ from all queues on disable. Insert the IRQ into queues on
enable if needed. This also allows simplification of under-lock
decission of events needed delivery.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-vgic.c    | 26 +++++++++-----------------
 xen/arch/arm/vgic.c        | 28 +++++++++++++++++++++++++---
 xen/include/asm-arm/vgic.h |  3 ++-
 3 files changed, 36 insertions(+), 21 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 74ab357..a59295a 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -101,10 +101,6 @@ void gic_raise_inflight_irq(struct vcpu *v, struct pending_irq *n)
 
     ASSERT(spin_is_locked(&v->arch.vgic.lock));
 
-    /* Don't try to update the LR if the interrupt is disabled */
-    if ( !test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
-        return;
-
     if ( list_empty(&n->lr_queue) )
     {
         if ( v == current )
@@ -382,6 +378,7 @@ int vgic_vcpu_pending_irq(struct vcpu *v)
     const unsigned long apr = gic_hw_ops->read_apr(0);
     int mask_priority;
     int active_priority;
+    int effective_priority;
     int rc = 0;
 
     /* We rely on reading the VMCR, which is only accessible locally. */
@@ -389,28 +386,23 @@ int vgic_vcpu_pending_irq(struct vcpu *v)
 
     mask_priority = gic_hw_ops->read_vmcr_priority();
     active_priority = find_first_bit(&apr, 32);
+    effective_priority = min(mask_priority, active_priority);
 
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     /* TODO: We order the guest irqs by priority, but we don't change
      * the priority of host irqs. */
 
-    /* find the first enabled non-active irq, the queue is already
-     * ordered by priority */
-    list_for_each_entry( p, &v->arch.vgic.inflight_irqs, inflight )
+    /* take the first non-active irq, the queue is already
+     * ordered by priority and has no disabled IRQs*/
+    if ( !list_empty(&v->arch.vgic.inflight_irqs) )
     {
-        if ( GIC_PRI_TO_GUEST(p->priority) >= mask_priority )
-            goto out;
-        if ( GIC_PRI_TO_GUEST(p->priority) >= active_priority )
-            goto out;
-        if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
-        {
-            rc = 1;
-            goto out;
-        }
+        p = container_of(v->arch.vgic.inflight_irqs.next, struct pending_irq,
+                          inflight);
+        if ( GIC_PRI_TO_GUEST(p->priority) < effective_priority )
+             rc = 1;
     }
 
-out:
     spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
     return rc;
 }
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 371576f..ab301d9 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -383,6 +383,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
         spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        list_del_init(&p->inflight);
         gic_remove_from_lr_pending(v_target, p);
         desc = p->desc;
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
@@ -434,6 +435,23 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
         spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
         p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
+        if ( test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) ||
+             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
+        {
+            struct pending_irq *iter;
+            p->priority = vgic_get_virq_priority(v, irq);
+            list_for_each_entry( iter, &v_target->arch.vgic.inflight_irqs,
+                                 inflight )
+            {
+                if ( iter->priority > p->priority )
+                {
+                    list_add_tail(&p->inflight, &iter->inflight);
+                    goto out;
+                }
+            }
+            list_add_tail(&p->inflight, &v_target->arch.vgic.inflight_irqs);
+        }
+out:
         if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
             gic_raise_guest_irq(v_target, p);
         spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
@@ -601,6 +619,12 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
 
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
 
+    if ( !test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
+    {/*Do not insert a disabled IRQ into any queue*/
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+        return;
+    }
+
     if ( !list_empty(&n->inflight) )
     {
         gic_raise_inflight_irq(v, n);
@@ -610,9 +634,7 @@ void vgic_inject_irq(struct domain *d, struct vcpu *v, unsigned int virq,
     priority = vgic_get_virq_priority(v, virq);
     n->priority = priority;
 
-    /* the irq is enabled */
-    if ( test_bit(GIC_IRQ_GUEST_ENABLED, &n->status) )
-        gic_raise_guest_irq(v, n);
+    gic_raise_guest_irq(v, n);
 
     list_for_each_entry ( iter, &v->arch.vgic.inflight_irqs, inflight )
     {
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 0b1f519..a27a1a9 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -58,7 +58,8 @@ struct pending_irq
      * irq is enabled at the vgicd level:
      *
      * GIC_IRQ_GUEST_ENABLED: the guest IRQ is enabled at the VGICD
-     * level (GICD_ICENABLER/GICD_ISENABLER).
+     * level (GICD_ICENABLER/GICD_ISENABLER). Disabled guest IRQ do not appear
+     * in an inflight queue
      *
      * GIC_IRQ_GUEST_MIGRATING: the irq is being migrated to a different
      * vcpu while it is still inflight and on an GICH_LR register on the
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 09/16] xen/arm: Re-enable interrupt later in the trap path
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (7 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 08/16] gic-vgic:vgic: do not keep disabled IRQs in any of queues Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 10/16] gic-vgic: skip irqs locking in vgic_sync_from_lrs Andrii Anisov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Julien Grall <julien.grall@arm.com>

This makes function enter_hypervisor_head() being executed with
irqs locked. This also give a fine side effect - it assures that
LRs are cleared prior to any IRQs processing, which leads to a
better (faster) IRQs processing.

Signed-off-by: Julien Grall <julien.grall@arm.com>
[Andrii: add a justification commit message]
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/arm64/entry.S | 11 +++++------
 xen/arch/arm/traps.c       |  6 ++++++
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/arm64/entry.S b/xen/arch/arm/arm64/entry.S
index 97b05f5..8f28789 100644
--- a/xen/arch/arm/arm64/entry.S
+++ b/xen/arch/arm/arm64/entry.S
@@ -195,7 +195,6 @@ hyp_error_invalid:
 
 hyp_error:
         entry   hyp=1
-        msr     daifclr, #2
         mov     x0, sp
         bl      do_trap_hyp_serror
         exit    hyp=1
@@ -203,7 +202,7 @@ hyp_error:
 /* Traps taken in Current EL with SP_ELx */
 hyp_sync:
         entry   hyp=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_hyp_sync
         exit    hyp=1
@@ -304,7 +303,7 @@ guest_sync_slowpath:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
 1:
@@ -332,7 +331,7 @@ guest_fiq_invalid:
 
 guest_error:
         entry   hyp=0, compat=0
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=0
@@ -347,7 +346,7 @@ guest_sync_compat:
         ALTERNATIVE("bl check_pending_vserror; cbnz x0, 1f",
                     "nop; nop",
                     SKIP_SYNCHRONIZE_SERROR_ENTRY_EXIT)
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_sync
 1:
@@ -375,7 +374,7 @@ guest_fiq_invalid_compat:
 
 guest_error_compat:
         entry   hyp=0, compat=1
-        msr     daifclr, #6
+        msr     daifclr, #4
         mov     x0, sp
         bl      do_trap_guest_serror
         exit    hyp=0, compat=1
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 67c08ab..6fa562e 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2009,6 +2009,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
     {
         struct vcpu *v = current;
 
+        ASSERT(!local_irq_is_enabled());
+
         /* If the guest has disabled the workaround, bring it back on. */
         if ( needs_ssbd_flip(v) )
             arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
@@ -2043,6 +2045,7 @@ void do_trap_guest_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };
 
     enter_hypervisor_head(regs);
+    local_irq_enable();
 
     switch ( hsr.ec )
     {
@@ -2174,6 +2177,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
     const union hsr hsr = { .bits = regs->hsr };
 
     enter_hypervisor_head(regs);
+    local_irq_enable();
 
     switch ( hsr.ec )
     {
@@ -2212,6 +2216,7 @@ void do_trap_hyp_sync(struct cpu_user_regs *regs)
 void do_trap_hyp_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();
 
     __do_trap_serror(regs, VABORT_GEN_BY_GUEST(regs));
 }
@@ -2219,6 +2224,7 @@ void do_trap_hyp_serror(struct cpu_user_regs *regs)
 void do_trap_guest_serror(struct cpu_user_regs *regs)
 {
     enter_hypervisor_head(regs);
+    local_irq_enable();
 
     __do_trap_serror(regs, true);
 }
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 10/16] gic-vgic: skip irqs locking in vgic_sync_from_lrs
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (8 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 09/16] xen/arm: Re-enable interrupt later in the trap path Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 11/16] gic-v2: Write HCR only on change Andrii Anisov
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

After the patch "xen/arm: Re-enable interrupt later in the trap path"
this function is called with irqs already locked.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-vgic.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index a59295a..610e3bb 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -271,9 +271,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
 void vgic_sync_from_lrs(struct vcpu *v)
 {
     int i = 0;
-    unsigned long flags;
     unsigned int nr_lrs = gic_get_nr_lrs();
 
+    ASSERT(!local_irq_is_enabled());
+
     /* The idle domain has no LRs to be cleared. Since gic_restore_state
      * doesn't write any LR registers for the idle domain they could be
      * non-zero. */
@@ -282,7 +283,7 @@ void vgic_sync_from_lrs(struct vcpu *v)
 
     gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
 
-    spin_lock_irqsave(&v->arch.vgic.lock, flags);
+    spin_lock(&v->arch.vgic.lock);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
                               nr_lrs, i)) < nr_lrs ) {
@@ -290,7 +291,7 @@ void vgic_sync_from_lrs(struct vcpu *v)
         i++;
     }
 
-    spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+    spin_unlock(&v->arch.vgic.lock);
 }
 
 static void gic_restore_pending_irqs(struct vcpu *v)
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 11/16] gic-v2: Write HCR only on change
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (9 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 10/16] gic-vgic: skip irqs locking in vgic_sync_from_lrs Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 12/16] gic-v2: avoid HCR reading for GICv2 Andrii Anisov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

This saves one write to peripheral HCR register per hypervisor entry for
most cases.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-v2.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 1a744c5..25147bd 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -579,14 +579,17 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 
 static void gicv2_hcr_status(uint32_t flag, bool status)
 {
-    uint32_t hcr = readl_gich(GICH_HCR);
+    uint32_t hcr, ohcr;
+
+    ohcr = hcr = readl_gich(GICH_HCR);
 
     if ( status )
         hcr |= flag;
     else
         hcr &= (~flag);
 
-    writel_gich(hcr, GICH_HCR);
+    if ( hcr != ohcr )
+        writel_gich(hcr, GICH_HCR);
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 12/16] gic-v2: avoid HCR reading for GICv2
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (10 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 11/16] gic-v2: Write HCR only on change Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu Andrii Anisov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Because the HCR value is only changed by the hypervisor,
we can rely on a cached value and do not do peripheral register
reads for this register.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-v2.c     | 43 +++++++++++++++++++++++++------------------
 xen/arch/arm/gic-v3.c     |  2 +-
 xen/include/asm-arm/gic.h |  2 +-
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 25147bd..cc7571e 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -174,6 +174,28 @@ static unsigned int gicv2_cpu_mask(const cpumask_t *cpumask)
     return mask;
 }
 
+static void _gicv2_hcr_status(struct vcpu *v, uint32_t flag, bool status)
+{
+    uint32_t hcr, ohcr;
+
+    ohcr = hcr = v->arch.gic.v2.hcr;
+
+    if ( status )
+        hcr |= flag;
+    else
+        hcr &= (~flag);
+
+    if ( hcr != ohcr )
+        writel_gich(hcr, GICH_HCR);
+
+     v->arch.gic.v2.hcr = hcr;
+}
+
+static void gicv2_hcr_status(uint32_t flag, bool status)
+{
+    _gicv2_hcr_status(current, flag, status);
+}
+
 static void gicv2_save_state(struct vcpu *v)
 {
     int i;
@@ -188,10 +210,10 @@ static void gicv2_save_state(struct vcpu *v)
     v->arch.gic.v2.apr = readl_gich(GICH_APR);
     v->arch.gic.v2.vmcr = readl_gich(GICH_VMCR);
     /* Disable until next VCPU scheduled */
-    writel_gich(0, GICH_HCR);
+    _gicv2_hcr_status(v, GICH_HCR_EN, false);
 }
 
-static void gicv2_restore_state(const struct vcpu *v)
+static void gicv2_restore_state(struct vcpu *v)
 {
     int i;
 
@@ -200,7 +222,7 @@ static void gicv2_restore_state(const struct vcpu *v)
 
     writel_gich(v->arch.gic.v2.apr, GICH_APR);
     writel_gich(v->arch.gic.v2.vmcr, GICH_VMCR);
-    writel_gich(GICH_HCR_EN, GICH_HCR);
+    _gicv2_hcr_status(v, GICH_HCR_EN, true);
 }
 
 static void gicv2_dump_state(const struct vcpu *v)
@@ -577,21 +599,6 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
     writel_gich(lrv, GICH_LR + lr * 4);
 }
 
-static void gicv2_hcr_status(uint32_t flag, bool status)
-{
-    uint32_t hcr, ohcr;
-
-    ohcr = hcr = readl_gich(GICH_HCR);
-
-    if ( status )
-        hcr |= flag;
-    else
-        hcr &= (~flag);
-
-    if ( hcr != ohcr )
-        writel_gich(hcr, GICH_HCR);
-}
-
 static unsigned int gicv2_read_vmcr_priority(void)
 {
    return ((readl_gich(GICH_VMCR) >> GICH_V2_VMCR_PRIORITY_SHIFT)
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 8e835b5..6269425 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -377,7 +377,7 @@ static void gicv3_save_state(struct vcpu *v)
     v->arch.gic.v3.sre_el1 = READ_SYSREG32(ICC_SRE_EL1);
 }
 
-static void gicv3_restore_state(const struct vcpu *v)
+static void gicv3_restore_state(struct vcpu *v)
 {
     uint32_t val;
 
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index 3d7394d..ea9bbed 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -323,7 +323,7 @@ struct gic_hw_operations {
     /* Save GIC registers */
     void (*save_state)(struct vcpu *);
     /* Restore GIC registers */
-    void (*restore_state)(const struct vcpu *);
+    void (*restore_state)(struct vcpu *);
     /* Dump GIC LR register information */
     void (*dump_state)(const struct vcpu *);
 
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (11 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 12/16] gic-v2: avoid HCR reading for GICv2 Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2019-01-02 12:24   ` Wei Liu
  2018-12-26 11:20 ` [RFC v2 14/16] hack: move gicv2 LRs reads and writes out of spinlocks Andrii Anisov
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel
  Cc: Stefano Stabellini, Andrii Anisov, Konrad Rzeszutek Wilk,
	George Dunlap, Andre Przywara, Ian Jackson, Tim Deegan,
	Julien Grall, Jan Beulich, Andrew Cooper, Wei Liu

From: Andrii Anisov <andrii_anisov@epam.com>

Simplify context restore from idle vcpu to the one ran before it.
This improves low cpu load but high irq rate use-cases.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c   | 21 +++++++++++----------
 xen/include/xen/sched.h |  1 +
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index daf7c59..dd97d07 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -187,9 +187,6 @@ static void ctxt_switch_to(struct vcpu *n)
     WRITE_SYSREG32(vpidr, VPIDR_EL2);
     WRITE_SYSREG(n->arch.vmpidr, VMPIDR_EL2);
 
-    /* VGIC */
-    gic_restore_state(n);
-
     /* VFP */
     vfp_restore_state(n);
 
@@ -263,11 +260,6 @@ static void ctxt_switch_to(struct vcpu *n)
     WRITE_SYSREG(n->arch.csselr, CSSELR_EL1);
 
     isb();
-
-    /* This is could trigger an hardware interrupt from the virtual
-     * timer. The interrupt needs to be injected into the guest. */
-    WRITE_SYSREG32(n->arch.cntkctl, CNTKCTL_EL1);
-    virt_timer_restore(n);
 }
 
 /* Update per-VCPU guest runstate shared memory area (if registered). */
@@ -302,8 +294,17 @@ static void update_runstate_area(struct vcpu *v)
 static void schedule_tail(struct vcpu *prev)
 {
     ctxt_switch_from(prev);
-
-    ctxt_switch_to(current);
+    if ( !(is_idle_vcpu(prev) && (prev->prev == current)) )
+        ctxt_switch_to(current);
+    /* VGIC */
+    if ( !is_idle_vcpu(current) )
+    {
+        gic_restore_state(current);
+        WRITE_SYSREG32(current->arch.cntkctl, CNTKCTL_EL1);
+        virt_timer_restore(current);
+    }
+    else
+        current->prev = prev;
 
     local_irq_enable();
 
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 4956a77..8ffdb70 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -272,6 +272,7 @@ struct vcpu
     struct vpci_vcpu vpci;
 
     struct arch_vcpu arch;
+    struct vcpu *prev;
 };
 
 /* Per-domain lock can be recursively acquired in fault handlers. */
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 14/16] hack: move gicv2 LRs reads and writes out of spinlocks
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (12 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 15/16] gic: vgic: align frequently accessed data by cache line size Andrii Anisov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Use values cached in `struct vcpu` for guest IRQs processing.
This also reduces LR stores and restores on VCPU switch.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic-v2.c     | 54 +++++++++++++++++++++++++++++++----------------
 xen/arch/arm/gic-vgic.c   |  8 +++++--
 xen/include/asm-arm/gic.h |  3 +++
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index cc7571e..bc2327b 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -198,15 +198,6 @@ static void gicv2_hcr_status(uint32_t flag, bool status)
 
 static void gicv2_save_state(struct vcpu *v)
 {
-    int i;
-
-    /* No need for spinlocks here because interrupts are disabled around
-     * this call and it only accesses struct vcpu fields that cannot be
-     * accessed simultaneously by another pCPU.
-     */
-    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
-        v->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
-
     v->arch.gic.v2.apr = readl_gich(GICH_APR);
     v->arch.gic.v2.vmcr = readl_gich(GICH_VMCR);
     /* Disable until next VCPU scheduled */
@@ -215,14 +206,10 @@ static void gicv2_save_state(struct vcpu *v)
 
 static void gicv2_restore_state(struct vcpu *v)
 {
-    int i;
-
-    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
-        writel_gich(v->arch.gic.v2.lr[i], GICH_LR + i * 4);
-
     writel_gich(v->arch.gic.v2.apr, GICH_APR);
     writel_gich(v->arch.gic.v2.vmcr, GICH_VMCR);
     _gicv2_hcr_status(v, GICH_HCR_EN, true);
+    v->arch.gic.v2.lr_update_mask = (1 << gic_hw_ops->info->nr_lrs) - 1;
 }
 
 static void gicv2_dump_state(const struct vcpu *v)
@@ -515,6 +502,7 @@ static void gicv2_disable_interface(void)
 static void gicv2_update_lr(int lr, unsigned int virq, uint8_t priority,
                             unsigned int hw_irq, unsigned int state)
 {
+    struct vcpu *v = current;
     uint32_t lr_reg;
 
     BUG_ON(lr >= gicv2_info.nr_lrs);
@@ -529,19 +517,23 @@ static void gicv2_update_lr(int lr, unsigned int virq, uint8_t priority,
         lr_reg |= GICH_V2_LR_HW | ((hw_irq & GICH_V2_LR_PHYSICAL_MASK )
                                    << GICH_V2_LR_PHYSICAL_SHIFT);
 
-    writel_gich(lr_reg, GICH_LR + lr * 4);
+    v->arch.gic.v2.lr[lr] = lr_reg;
+    v->arch.gic.v2.lr_update_mask |= 1<<lr;
 }
 
 static void gicv2_clear_lr(int lr)
 {
-    writel_gich(0, GICH_LR + lr * 4);
+    struct vcpu *v = current;
+
+    v->arch.gic.v2.lr[lr] = 0;
+    v->arch.gic.v2.lr_update_mask |= 1<<lr;
 }
 
 static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
 {
     uint32_t lrv;
 
-    lrv          = readl_gich(GICH_LR + lr * 4);
+    lrv = current->arch.gic.v2.lr[lr];
     lr_reg->virq = (lrv >> GICH_V2_LR_VIRTUAL_SHIFT) & GICH_V2_LR_VIRTUAL_MASK;
     lr_reg->priority = (lrv >> GICH_V2_LR_PRIORITY_SHIFT) & GICH_V2_LR_PRIORITY_MASK;
     lr_reg->pending = lrv & GICH_V2_LR_PENDING;
@@ -568,6 +560,7 @@ static void gicv2_read_lr(int lr, struct gic_lr *lr_reg)
 static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
 {
     uint32_t lrv = 0;
+    struct vcpu *v = current;
 
     lrv = (((lr_reg->virq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT)   |
           ((uint32_t)(lr_reg->priority & GICH_V2_LR_PRIORITY_MASK)
@@ -596,7 +589,30 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg)
         lrv |= (uint32_t)lr_reg->virt.source << GICH_V2_LR_CPUID_SHIFT;
     }
 
-    writel_gich(lrv, GICH_LR + lr * 4);
+    v->arch.gic.v2.lr[lr] = lrv;
+    v->arch.gic.v2.lr_update_mask |= 1<<lr;
+}
+
+static void gicv2_fetch_lrs(struct vcpu *v)
+{
+    int i;
+
+    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
+        if ( this_cpu(lr_mask) & 1<<i )
+            v->arch.gic.v2.lr[i] = readl_gich(GICH_LR + i * 4);
+        else
+            v->arch.gic.v2.lr[i] = 0;
+}
+
+static void gicv2_push_lrs(struct vcpu *v)
+{
+    int i;
+    uint64_t mask = v->arch.gic.v2.lr_update_mask;
+
+    for ( i = 0; i < gicv2_info.nr_lrs; i++ )
+        if ( mask & 1<<i )
+            writel_gich(v->arch.gic.v2.lr[i], GICH_LR + i * 4);
+    v->arch.gic.v2.lr_update_mask = 0;
 }
 
 static unsigned int gicv2_read_vmcr_priority(void)
@@ -1367,6 +1383,8 @@ const static struct gic_hw_operations gicv2_ops = {
     .map_hwdom_extra_mappings = gicv2_map_hwdown_extra_mappings,
     .iomem_deny_access   = gicv2_iomem_deny_access,
     .do_LPI              = gicv2_do_LPI,
+    .fetch_lrs           = gicv2_fetch_lrs,
+    .push_lrs            = gicv2_push_lrs,
 };
 
 /* Set up the GIC */
diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 610e3bb..d229873 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -282,6 +282,7 @@ void vgic_sync_from_lrs(struct vcpu *v)
         return;
 
     gic_hw_ops->update_hcr_status(GICH_HCR_UIE, false);
+    gic_hw_ops->fetch_lrs(v);
 
     spin_lock(&v->arch.vgic.lock);
 
@@ -410,11 +411,14 @@ int vgic_vcpu_pending_irq(struct vcpu *v)
 
 void vgic_sync_to_lrs(void)
 {
+    struct vcpu *v = current;
     ASSERT(!local_irq_is_enabled());
 
-    gic_restore_pending_irqs(current);
+    gic_restore_pending_irqs(v);
 
-    if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
+    gic_hw_ops->push_lrs(v);
+
+    if ( !list_empty(&v->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, true);
 }
 
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index ea9bbed..add2566 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -187,6 +187,7 @@ struct gic_v2 {
     uint32_t vmcr;
     uint32_t apr;
     uint32_t lr[64];
+    uint64_t lr_update_mask;
 };
 
 /*
@@ -384,6 +385,8 @@ struct gic_hw_operations {
     int (*iomem_deny_access)(const struct domain *d);
     /* Handle LPIs, which require special handling */
     void (*do_LPI)(unsigned int lpi);
+    void (*fetch_lrs) (struct vcpu *v);
+    void (*push_lrs) (struct vcpu *v);
 };
 
 extern const struct gic_hw_operations *gic_hw_ops;
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 15/16] gic: vgic: align frequently accessed data by cache line size
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (13 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 14/16] hack: move gicv2 LRs reads and writes out of spinlocks Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2018-12-26 11:20 ` [RFC v2 16/16] gic: separate ppi processing Andrii Anisov
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

Cache line size assumed 64 byte as for H3. Align the `struct
pending_irq` and allocate lrs shadow aligned to cache line size.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/domain.c        | 4 ++++
 xen/arch/arm/vgic.c          | 9 +++++++++
 xen/include/asm-arm/config.h | 2 +-
 xen/include/asm-arm/gic.h    | 2 +-
 xen/include/asm-arm/vgic.h   | 2 +-
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index dd97d07..264bde77 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -558,6 +558,10 @@ int arch_vcpu_create(struct vcpu *v)
     v->arch.saved_context.sp = (register_t)v->arch.cpu_info;
     v->arch.saved_context.pc = (register_t)continue_new_vcpu;
 
+    v->arch.gic.v2.lr = xzalloc_bytes(sizeof(uint32_t) * gic_number_lines());
+    if ( v->arch.gic.v2.lr == NULL )
+        return -ENOMEM;
+
     /* Idle VCPUs don't need the rest of this setup */
     if ( is_idle_vcpu(v) )
         return rc;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index ab301d9..d2cd340 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -166,6 +166,15 @@ int domain_vgic_init(struct domain *d, unsigned int nr_spis)
 
     d->arch.vgic.pending_irqs =
         xzalloc_array(struct pending_irq, d->arch.vgic.nr_spis);
+
+    if ( sizeof(struct pending_irq) != dcache_line_bytes )
+    {
+        printk ("sizeof(struct pending_irq) = %lu  is not equal to cacheline"
+                "size %lu. Is it expected?\n", sizeof(struct pending_irq),
+                dcache_line_bytes);
+        WARN();
+    }
+
     if ( d->arch.vgic.pending_irqs == NULL )
         return -ENOMEM;
 
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index bc89e84..4f3669f 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -28,7 +28,7 @@
 
 #define CONFIG_ARM 1
 
-#define CONFIG_ARM_L1_CACHE_SHIFT 7 /* XXX */
+#define CONFIG_ARM_L1_CACHE_SHIFT 6 /* XXX */
 
 #define CONFIG_SMP 1
 
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index add2566..fe44d3a 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -186,7 +186,7 @@ struct gic_v2 {
     uint32_t hcr;
     uint32_t vmcr;
     uint32_t apr;
-    uint32_t lr[64];
+    uint32_t *lr;
     uint64_t lr_update_mask;
 };
 
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index a27a1a9..d4ec96f 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -99,7 +99,7 @@ struct pending_irq
      * TODO: when implementing irq migration, taking only the current
      * vgic lock is not going to be enough. */
     struct list_head lr_queue;
-};
+}__cacheline_aligned;
 
 #define NR_INTERRUPT_PER_RANK   32
 #define INTERRUPT_RANK_MASK (NR_INTERRUPT_PER_RANK - 1)
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC v2 16/16] gic: separate ppi processing
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (14 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 15/16] gic: vgic: align frequently accessed data by cache line size Andrii Anisov
@ 2018-12-26 11:20 ` Andrii Anisov
  2019-01-02 18:33 ` [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 André Przywara
  2019-01-21 18:14 ` Julien Grall
  17 siblings, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2018-12-26 11:20 UTC (permalink / raw)
  To: xen-devel; +Cc: Andre Przywara, Julien Grall, Stefano Stabellini, Andrii Anisov

From: Andrii Anisov <andrii_anisov@epam.com>

PPI are pcpu private, so their processing is not concurrent and do not
need pcpu shared flags and correspondent lockings. So avoid odd bits
sets, checks and locks.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
---
 xen/arch/arm/gic.c        |  8 ++++++--
 xen/arch/arm/irq.c        | 32 ++++++++++++++++++++++++++++++++
 xen/include/asm-arm/irq.h |  1 +
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index ecaa3d6..d558059 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,8 +403,7 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
     do  {
         /* Reading IRQ will ACK it */
         irq = gic_hw_ops->read_irq();
-
-        if ( likely(irq >= 16 && irq < 1020) )
+        if ( likely(irq >= 32 && irq < 1020) )
         {
             isb();
             do_IRQ(regs, irq, is_fiq);
@@ -422,6 +421,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
         {
             do_sgi(regs, irq);
         }
+        else if ( irq < 32 )
+        {
+            isb();
+            do_ppi(regs, irq);
+        }
         else
         {
             local_irq_disable();
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index c81b490..5884d6c 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -290,6 +290,38 @@ out_no_end:
     irq_exit();
 }
 
+void do_ppi(struct cpu_user_regs *regs, unsigned int irq)
+{
+    struct irq_desc *desc = irq_to_desc(irq);
+    struct irqaction *action;
+
+    irq_enter();
+
+    desc->handler->ack(desc);
+
+    if ( unlikely(!desc->action) )
+    {
+        printk("Unknown %s %#3.3x\n",
+               "IRQ", irq);
+        goto out;
+    }
+
+    if ( test_bit(_IRQ_DISABLED, &desc->status) )
+        goto out;
+
+    action = desc->action;
+
+    do
+    {
+        action->handler(irq, action->dev_id, regs);
+        action = action->next;
+    } while ( action );
+
+out:
+    desc->handler->end(desc);
+    irq_exit();
+}
+
 void release_irq(unsigned int irq, const void *dev_id)
 {
     struct irq_desc *desc;
diff --git a/xen/include/asm-arm/irq.h b/xen/include/asm-arm/irq.h
index 4f1ef3c..3143468 100644
--- a/xen/include/asm-arm/irq.h
+++ b/xen/include/asm-arm/irq.h
@@ -62,6 +62,7 @@ struct irq_desc *__irq_to_desc(int irq);
 #define irq_to_desc(irq)    __irq_to_desc(irq)
 
 void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, int is_fiq);
+void do_ppi(struct cpu_user_regs *regs, unsigned int irq);
 
 #ifdef CONFIG_GICV3
 static inline bool is_lpi(unsigned int irq)
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu
  2018-12-26 11:20 ` [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu Andrii Anisov
@ 2019-01-02 12:24   ` Wei Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2019-01-02 12:24 UTC (permalink / raw)
  To: Andrii Anisov
  Cc: Wei Liu, Stefano Stabellini, Andrii Anisov,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Andre Przywara, xen-devel

On Wed, Dec 26, 2018 at 01:20:24PM +0200, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> Simplify context restore from idle vcpu to the one ran before it.
> This improves low cpu load but high irq rate use-cases.
> 
> Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>

Can this be put under arch_vcpu instead? It has no use in x86 code.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (15 preceding siblings ...)
  2018-12-26 11:20 ` [RFC v2 16/16] gic: separate ppi processing Andrii Anisov
@ 2019-01-02 18:33 ` André Przywara
  2019-01-16 17:32   ` Andrii Anisov
  2019-01-21 17:48   ` Julien Grall
  2019-01-21 18:14 ` Julien Grall
  17 siblings, 2 replies; 22+ messages in thread
From: André Przywara @ 2019-01-02 18:33 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel; +Cc: Julien Grall, Stefano Stabellini, Andrii Anisov

On 26/12/2018 11:20, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> This patch series is an attempt to reduce IRQ latency with the
> old GIC implementation (gic-vgic). These patches originally based
> on XEN 4.10 release. The motivation was to improve benchmark
> results of a system given to a customer for evaluation.
> This patch series is tailored for GICv2 on RCAR H3. Several
> of the most controversial patches (i.e. LRs shadowing) were
> not shared to the customer, and here are for comments and discussion.
> I hope several patches from here could be upstreamed. Some as is,
> others with modifications.
> 
> There are several simple ideas behind these changes:
>     - reduce an excessive code (condition checks)
>     - drop an excessive peripheral register accesses
>     - if not reduce, then move an excessive code out of spinlocks
>     - if not drop, then move an excessive register
>       accesses out of spinlocks
> 
> This is a v2 of the original RFC series [1]. From that series, patches
> [2] and [3] have already reached mainline. Here few patches are reworked
> with addressing some comments or separating them into more clear pieces,
> more patches are taken from the RFC v1 as is.
> 
> The main intention of this version of RFC series is to reveal
> patch-by-patch IRQ latency impact.
> The measurement is performed with TBM [4], so the use-case is trivial -
> passing a single IRQ twice in a second. Thus no lock contentions nor
> even passing more than one interrupt to a guest at the time use-cases
> are hit.
> 
> The series is based on the current xenbits/staging, commit 7f28661f6a7.
> XEN is build with no DEBUG and no GICv3 support for the staging HEAD and
> each commit. Four runtime configurations are evaluated for each commit:
>     - sched=credit2 vwfi=trap
>     - sched=credit2 vwfi=native
>     - sched=credit vwfi=trap
>     - sched=credit vwfi=native
> 
> Each commit is incrementally cherry-picked for the latency evaluation in
> an order they appear in the table. The table also can be found shared
> as a Google spreadsheet here [5].

Many thanks for generating these numbers, this is very useful.

But: could you make any sense out them? I plotted them, but they don't
seem to be very conclusive. I am scratching my head about the following
issues:
- The behaviour seems to be somewhat different between the four cases.
Which one do you care about, in particular? Is vwfi=native an option for
your use case? Which scheduler do you need or want to use?
- Which of the four columns (max, warm_max, min, avg) are you actually
interested in? For a hard realtime system I would expect max, or maybe
warm_max?
- Some of the patches seem to *increase* the latency. Patch 08/16 for
instance sticks out here. Most of the times the latency decreases again
afterwards, with a later patch, but I wonder if you could just pick the
patches that help or somehow explain those outliers.
- Can you give some more background on how you generated the numbers? I
haven't looked with too much detail into the benchmark, but I wonder about:
 * What is the runtime of your test? It seems to run forever and updates
   the stats twice a second, if I get this correctly. So for how long
   did you let it run?
 * Do we have any idea what the reliability of the values are? Can we
   somehow calculate the standard deviation, for instance? That would
   help to get an idea about the error range we are looking at.
 * Is this still the same system as in [4]? The resolution in there is
   only 120ns, some of the values for two patches differ exactly by that
   amount. So is this actually within the error margin?

Also it seems that this test is the only thing running on the system,
beside the idle VCPU. Is that a reasonable way to assess real world
interrupt latency? For instance that means that the IRQ handler mostly
hits even in the L1 cache, which I wouldn't expect in actual systems.
The method to separate warm_max from max seems to support this. Do you
have some numbers from that 3D benchmark you mentioned earlier, to put
this into perspective and show that improvements in one benefit the
other as well?

Also I looked at some code and started to cook up something myself[6].
The first two patches there should replace your patch 01/16 in a much
cleaner and easier way, along the lines I mentioned before in a reply to
a former post of yours.

Then I looked at the IRQ handler and stumbled upon the function pointers
we are using. I was eyeing them before, because my hunch is they are
costly, especially on big cores, as it might be hard for the CPU to
speculate correctly. Basically something like a call to
gic_hw_ops->gic_read_irq() translates into:
	ldr     x0, [x20]
	ldr     x0, [x0, #72]
	blr     x0
That contains two dependency on loads. If one of them misses in the
cache, the whole pipeline is stalled, if the CPU doesn't speculate both
loads correctly (which it might, but we don't know).
This is extra annoying since those function pointers never change, and
are always pointing to the GICv2 functions if CONFIG_GICV3 is not
defined. On top of this some functions are really trivial, so we pay a
big price for something that might be a single MMIO read or even system
register access. I think the proper solution (TM) would be to patch
these accesses once we know which GIC we are running on. Linux does
something to this effect, which benefits GICv3 in particular. From
quickly looking at it, Xen seems to lack the infrastructure (jump labels
and more sophisticated runtime patching) to easily copy this method.

So the top three patches in [6] address this in a somewhat hackish way,
just to show if this method improves something. I just compile tested
this, so would be curious if you could give it a try and test both
functionality and performance. A nice side effect is that those patches
benefit both the old and new VGIC. The effect on the TBM benchmark might
not be too visible, though, due to the hot caches.

Cheers,
Andre.

[6] https://github.com/Andre-ARM/xen/commits/vgic-opt
> 
> 	sched=credit2 vwfi=trap                 	sched=credit2 vwfi=native               	sched=credit vwfi=trap                  	sched=credit vwfi=native
> 
> 7f28661f6a7ce3d82f881b9afedfebca7f2cf116
> 	max=9480 warm_max=7200 min=6600 avg=6743	max=4680 warm_max=3240 min=3000 avg=3007	max=9480 warm_max=7920 min=6720 avg=7009	max=4560 warm_max=3000 min=2880 avg=2979
> 
> gic:gic-vgic: separate GIV3 code more thoroughly
> 
> 	max=9720 warm_max=6960 min=6600 avg=6617	max=5040 warm_max=3840 min=2880 avg=2905	max=9480 warm_max=7200 min=6600 avg=6871	max=4560 warm_max=3000 min=2880 avg=2887
> 
> gic-vgic:vgic: avoid excessive conversions
> 
> 	max=9360 warm_max=6720 min=6480 avg=6578	max=4800 warm_max=3120 min=2880 avg=2895	max=9480 warm_max=7080 min=6600 avg=6804	max=4800 warm_max=3120 min=2880 avg=2887
> 
> gic:vgic:gic-vgic: introduce non-atomic bitops
> 
> 	max=9120 warm_max=6600 min=6480 avg=6546	max=4920 warm_max=3000 min=2760 avg=2872	max=9120 warm_max=6720 min=6480 avg=6574	max=4200 warm_max=3120 min=2760 avg=2798
> 
> gic: drop interrupts enabling on interrupts processing
> 	max=9240 warm_max=7080 min=6360 avg=6492	max=5040 warm_max=3240 min=2760 avg=2767	max=9240 warm_max=6720 min=6480 avg=6491	max=4440 warm_max=3000 min=2760 avg=2809
> 
> gic-vgic: skip irqs locking in gic_restore_pending_irqs()
> 	max=9000 warm_max=6720 min=6360 avg=6430	max=4320 warm_max=3120 min=2640 avg=2671	max=9240 warm_max=6720 min=6360 avg=6459	max=4440 warm_max=2880 min=2640 avg=2668
> 
> vgic: move pause_flags check out of vgic spinlock
> 	max=9240 warm_max=6720 min=6360 avg=6431	max=4800 warm_max=2880 min=2640 avg=2675	max=9360 warm_max=6600 min=6360 avg=6435	max=4440 warm_max=2760 min=2640 avg=2647
> 
> vgic: move irq_to_pending out of lock
> 	max=8520 warm_max=7440 min=6360 avg=6444	max=4680 warm_max=3000 min=2640 avg=2753	max=9480 warm_max=6720 min=6360 avg=6445	max=4200 warm_max=3000 min=2640 avg=2667
> 
> gic-vgic:vgic: do not keep disabled IRQs in any of queues
> 	max=9120 warm_max=7920 min=6360 avg=6447	max=4440 warm_max=2760 min=2760 avg=2767	max=10440 warm_max=7560 min=6360 avg=6459	max=4440 warm_max=3840 min=2640 avg=2669
> 
> xen/arm: Re-enable interrupt later in the trap path
> 	max=9720 warm_max=9120 min=6360 avg=6441	max=4440 warm_max=2880 min=2760 avg=2767	max=9360 warm_max=6960 min=6360 avg=6451	max=4680 warm_max=2880 min=2640 avg=2675
> 
> gic-vgic: skip irqs locking in vgic_sync_from_lrs
> 	max=9240 warm_max=7080 min=6360 avg=6431	max=4920 warm_max=3120 min=2640 avg=2678	max=9480 warm_max=6960 min=6360 avg=6443	max=4680 warm_max=2880 min=2640 avg=2667
> 
> gic-v2: Write HCR only on change
> 	max=9840 warm_max=6600 min=6360 avg=6459	max=4440 warm_max=2760 min=2520 avg=2527	max=9480 warm_max=7920 min=6360 avg=6445	max=4320 warm_max=2760 min=2520 avg=2527
> 
> gic-v2: avoid HCR reading for GICv2
> 	max=9480 warm_max=7680 min=6360 avg=6443	max=4320 warm_max=2880 min=2520 avg=2527	max=9360 warm_max=7080 min=6720 avg=6750	max=3960 warm_max=2640 min=2400 avg=2487
> 
> hack: arm/domain: simplify context restore from idle vcpu
> 	max=9360 warm_max=6720 min=6000 avg=6214	max=5040 warm_max=2640 min=2520 avg=2527	max=9480 warm_max=7080 min=6240 avg=6367	max=4080 warm_max=2880 min=2400 avg=2527
> 
> hack: move gicv2 LRs reads and writes out of spinlocks
> 	max=9480 warm_max=6840 min=6600 avg=6612	max=4800 warm_max=2760 min=2640 avg=2739	max=9000 warm_max=7200 min=6600 avg=6636	max=4560 warm_max=2760 min=2520 avg=2619
> 
> gic: vgic: align frequently accessed data by cache line size
> 	max=9840 warm_max=6600 min=6240 avg=6288	max=4440 warm_max=2880 min=2640 avg=2682	max=8280 warm_max=6720 min=6360 avg=6488	max=4080 warm_max=2880 min=2640 avg=2678
> 
> gic: separate ppi processing
> 	NOT SUITABLE FOR EVALUATION WITH TBM
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
> [2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html
> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
> [4] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html
> [5] https://docs.google.com/spreadsheets/d/1J_u9UDowaonnaKtgiugXqtIFT-c2E4Ss2vxgL6NnbNo/edit?usp=sharing
> 
> Andrii Anisov (15):
>   gic:gic-vgic: separate GIV3 code more thoroughly
>   gic-vgic:vgic: avoid excessive conversions
>   gic:vgic:gic-vgic: introduce non-atomic bitops
>   gic: drop interrupts enabling on interrupts processing
>   gic-vgic: skip irqs locking in gic_restore_pending_irqs()
>   vgic: move pause_flags check out of vgic spinlock
>   vgic: move irq_to_pending out of lock
>   gic-vgic:vgic: do not keep disabled IRQs in any of queues
>   gic-vgic: skip irqs locking in vgic_sync_from_lrs
>   gic-v2: Write HCR only on change
>   gic-v2: avoid HCR reading for GICv2
>   hack: arm/domain: simplify context restore from idle vcpu
>   hack: move gicv2 LRs reads and writes out of spinlocks
>   gic: vgic: align frequently accessed data by cache line size
>   gic: separate ppi processing
> 
> Julien Grall (1):
>   xen/arm: Re-enable interrupt later in the trap path
> 
> [11] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html
> 
>  xen/arch/arm/arm64/entry.S   | 11 +++---
>  xen/arch/arm/domain.c        | 25 +++++++-----
>  xen/arch/arm/gic-v2.c        | 82 +++++++++++++++++++++++++-------------
>  xen/arch/arm/gic-v3-its.c    |  2 +
>  xen/arch/arm/gic-v3-lpi.c    |  2 +
>  xen/arch/arm/gic-v3.c        |  4 +-
>  xen/arch/arm/gic-vgic.c      | 87 +++++++++++++++++++++++++----------------
>  xen/arch/arm/gic.c           | 32 +++++++++++++--
>  xen/arch/arm/irq.c           | 32 +++++++++++++++
>  xen/arch/arm/traps.c         |  6 +++
>  xen/arch/arm/vgic-v3-its.c   |  2 +-
>  xen/arch/arm/vgic.c          | 93 +++++++++++++++++++++++++++++++++++++-------
>  xen/arch/arm/vgic/vgic.c     |  2 +
>  xen/include/asm-arm/config.h |  2 +-
>  xen/include/asm-arm/gic.h    | 10 ++---
>  xen/include/asm-arm/irq.h    |  3 ++
>  xen/include/asm-arm/vgic.h   | 24 ++++++++----
>  xen/include/xen/sched.h      |  1 +
>  18 files changed, 310 insertions(+), 110 deletions(-)
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
  2019-01-02 18:33 ` [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 André Przywara
@ 2019-01-16 17:32   ` Andrii Anisov
  2019-01-21 17:48   ` Julien Grall
  1 sibling, 0 replies; 22+ messages in thread
From: Andrii Anisov @ 2019-01-16 17:32 UTC (permalink / raw)
  To: André Przywara, xen-devel
  Cc: Julien Grall, Stefano Stabellini, Andrii Anisov

Hello Andre,

On 02.01.19 20:33, André Przywara wrote:
> Many thanks for generating these numbers, this is very useful.
> 
> But: could you make any sense out them? I plotted them, but they don't
> seem to be very conclusive.

Those numbers are mostly intended to show per patch effects. But I kept them
consequently merged because of several dependencies between patches. Also
number of possible patches combination is quite  big :), so I stuck at that
one. I guess plotting is not quite informative in this particular case.

> I am scratching my head about the following
> issues:
> - The behaviour seems to be somewhat different between the four cases.
> Which one do you care about, in particular? Is vwfi=native an option for
> your use case? Which scheduler do you need or want to use?

> - Which of the four columns (max, warm_max, min, avg) are you actually
> interested in? For a hard realtime system I would expect max, or maybe
> warm_max?
I think of max(warm_max) for RT, and avg for multimedia (infotainment) use
cases.

> - Some of the patches seem to *increase* the latency.
Surprisingly to me, some patches *do* increase the latency. But I kept them on
the line. In order to show all effects.

> Patch 08/16 for
> instance sticks out here. Most of the times the latency decreases again
> afterwards, with a later patch, but I wonder if you could just pick the
> patches that help or somehow explain those outliers.

> - Can you give some more background on how you generated the numbers? I
> haven't looked with too much detail into the benchmark, but I wonder about:
>   * What is the runtime of your test? It seems to run forever and updates
>     the stats twice a second, if I get this correctly. So for how long
>     did you let it run?
I did run it until avg stabilized near some minimal value, and a bit more.
Please note that avg is a moving average here.

>   * Do we have any idea what the reliability of the values are? Can we
>     somehow calculate the standard deviation, for instance?
I guess TBM could be changed in that way.

> That would
>     help to get an idea about the error range we are looking at.

>   * Is this still the same system as in [4]?
Yes, sure. The system is the same.

> The resolution in there is
>     only 120ns, some of the values for two patches differ exactly by that
>     amount.Timer resolution is 120ns. And I can't squize something better on my HW :(.

> So is this actually within the error margin?
Here the avg value gives a clue of the patch impact.

> 
> Also it seems that this test is the only thing running on the system,
> beside the idle VCPU. Is that a reasonable way to assess real world
> interrupt latency?
Not really. It is a trivial, but reproducible test bench. It gives numbers for the
straight-forward interrupt delivery path only. No contentions on lock, no concurrent
interrupts inserting, no multiple interrupts processing per hyp entry are covered.

> For instance that means that the IRQ handler mostly
> hits even in the L1 cache, which I wouldn't expect in actual systems.
Agree.

> The method to separate warm_max from max seems to support this. Do you
> have some numbers from that 3D benchmark you mentioned earlier, to put
> this into perspective and show that improvements in one benefit the
> other as well?
I did not check this particular patch series with glmark2 yet.

> Also I looked at some code and started to cook up something myself[6].
> The first two patches there should replace your patch 01/16 in a much
> cleaner and easier way, along the lines I mentioned before in a reply to
> a former post of yours.

So, numbers for those two patches on top of 7f28661f6a7ce3d82f881b9afedfebca7f2cf116 are as following:

xen/arm: VGIC: Optimise is_lpi() for GICv2-only configuration
xen/arm: VGIC: wrap PRISTINE_LPI bit check
	max=9600 warm_max=6840 min=6600 avg=6608	max=4320 warm_max=3120 min=2880 avg=2905	max=9120 warm_max=7200 min=6600 avg=6870	max=4320 warm_max=3000 min=2880 avg=2887

> 
> Then I looked at the IRQ handler and stumbled upon the function pointers
> we are using. I was eyeing them before, because my hunch is they are
> costly, especially on big cores, as it might be hard for the CPU to
> speculate correctly. Basically something like a call to
> gic_hw_ops->gic_read_irq() translates into:
> 	ldr     x0, [x20]
> 	ldr     x0, [x0, #72]
> 	blr     x0
> That contains two dependency on loads. If one of them misses in the
> cache, the whole pipeline is stalled, if the CPU doesn't speculate both
> loads correctly (which it might, but we don't know).
> This is extra annoying since those function pointers never change, and
> are always pointing to the GICv2 functions if CONFIG_GICV3 is not
> defined. On top of this some functions are really trivial, so we pay a
> big price for something that might be a single MMIO read or even system
> register access. I think the proper solution (TM) would be to patch
> these accesses once we know which GIC we are running on. Linux does
> something to this effect, which benefits GICv3 in particular. From
> quickly looking at it, Xen seems to lack the infrastructure (jump labels
> and more sophisticated runtime patching) to easily copy this method.
> 
> So the top three patches in [6] address this in a somewhat hackish way,
> just to show if this method improves something. I just compile tested
> this, so would be curious if you could give it a try and test both
> functionality and performance. A nice side effect is that those patches
> benefit both the old and new VGIC. The effect on the TBM benchmark might
> not be too visible, though, due to the hot caches.

Numbers for those two patches on top of 7f28661f6a7ce3d82f881b9afedfebca7f2cf116 are as following:

xen/arm: VGIC: always use GICv2 functions if GICV3 is not defined
xen/arm: VGIC: export GICv2 functions
xen/arm: VGIC: wrap gic_hw_ops calls
	max=9360 warm_max=6960 min=6600 avg=6684	max=4320 warm_max=3120 min=2880 avg=2981	max=9600 warm_max=7440 min=6600 avg=7027	max=4560 warm_max=3000 min=2880 avg=2889

As you can see numbers are inconsistent. For the column 3 max is increased.
I guess it should be checked for glmark2 with several concurrent VMs.

> 
> Cheers,
> Andre.
> 
> [6] https://github.com/Andre-ARM/xen/commits/vgic-opt
>>
>> 	sched=credit2 vwfi=trap                 	sched=credit2 vwfi=native               	sched=credit vwfi=trap                  	sched=credit vwfi=native
>>
>> 7f28661f6a7ce3d82f881b9afedfebca7f2cf116
>> 	max=9480 warm_max=7200 min=6600 avg=6743	max=4680 warm_max=3240 min=3000 avg=3007	max=9480 warm_max=7920 min=6720 avg=7009	max=4560 warm_max=3000 min=2880 avg=2979
>>
>> gic:gic-vgic: separate GIV3 code more thoroughly
>>
>> 	max=9720 warm_max=6960 min=6600 avg=6617	max=5040 warm_max=3840 min=2880 avg=2905	max=9480 warm_max=7200 min=6600 avg=6871	max=4560 warm_max=3000 min=2880 avg=2887
>>
>> gic-vgic:vgic: avoid excessive conversions
>>
>> 	max=9360 warm_max=6720 min=6480 avg=6578	max=4800 warm_max=3120 min=2880 avg=2895	max=9480 warm_max=7080 min=6600 avg=6804	max=4800 warm_max=3120 min=2880 avg=2887
>>
>> gic:vgic:gic-vgic: introduce non-atomic bitops
>>
>> 	max=9120 warm_max=6600 min=6480 avg=6546	max=4920 warm_max=3000 min=2760 avg=2872	max=9120 warm_max=6720 min=6480 avg=6574	max=4200 warm_max=3120 min=2760 avg=2798
>>
>> gic: drop interrupts enabling on interrupts processing
>> 	max=9240 warm_max=7080 min=6360 avg=6492	max=5040 warm_max=3240 min=2760 avg=2767	max=9240 warm_max=6720 min=6480 avg=6491	max=4440 warm_max=3000 min=2760 avg=2809
>>
>> gic-vgic: skip irqs locking in gic_restore_pending_irqs()
>> 	max=9000 warm_max=6720 min=6360 avg=6430	max=4320 warm_max=3120 min=2640 avg=2671	max=9240 warm_max=6720 min=6360 avg=6459	max=4440 warm_max=2880 min=2640 avg=2668
>>
>> vgic: move pause_flags check out of vgic spinlock
>> 	max=9240 warm_max=6720 min=6360 avg=6431	max=4800 warm_max=2880 min=2640 avg=2675	max=9360 warm_max=6600 min=6360 avg=6435	max=4440 warm_max=2760 min=2640 avg=2647
>>
>> vgic: move irq_to_pending out of lock
>> 	max=8520 warm_max=7440 min=6360 avg=6444	max=4680 warm_max=3000 min=2640 avg=2753	max=9480 warm_max=6720 min=6360 avg=6445	max=4200 warm_max=3000 min=2640 avg=2667
>>
>> gic-vgic:vgic: do not keep disabled IRQs in any of queues
>> 	max=9120 warm_max=7920 min=6360 avg=6447	max=4440 warm_max=2760 min=2760 avg=2767	max=10440 warm_max=7560 min=6360 avg=6459	max=4440 warm_max=3840 min=2640 avg=2669
>>
>> xen/arm: Re-enable interrupt later in the trap path
>> 	max=9720 warm_max=9120 min=6360 avg=6441	max=4440 warm_max=2880 min=2760 avg=2767	max=9360 warm_max=6960 min=6360 avg=6451	max=4680 warm_max=2880 min=2640 avg=2675
>>
>> gic-vgic: skip irqs locking in vgic_sync_from_lrs
>> 	max=9240 warm_max=7080 min=6360 avg=6431	max=4920 warm_max=3120 min=2640 avg=2678	max=9480 warm_max=6960 min=6360 avg=6443	max=4680 warm_max=2880 min=2640 avg=2667
>>
>> gic-v2: Write HCR only on change
>> 	max=9840 warm_max=6600 min=6360 avg=6459	max=4440 warm_max=2760 min=2520 avg=2527	max=9480 warm_max=7920 min=6360 avg=6445	max=4320 warm_max=2760 min=2520 avg=2527
>>
>> gic-v2: avoid HCR reading for GICv2
>> 	max=9480 warm_max=7680 min=6360 avg=6443	max=4320 warm_max=2880 min=2520 avg=2527	max=9360 warm_max=7080 min=6720 avg=6750	max=3960 warm_max=2640 min=2400 avg=2487
>>
>> hack: arm/domain: simplify context restore from idle vcpu
>> 	max=9360 warm_max=6720 min=6000 avg=6214	max=5040 warm_max=2640 min=2520 avg=2527	max=9480 warm_max=7080 min=6240 avg=6367	max=4080 warm_max=2880 min=2400 avg=2527
>>
>> hack: move gicv2 LRs reads and writes out of spinlocks
>> 	max=9480 warm_max=6840 min=6600 avg=6612	max=4800 warm_max=2760 min=2640 avg=2739	max=9000 warm_max=7200 min=6600 avg=6636	max=4560 warm_max=2760 min=2520 avg=2619
>>
>> gic: vgic: align frequently accessed data by cache line size
>> 	max=9840 warm_max=6600 min=6240 avg=6288	max=4440 warm_max=2880 min=2640 avg=2682	max=8280 warm_max=6720 min=6360 avg=6488	max=4080 warm_max=2880 min=2640 avg=2678
>>
>> gic: separate ppi processing
>> 	NOT SUITABLE FOR EVALUATION WITH TBM
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
>> [2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html
>> [3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
>> [4] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html
>> [5] https://docs.google.com/spreadsheets/d/1J_u9UDowaonnaKtgiugXqtIFT-c2E4Ss2vxgL6NnbNo/edit?usp=sharing
>>
>> Andrii Anisov (15):
>>    gic:gic-vgic: separate GIV3 code more thoroughly
>>    gic-vgic:vgic: avoid excessive conversions
>>    gic:vgic:gic-vgic: introduce non-atomic bitops
>>    gic: drop interrupts enabling on interrupts processing
>>    gic-vgic: skip irqs locking in gic_restore_pending_irqs()
>>    vgic: move pause_flags check out of vgic spinlock
>>    vgic: move irq_to_pending out of lock
>>    gic-vgic:vgic: do not keep disabled IRQs in any of queues
>>    gic-vgic: skip irqs locking in vgic_sync_from_lrs
>>    gic-v2: Write HCR only on change
>>    gic-v2: avoid HCR reading for GICv2
>>    hack: arm/domain: simplify context restore from idle vcpu
>>    hack: move gicv2 LRs reads and writes out of spinlocks
>>    gic: vgic: align frequently accessed data by cache line size
>>    gic: separate ppi processing
>>
>> Julien Grall (1):
>>    xen/arm: Re-enable interrupt later in the trap path
>>
>> [11] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html
>>
>>   xen/arch/arm/arm64/entry.S   | 11 +++---
>>   xen/arch/arm/domain.c        | 25 +++++++-----
>>   xen/arch/arm/gic-v2.c        | 82 +++++++++++++++++++++++++-------------
>>   xen/arch/arm/gic-v3-its.c    |  2 +
>>   xen/arch/arm/gic-v3-lpi.c    |  2 +
>>   xen/arch/arm/gic-v3.c        |  4 +-
>>   xen/arch/arm/gic-vgic.c      | 87 +++++++++++++++++++++++++----------------
>>   xen/arch/arm/gic.c           | 32 +++++++++++++--
>>   xen/arch/arm/irq.c           | 32 +++++++++++++++
>>   xen/arch/arm/traps.c         |  6 +++
>>   xen/arch/arm/vgic-v3-its.c   |  2 +-
>>   xen/arch/arm/vgic.c          | 93 +++++++++++++++++++++++++++++++++++++-------
>>   xen/arch/arm/vgic/vgic.c     |  2 +
>>   xen/include/asm-arm/config.h |  2 +-
>>   xen/include/asm-arm/gic.h    | 10 ++---
>>   xen/include/asm-arm/irq.h    |  3 ++
>>   xen/include/asm-arm/vgic.h   | 24 ++++++++----
>>   xen/include/xen/sched.h      |  1 +
>>   18 files changed, 310 insertions(+), 110 deletions(-)
>>
> 

-- 
Sincerely,
Andrii Anisov.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
  2019-01-02 18:33 ` [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 André Przywara
  2019-01-16 17:32   ` Andrii Anisov
@ 2019-01-21 17:48   ` Julien Grall
  1 sibling, 0 replies; 22+ messages in thread
From: Julien Grall @ 2019-01-21 17:48 UTC (permalink / raw)
  To: André Przywara, Andrii Anisov, xen-devel
  Cc: Stefano Stabellini, Andrii Anisov

Hi,

On 02/01/2019 18:33, André Przywara wrote:
> On 26/12/2018 11:20, Andrii Anisov wrote:
> Then I looked at the IRQ handler and stumbled upon the function pointers
> we are using. I was eyeing them before, because my hunch is they are
> costly, especially on big cores, as it might be hard for the CPU to
> speculate correctly. Basically something like a call to
> gic_hw_ops->gic_read_irq() translates into:
> 	ldr     x0, [x20]
> 	ldr     x0, [x0, #72]
> 	blr     x0
> That contains two dependency on loads. If one of them misses in the
> cache, the whole pipeline is stalled, if the CPU doesn't speculate both
> loads correctly (which it might, but we don't know).
> This is extra annoying since those function pointers never change, and
> are always pointing to the GICv2 functions if CONFIG_GICV3 is not
> defined. On top of this some functions are really trivial, so we pay a
> big price for something that might be a single MMIO read or even system
> register access. I think the proper solution (TM) would be to patch
> these accesses once we know which GIC we are running on. Linux does
> something to this effect, which benefits GICv3 in particular. From
> quickly looking at it, Xen seems to lack the infrastructure (jump labels
> and more sophisticated runtime patching) to easily copy this method.

The x86 folks are looking at reducing the overhead for call (see [1]).

We could implement similar framework for patching the GIC call.

Cheers,

[1] 	<5B97C3D402000078001E747C@prv1-mh.provo.novell.com>

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
  2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
                   ` (16 preceding siblings ...)
  2019-01-02 18:33 ` [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 André Przywara
@ 2019-01-21 18:14 ` Julien Grall
  17 siblings, 0 replies; 22+ messages in thread
From: Julien Grall @ 2019-01-21 18:14 UTC (permalink / raw)
  To: Andrii Anisov, xen-devel
  Cc: Andre Przywara, Stefano Stabellini, Andrii Anisov

Hi Andrii,

Thank you for the numbers.

On 26/12/2018 11:20, Andrii Anisov wrote:
> From: Andrii Anisov <andrii_anisov@epam.com>
> 
> This patch series is an attempt to reduce IRQ latency with the
> old GIC implementation (gic-vgic). These patches originally based
> on XEN 4.10 release. The motivation was to improve benchmark
> results of a system given to a customer for evaluation.
> This patch series is tailored for GICv2 on RCAR H3. Several
> of the most controversial patches (i.e. LRs shadowing) were
> not shared to the customer, and here are for comments and discussion.
> I hope several patches from here could be upstreamed. Some as is,
> others with modifications.

As long as it is tailored, I will not be able to properly review them. So a good 
start is to try to get the series upstream is removing all your hack and 
avoiding tailoring this to the RCAR H3.

> 
> There are several simple ideas behind these changes:
>      - reduce an excessive code (condition checks)
>      - drop an excessive peripheral register accesses
>      - if not reduce, then move an excessive code out of spinlocks
>      - if not drop, then move an excessive register
>        accesses out of spinlocks
> 
> This is a v2 of the original RFC series [1]. From that series, patches
> [2] and [3] have already reached mainline. Here few patches are reworked
> with addressing some comments or separating them into more clear pieces,
> more patches are taken from the RFC v1 as is.

It is a quite frustrating to see 16 more patches in my inbox with most of my 
comments not addressed or even answered. The number could just have been posted 
on the RFCv1 avoiding to spam my mailbox.

> 
> The main intention of this version of RFC series is to reveal
> patch-by-patch IRQ latency impact.
> The measurement is performed with TBM [4], so the use-case is trivial -
> passing a single IRQ twice in a second. Thus no lock contentions nor
> even passing more than one interrupt to a guest at the time use-cases
> are hit.
> 
> The series is based on the current xenbits/staging, commit 7f28661f6a7.
> XEN is build with no DEBUG and no GICv3 support for the staging HEAD and
> each commit. Four runtime configurations are evaluated for each commit:
>      - sched=credit2 vwfi=trap
>      - sched=credit2 vwfi=native
>      - sched=credit vwfi=trap
>      - sched=credit vwfi=native
> 
> Each commit is incrementally cherry-picked for the latency evaluation in
> an order they appear in the table. The table also can be found shared
> as a Google spreadsheet here [5].

Can you format the Google spreadsheet in a usable way? By that I mean having 
only one number per column so we can add a bit more formula in it.

I will comment on the number once they are in better shape.

Cheers,

-- 
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-01-21 18:14 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-26 11:20 [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 Andrii Anisov
2018-12-26 11:20 ` [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly Andrii Anisov
2018-12-26 11:20 ` [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions Andrii Anisov
2018-12-26 11:20 ` [RFC v2 03/16] gic:vgic:gic-vgic: introduce non-atomic bitops Andrii Anisov
2018-12-26 11:20 ` [RFC v2 04/16] gic: drop interrupts enabling on interrupts processing Andrii Anisov
2018-12-26 11:20 ` [RFC v2 05/16] gic-vgic: skip irqs locking in gic_restore_pending_irqs() Andrii Anisov
2018-12-26 11:20 ` [RFC v2 06/16] vgic: move pause_flags check out of vgic spinlock Andrii Anisov
2018-12-26 11:20 ` [RFC v2 07/16] hack:vgic: move irq_to_pending out of lock Andrii Anisov
2018-12-26 11:20 ` [RFC v2 08/16] gic-vgic:vgic: do not keep disabled IRQs in any of queues Andrii Anisov
2018-12-26 11:20 ` [RFC v2 09/16] xen/arm: Re-enable interrupt later in the trap path Andrii Anisov
2018-12-26 11:20 ` [RFC v2 10/16] gic-vgic: skip irqs locking in vgic_sync_from_lrs Andrii Anisov
2018-12-26 11:20 ` [RFC v2 11/16] gic-v2: Write HCR only on change Andrii Anisov
2018-12-26 11:20 ` [RFC v2 12/16] gic-v2: avoid HCR reading for GICv2 Andrii Anisov
2018-12-26 11:20 ` [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu Andrii Anisov
2019-01-02 12:24   ` Wei Liu
2018-12-26 11:20 ` [RFC v2 14/16] hack: move gicv2 LRs reads and writes out of spinlocks Andrii Anisov
2018-12-26 11:20 ` [RFC v2 15/16] gic: vgic: align frequently accessed data by cache line size Andrii Anisov
2018-12-26 11:20 ` [RFC v2 16/16] gic: separate ppi processing Andrii Anisov
2019-01-02 18:33 ` [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2 André Przywara
2019-01-16 17:32   ` Andrii Anisov
2019-01-21 17:48   ` Julien Grall
2019-01-21 18:14 ` Julien Grall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.