xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
@ 2021-02-20 14:04 Julien Grall
  2021-02-22 11:10 ` Ian Jackson
  2021-02-22 13:45 ` Bertrand Marquis
  0 siblings, 2 replies; 8+ messages in thread
From: Julien Grall @ 2021-02-20 14:04 UTC (permalink / raw)
  To: xen-devel
  Cc: bertrand.marquis, iwj, Julien Grall, Stefano Stabellini,
	Julien Grall, Volodymyr Babchuk

From: Julien Grall <jgrall@amazon.com>

Currently, Xen will send a data abort to a guest trying to write to the
ISPENDR.

Unfortunately, recent version of Linux (at least 5.9+) will start
writing to the register if the interrupt needs to be re-triggered
(see the callback irq_retrigger). This can happen when a driver (such as
the xgbe network driver on AMD Seattle) re-enable an interrupt:

(XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
[...]
[   25.635837] Unhandled fault at 0xffff80001000522c
[...]
[   25.818716]  gic_retrigger+0x2c/0x38
[   25.822361]  irq_startup+0x78/0x138
[   25.825920]  __enable_irq+0x70/0x80
[   25.829478]  enable_irq+0x50/0xa0
[   25.832864]  xgbe_one_poll+0xc8/0xd8
[   25.836509]  net_rx_action+0x110/0x3a8
[   25.840328]  __do_softirq+0x124/0x288
[   25.844061]  irq_exit+0xe0/0xf0
[   25.847272]  __handle_domain_irq+0x68/0xc0
[   25.851442]  gic_handle_irq+0xa8/0xe0
[   25.855171]  el1_irq+0xb0/0x180
[   25.858383]  arch_cpu_idle+0x18/0x28
[   25.862028]  default_idle_call+0x24/0x5c
[   25.866021]  do_idle+0x204/0x278
[   25.869319]  cpu_startup_entry+0x24/0x68
[   25.873313]  rest_init+0xd4/0xe4
[   25.876611]  arch_call_rest_init+0x10/0x1c
[   25.880777]  start_kernel+0x5b8/0x5ec

As a consequence, the OS may become unusable.

Implementing the write part of ISPENDR is somewhat easy. For
virtual interrupt, we only need to inject the interrupt again.

For physical interrupt, we need to be more careful as the de-activation
of the virtual interrupt will be propagated to the physical distributor.
For simplicity, the physical interrupt will be set pending so the
workflow will not differ from a "real" interrupt.

Longer term, we could possible directly activate the physical interrupt
and avoid taking an exception to inject the interrupt to the domain.
(This is the approach taken by the new vGIC based on KVM).

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

Note that this doesn't touch the read part for I{S,C}PENDR nor the write
part of ICPENDR because they are more complex to implement.

For physical interrupt, I didn't implement the same solution as KVM because
I couldn't convince myself this could be done race free for physical
interrupt.

This was tested using the IRQ debugfs (CONFIG_GENERIC_IRQ_DEBUGFS=y)
which allows to retrigger an interrupt:

42sh> echo trigger > /sys/kernel/debug/irq/irqs/<irq>

This patch is candidate for 4.15 and also backporting to older trees.
Without this patch, recent Linux version may not boot on Xen on some
platforms (such as AMD Seattle used in OssTest).

The patch is self-contained to implementing a single set of registers.
So this would not introduce any risk on platforms where OSes don't use
those registers.

For the other setup (e.g. AMD Seattle + Linux 5.9+), it cannot be worse
than today.

So therefore, I would consider the risk limited.
---
 xen/arch/arm/vgic-v2.c     | 10 ++++----
 xen/arch/arm/vgic-v3.c     | 18 ++++++---------
 xen/arch/arm/vgic.c        | 47 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/vgic.h |  2 ++
 4 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 64b141fea586..b2da886adc18 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -472,10 +472,12 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
 
     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        printk(XENLOG_G_ERR
-               "%pv: vGICD: unhandled word write %#"PRIregister" to ISPENDR%d\n",
-               v, r, gicd_reg - GICD_ISPENDR);
-        return 0;
+        rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISPENDR, DABT_WORD);
+        if ( rank == NULL ) goto write_ignore;
+
+        vgic_set_irqs_pending(v, r, rank->index);
+
+        return 1;
 
     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
         if ( dabt.size != DABT_WORD ) goto bad_width;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index fd8cfc156d0c..613f37abab5e 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -808,10 +808,12 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
 
     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
         if ( dabt.size != DABT_WORD ) goto bad_width;
-        printk(XENLOG_G_ERR
-               "%pv: %s: unhandled word write %#"PRIregister" to ISPENDR%d\n",
-               v, name, r, reg - GICD_ISPENDR);
-        return 0;
+        rank = vgic_rank_offset(v, 1, reg - GICD_ISPENDR, DABT_WORD);
+        if ( rank == NULL ) goto write_ignore;
+
+        vgic_set_irqs_pending(v, r, rank->index);
+
+        return 1;
 
     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
         if ( dabt.size != DABT_WORD ) goto bad_width;
@@ -975,6 +977,7 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
     case VREG32(GICR_ICACTIVER0):
     case VREG32(GICR_ICFGR1):
     case VRANGE32(GICR_IPRIORITYR0, GICR_IPRIORITYR7):
+    case VREG32(GICR_ISPENDR0):
          /*
           * Above registers offset are common with GICD.
           * So handle common with GICD handling
@@ -982,13 +985,6 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
         return __vgic_v3_distr_common_mmio_write("vGICR: SGI", v,
                                                  info, gicr_reg, r);
 
-    case VREG32(GICR_ISPENDR0):
-        if ( dabt.size != DABT_WORD ) goto bad_width;
-        printk(XENLOG_G_ERR
-               "%pv: vGICR: SGI: unhandled word write %#"PRIregister" to ISPENDR0\n",
-               v, r);
-        return 0;
-
     case VREG32(GICR_ICPENDR0):
         if ( dabt.size != DABT_WORD ) goto bad_width;
         printk(XENLOG_G_ERR
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 82f524a35c9e..8f9400a51960 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -423,6 +423,53 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
     }
 }
 
+void vgic_set_irqs_pending(struct vcpu *v, uint32_t r, unsigned int rank)
+{
+    const unsigned long mask = r;
+    unsigned int i;
+    /* The first rank is always per-vCPU */
+    bool private = rank == 0;
+
+    /* LPIs will never be set pending via this function */
+    ASSERT(!is_lpi(32 * rank + 31));
+
+    for_each_set_bit( i, &mask, 32 )
+    {
+        unsigned int irq = i + 32 * rank;
+
+        if ( !private )
+        {
+            struct pending_irq *p = spi_to_pending(v->domain, irq);
+
+            /*
+             * When the domain sets the pending state for a HW interrupt on
+             * the virtual distributor, we set the pending state on the
+             * physical distributor.
+             *
+             * XXX: Investigate whether we would be able to set the
+             * physical interrupt active and save an interruption. (This
+             * is what the new vGIC does).
+             */
+            if ( p->desc != NULL )
+            {
+                unsigned long flags;
+
+                spin_lock_irqsave(&p->desc->lock, flags);
+                gic_set_pending_state(p->desc, true);
+                spin_unlock_irqrestore(&p->desc->lock, flags);
+                continue;
+            }
+        }
+
+        /*
+         * If the interrupt is per-vCPU, then we want to inject the vIRQ
+         * to v, otherwise we should let the function figuring out the
+         * correct vCPU.
+         */
+        vgic_inject_irq(v->domain, private ? v : NULL, irq, true);
+    }
+}
+
 bool vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode,
                  int virq, const struct sgi_target *target)
 {
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index ce1e3c4bbdac..62c2ae538db2 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -288,6 +288,8 @@ extern struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, int
 extern struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq);
 extern void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n);
 extern void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n);
+extern void vgic_set_irqs_pending(struct vcpu *v, uint32_t r,
+                                  unsigned int rank);
 extern void register_vgic_ops(struct domain *d, const struct vgic_ops *ops);
 int vgic_v2_init(struct domain *d, int *mmio_count);
 int vgic_v3_init(struct domain *d, int *mmio_count);
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-20 14:04 [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3} Julien Grall
@ 2021-02-22 11:10 ` Ian Jackson
  2021-02-22 13:45 ` Bertrand Marquis
  1 sibling, 0 replies; 8+ messages in thread
From: Ian Jackson @ 2021-02-22 11:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, bertrand.marquis, Julien Grall, Stefano Stabellini,
	Volodymyr Babchuk

Julien Grall writes ("[PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}"):
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, Xen will send a data abort to a guest trying to write to the
> ISPENDR.
> 
> Unfortunately, recent version of Linux (at least 5.9+) will start
> writing to the register if the interrupt needs to be re-triggered
> (see the callback irq_retrigger). This can happen when a driver (such as
> the xgbe network driver on AMD Seattle) re-enable an interrupt:

Release-Acked-by: Ian Jackson <iwj@xenproject.org>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-20 14:04 [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3} Julien Grall
  2021-02-22 11:10 ` Ian Jackson
@ 2021-02-22 13:45 ` Bertrand Marquis
  2021-02-23  1:24   ` Stefano Stabellini
  2021-02-25 12:22   ` Julien Grall
  1 sibling, 2 replies; 8+ messages in thread
From: Bertrand Marquis @ 2021-02-22 13:45 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, iwj, Julien Grall, Stefano Stabellini, Volodymyr Babchuk

Hi Julien,

> On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
> 
> From: Julien Grall <jgrall@amazon.com>
> 
> Currently, Xen will send a data abort to a guest trying to write to the
> ISPENDR.
> 
> Unfortunately, recent version of Linux (at least 5.9+) will start
> writing to the register if the interrupt needs to be re-triggered
> (see the callback irq_retrigger). This can happen when a driver (such as
> the xgbe network driver on AMD Seattle) re-enable an interrupt:
> 
> (XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
> [...]
> [   25.635837] Unhandled fault at 0xffff80001000522c
> [...]
> [   25.818716]  gic_retrigger+0x2c/0x38
> [   25.822361]  irq_startup+0x78/0x138
> [   25.825920]  __enable_irq+0x70/0x80
> [   25.829478]  enable_irq+0x50/0xa0
> [   25.832864]  xgbe_one_poll+0xc8/0xd8
> [   25.836509]  net_rx_action+0x110/0x3a8
> [   25.840328]  __do_softirq+0x124/0x288
> [   25.844061]  irq_exit+0xe0/0xf0
> [   25.847272]  __handle_domain_irq+0x68/0xc0
> [   25.851442]  gic_handle_irq+0xa8/0xe0
> [   25.855171]  el1_irq+0xb0/0x180
> [   25.858383]  arch_cpu_idle+0x18/0x28
> [   25.862028]  default_idle_call+0x24/0x5c
> [   25.866021]  do_idle+0x204/0x278
> [   25.869319]  cpu_startup_entry+0x24/0x68
> [   25.873313]  rest_init+0xd4/0xe4
> [   25.876611]  arch_call_rest_init+0x10/0x1c
> [   25.880777]  start_kernel+0x5b8/0x5ec
> 
> As a consequence, the OS may become unusable.
> 
> Implementing the write part of ISPENDR is somewhat easy. For
> virtual interrupt, we only need to inject the interrupt again.
> 
> For physical interrupt, we need to be more careful as the de-activation
> of the virtual interrupt will be propagated to the physical distributor.
> For simplicity, the physical interrupt will be set pending so the
> workflow will not differ from a "real" interrupt.
> 
> Longer term, we could possible directly activate the physical interrupt
> and avoid taking an exception to inject the interrupt to the domain.
> (This is the approach taken by the new vGIC based on KVM).
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

This is something which will not be done by a guest very often so I think your
implementation actually makes it simpler and reduce possibilities of race conditions
so I am not even sure that the XXX comment is needed.
But i am ok with it being in or not so:

Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

I did some tests by manually generating interrupts and I can confirm that this
works as expected.

Cheers
Bertrand

> 
> ---
> 
> Note that this doesn't touch the read part for I{S,C}PENDR nor the write
> part of ICPENDR because they are more complex to implement.
> 
> For physical interrupt, I didn't implement the same solution as KVM because
> I couldn't convince myself this could be done race free for physical
> interrupt.
> 
> This was tested using the IRQ debugfs (CONFIG_GENERIC_IRQ_DEBUGFS=y)
> which allows to retrigger an interrupt:
> 
> 42sh> echo trigger > /sys/kernel/debug/irq/irqs/<irq>
> 
> This patch is candidate for 4.15 and also backporting to older trees.
> Without this patch, recent Linux version may not boot on Xen on some
> platforms (such as AMD Seattle used in OssTest).
> 
> The patch is self-contained to implementing a single set of registers.
> So this would not introduce any risk on platforms where OSes don't use
> those registers.
> 
> For the other setup (e.g. AMD Seattle + Linux 5.9+), it cannot be worse
> than today.
> 
> So therefore, I would consider the risk limited.
> ---
> xen/arch/arm/vgic-v2.c     | 10 ++++----
> xen/arch/arm/vgic-v3.c     | 18 ++++++---------
> xen/arch/arm/vgic.c        | 47 ++++++++++++++++++++++++++++++++++++++
> xen/include/asm-arm/vgic.h |  2 ++
> 4 files changed, 62 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
> index 64b141fea586..b2da886adc18 100644
> --- a/xen/arch/arm/vgic-v2.c
> +++ b/xen/arch/arm/vgic-v2.c
> @@ -472,10 +472,12 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
> 
>     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
>         if ( dabt.size != DABT_WORD ) goto bad_width;
> -        printk(XENLOG_G_ERR
> -               "%pv: vGICD: unhandled word write %#"PRIregister" to ISPENDR%d\n",
> -               v, r, gicd_reg - GICD_ISPENDR);
> -        return 0;
> +        rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISPENDR, DABT_WORD);
> +        if ( rank == NULL ) goto write_ignore;
> +
> +        vgic_set_irqs_pending(v, r, rank->index);
> +
> +        return 1;
> 
>     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
>         if ( dabt.size != DABT_WORD ) goto bad_width;
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index fd8cfc156d0c..613f37abab5e 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -808,10 +808,12 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
> 
>     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
>         if ( dabt.size != DABT_WORD ) goto bad_width;
> -        printk(XENLOG_G_ERR
> -               "%pv: %s: unhandled word write %#"PRIregister" to ISPENDR%d\n",
> -               v, name, r, reg - GICD_ISPENDR);
> -        return 0;
> +        rank = vgic_rank_offset(v, 1, reg - GICD_ISPENDR, DABT_WORD);
> +        if ( rank == NULL ) goto write_ignore;
> +
> +        vgic_set_irqs_pending(v, r, rank->index);
> +
> +        return 1;
> 
>     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
>         if ( dabt.size != DABT_WORD ) goto bad_width;
> @@ -975,6 +977,7 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
>     case VREG32(GICR_ICACTIVER0):
>     case VREG32(GICR_ICFGR1):
>     case VRANGE32(GICR_IPRIORITYR0, GICR_IPRIORITYR7):
> +    case VREG32(GICR_ISPENDR0):
>          /*
>           * Above registers offset are common with GICD.
>           * So handle common with GICD handling
> @@ -982,13 +985,6 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
>         return __vgic_v3_distr_common_mmio_write("vGICR: SGI", v,
>                                                  info, gicr_reg, r);
> 
> -    case VREG32(GICR_ISPENDR0):
> -        if ( dabt.size != DABT_WORD ) goto bad_width;
> -        printk(XENLOG_G_ERR
> -               "%pv: vGICR: SGI: unhandled word write %#"PRIregister" to ISPENDR0\n",
> -               v, r);
> -        return 0;
> -
>     case VREG32(GICR_ICPENDR0):
>         if ( dabt.size != DABT_WORD ) goto bad_width;
>         printk(XENLOG_G_ERR
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 82f524a35c9e..8f9400a51960 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -423,6 +423,53 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
>     }
> }
> 
> +void vgic_set_irqs_pending(struct vcpu *v, uint32_t r, unsigned int rank)
> +{
> +    const unsigned long mask = r;
> +    unsigned int i;
> +    /* The first rank is always per-vCPU */
> +    bool private = rank == 0;
> +
> +    /* LPIs will never be set pending via this function */
> +    ASSERT(!is_lpi(32 * rank + 31));
> +
> +    for_each_set_bit( i, &mask, 32 )
> +    {
> +        unsigned int irq = i + 32 * rank;
> +
> +        if ( !private )
> +        {
> +            struct pending_irq *p = spi_to_pending(v->domain, irq);
> +
> +            /*
> +             * When the domain sets the pending state for a HW interrupt on
> +             * the virtual distributor, we set the pending state on the
> +             * physical distributor.
> +             *
> +             * XXX: Investigate whether we would be able to set the
> +             * physical interrupt active and save an interruption. (This
> +             * is what the new vGIC does).
> +             */
> +            if ( p->desc != NULL )
> +            {
> +                unsigned long flags;
> +
> +                spin_lock_irqsave(&p->desc->lock, flags);
> +                gic_set_pending_state(p->desc, true);
> +                spin_unlock_irqrestore(&p->desc->lock, flags);
> +                continue;
> +            }
> +        }
> +
> +        /*
> +         * If the interrupt is per-vCPU, then we want to inject the vIRQ
> +         * to v, otherwise we should let the function figuring out the
> +         * correct vCPU.
> +         */
> +        vgic_inject_irq(v->domain, private ? v : NULL, irq, true);
> +    }
> +}
> +
> bool vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode,
>                  int virq, const struct sgi_target *target)
> {
> diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> index ce1e3c4bbdac..62c2ae538db2 100644
> --- a/xen/include/asm-arm/vgic.h
> +++ b/xen/include/asm-arm/vgic.h
> @@ -288,6 +288,8 @@ extern struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, int
> extern struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq);
> extern void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n);
> extern void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n);
> +extern void vgic_set_irqs_pending(struct vcpu *v, uint32_t r,
> +                                  unsigned int rank);
> extern void register_vgic_ops(struct domain *d, const struct vgic_ops *ops);
> int vgic_v2_init(struct domain *d, int *mmio_count);
> int vgic_v3_init(struct domain *d, int *mmio_count);
> -- 
> 2.17.1
> 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-22 13:45 ` Bertrand Marquis
@ 2021-02-23  1:24   ` Stefano Stabellini
  2021-02-23 10:31     ` Julien Grall
  2021-02-25 12:22   ` Julien Grall
  1 sibling, 1 reply; 8+ messages in thread
From: Stefano Stabellini @ 2021-02-23  1:24 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: Julien Grall, xen-devel, iwj, Julien Grall, Stefano Stabellini,
	Volodymyr Babchuk, dfaggioli, George.Dunlap

On Mon, 22 Feb 2021, Bertrand Marquis wrote:
> Hi Julien,
> 
> > On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
> > 
> > From: Julien Grall <jgrall@amazon.com>
> > 
> > Currently, Xen will send a data abort to a guest trying to write to the
> > ISPENDR.
> > 
> > Unfortunately, recent version of Linux (at least 5.9+) will start
> > writing to the register if the interrupt needs to be re-triggered
> > (see the callback irq_retrigger). This can happen when a driver (such as
> > the xgbe network driver on AMD Seattle) re-enable an interrupt:
> > 
> > (XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
> > [...]
> > [   25.635837] Unhandled fault at 0xffff80001000522c
> > [...]
> > [   25.818716]  gic_retrigger+0x2c/0x38
> > [   25.822361]  irq_startup+0x78/0x138
> > [   25.825920]  __enable_irq+0x70/0x80
> > [   25.829478]  enable_irq+0x50/0xa0
> > [   25.832864]  xgbe_one_poll+0xc8/0xd8
> > [   25.836509]  net_rx_action+0x110/0x3a8
> > [   25.840328]  __do_softirq+0x124/0x288
> > [   25.844061]  irq_exit+0xe0/0xf0
> > [   25.847272]  __handle_domain_irq+0x68/0xc0
> > [   25.851442]  gic_handle_irq+0xa8/0xe0
> > [   25.855171]  el1_irq+0xb0/0x180
> > [   25.858383]  arch_cpu_idle+0x18/0x28
> > [   25.862028]  default_idle_call+0x24/0x5c
> > [   25.866021]  do_idle+0x204/0x278
> > [   25.869319]  cpu_startup_entry+0x24/0x68
> > [   25.873313]  rest_init+0xd4/0xe4
> > [   25.876611]  arch_call_rest_init+0x10/0x1c
> > [   25.880777]  start_kernel+0x5b8/0x5ec
> > 
> > As a consequence, the OS may become unusable.
> > 
> > Implementing the write part of ISPENDR is somewhat easy. For
> > virtual interrupt, we only need to inject the interrupt again.
> > 
> > For physical interrupt, we need to be more careful as the de-activation
> > of the virtual interrupt will be propagated to the physical distributor.
> > For simplicity, the physical interrupt will be set pending so the
> > workflow will not differ from a "real" interrupt.
> > 
> > Longer term, we could possible directly activate the physical interrupt
> > and avoid taking an exception to inject the interrupt to the domain.
> > (This is the approach taken by the new vGIC based on KVM).
> > 
> > Signed-off-by: Julien Grall <jgrall@amazon.com>

Thanks for the patch!


> This is something which will not be done by a guest very often so I think your
> implementation actually makes it simpler and reduce possibilities of race conditions
> so I am not even sure that the XXX comment is needed.
> But i am ok with it being in or not so:
> 
> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
> 
> I did some tests by manually generating interrupts and I can confirm that this
> works as expected.

And thank you for testing.

Yes it is definitely true that it shouldn't be done by the guest often
so it is not worth optimizing. For example, if somebody is trying to
measure latency using this, they are really barking at the wrong tree.
And I like the simplicity of this implementation; the patch looks good.


So I am not too worries about the cost of the roundtrip. But should we
be worried about the guest continuously creating events for itself in a
loop and potentially "kidnapping" a pcpu by looping around ISPENDR? I am
talking about cases where 1 pcpu is shared between 2 vcpus of different
guests. I thought we should have a brief conversation about it before
allowing it.

The consequence of this patch is that a guest can cause vcpu_unblock(v),
hence vcpu_wake(v), to be called for its own vcpu, which basically
always changes v to RUNSTATE_runnable. However, that alone shouldn't
allow v to always come up ahead of any other vcpus in the queue, right?
It should be safe. I just wanted a second opinion on this :-)

It was possible to trigger interrupts for your own vcpus even before, but
now the code path is going to be direct for virtual interrupts.



> > ---
> > 
> > Note that this doesn't touch the read part for I{S,C}PENDR nor the write
> > part of ICPENDR because they are more complex to implement.
> > 
> > For physical interrupt, I didn't implement the same solution as KVM because
> > I couldn't convince myself this could be done race free for physical
> > interrupt.
> > 
> > This was tested using the IRQ debugfs (CONFIG_GENERIC_IRQ_DEBUGFS=y)
> > which allows to retrigger an interrupt:
> > 
> > 42sh> echo trigger > /sys/kernel/debug/irq/irqs/<irq>
> > 
> > This patch is candidate for 4.15 and also backporting to older trees.
> > Without this patch, recent Linux version may not boot on Xen on some
> > platforms (such as AMD Seattle used in OssTest).
> > 
> > The patch is self-contained to implementing a single set of registers.
> > So this would not introduce any risk on platforms where OSes don't use
> > those registers.
> > 
> > For the other setup (e.g. AMD Seattle + Linux 5.9+), it cannot be worse
> > than today.
> > 
> > So therefore, I would consider the risk limited.
> > ---
> > xen/arch/arm/vgic-v2.c     | 10 ++++----
> > xen/arch/arm/vgic-v3.c     | 18 ++++++---------
> > xen/arch/arm/vgic.c        | 47 ++++++++++++++++++++++++++++++++++++++
> > xen/include/asm-arm/vgic.h |  2 ++
> > 4 files changed, 62 insertions(+), 15 deletions(-)
> > 
> > diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
> > index 64b141fea586..b2da886adc18 100644
> > --- a/xen/arch/arm/vgic-v2.c
> > +++ b/xen/arch/arm/vgic-v2.c
> > @@ -472,10 +472,12 @@ static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info,
> > 
> >     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
> >         if ( dabt.size != DABT_WORD ) goto bad_width;
> > -        printk(XENLOG_G_ERR
> > -               "%pv: vGICD: unhandled word write %#"PRIregister" to ISPENDR%d\n",
> > -               v, r, gicd_reg - GICD_ISPENDR);
> > -        return 0;
> > +        rank = vgic_rank_offset(v, 1, gicd_reg - GICD_ISPENDR, DABT_WORD);
> > +        if ( rank == NULL ) goto write_ignore;
> > +
> > +        vgic_set_irqs_pending(v, r, rank->index);
> > +
> > +        return 1;
> > 
> >     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
> >         if ( dabt.size != DABT_WORD ) goto bad_width;
> > diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> > index fd8cfc156d0c..613f37abab5e 100644
> > --- a/xen/arch/arm/vgic-v3.c
> > +++ b/xen/arch/arm/vgic-v3.c
> > @@ -808,10 +808,12 @@ static int __vgic_v3_distr_common_mmio_write(const char *name, struct vcpu *v,
> > 
> >     case VRANGE32(GICD_ISPENDR, GICD_ISPENDRN):
> >         if ( dabt.size != DABT_WORD ) goto bad_width;
> > -        printk(XENLOG_G_ERR
> > -               "%pv: %s: unhandled word write %#"PRIregister" to ISPENDR%d\n",
> > -               v, name, r, reg - GICD_ISPENDR);
> > -        return 0;
> > +        rank = vgic_rank_offset(v, 1, reg - GICD_ISPENDR, DABT_WORD);
> > +        if ( rank == NULL ) goto write_ignore;
> > +
> > +        vgic_set_irqs_pending(v, r, rank->index);
> > +
> > +        return 1;
> > 
> >     case VRANGE32(GICD_ICPENDR, GICD_ICPENDRN):
> >         if ( dabt.size != DABT_WORD ) goto bad_width;
> > @@ -975,6 +977,7 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
> >     case VREG32(GICR_ICACTIVER0):
> >     case VREG32(GICR_ICFGR1):
> >     case VRANGE32(GICR_IPRIORITYR0, GICR_IPRIORITYR7):
> > +    case VREG32(GICR_ISPENDR0):
> >          /*
> >           * Above registers offset are common with GICD.
> >           * So handle common with GICD handling
> > @@ -982,13 +985,6 @@ static int vgic_v3_rdistr_sgi_mmio_write(struct vcpu *v, mmio_info_t *info,
> >         return __vgic_v3_distr_common_mmio_write("vGICR: SGI", v,
> >                                                  info, gicr_reg, r);
> > 
> > -    case VREG32(GICR_ISPENDR0):
> > -        if ( dabt.size != DABT_WORD ) goto bad_width;
> > -        printk(XENLOG_G_ERR
> > -               "%pv: vGICR: SGI: unhandled word write %#"PRIregister" to ISPENDR0\n",
> > -               v, r);
> > -        return 0;
> > -
> >     case VREG32(GICR_ICPENDR0):
> >         if ( dabt.size != DABT_WORD ) goto bad_width;
> >         printk(XENLOG_G_ERR
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 82f524a35c9e..8f9400a51960 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -423,6 +423,53 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
> >     }
> > }
> > 
> > +void vgic_set_irqs_pending(struct vcpu *v, uint32_t r, unsigned int rank)
> > +{
> > +    const unsigned long mask = r;
> > +    unsigned int i;
> > +    /* The first rank is always per-vCPU */
> > +    bool private = rank == 0;
> > +
> > +    /* LPIs will never be set pending via this function */
> > +    ASSERT(!is_lpi(32 * rank + 31));
> > +
> > +    for_each_set_bit( i, &mask, 32 )
> > +    {
> > +        unsigned int irq = i + 32 * rank;
> > +
> > +        if ( !private )
> > +        {
> > +            struct pending_irq *p = spi_to_pending(v->domain, irq);
> > +
> > +            /*
> > +             * When the domain sets the pending state for a HW interrupt on
> > +             * the virtual distributor, we set the pending state on the
> > +             * physical distributor.
> > +             *
> > +             * XXX: Investigate whether we would be able to set the
> > +             * physical interrupt active and save an interruption. (This
> > +             * is what the new vGIC does).
> > +             */
> > +            if ( p->desc != NULL )
> > +            {
> > +                unsigned long flags;
> > +
> > +                spin_lock_irqsave(&p->desc->lock, flags);
> > +                gic_set_pending_state(p->desc, true);
> > +                spin_unlock_irqrestore(&p->desc->lock, flags);
> > +                continue;
> > +            }
> > +        }
> > +
> > +        /*
> > +         * If the interrupt is per-vCPU, then we want to inject the vIRQ
> > +         * to v, otherwise we should let the function figuring out the
> > +         * correct vCPU.
> > +         */
> > +        vgic_inject_irq(v->domain, private ? v : NULL, irq, true);
> > +    }
> > +}
> > +
> > bool vgic_to_sgi(struct vcpu *v, register_t sgir, enum gic_sgi_mode irqmode,
> >                  int virq, const struct sgi_target *target)
> > {
> > diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
> > index ce1e3c4bbdac..62c2ae538db2 100644
> > --- a/xen/include/asm-arm/vgic.h
> > +++ b/xen/include/asm-arm/vgic.h
> > @@ -288,6 +288,8 @@ extern struct vgic_irq_rank *vgic_rank_offset(struct vcpu *v, int b, int n, int
> > extern struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned int irq);
> > extern void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n);
> > extern void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n);
> > +extern void vgic_set_irqs_pending(struct vcpu *v, uint32_t r,
> > +                                  unsigned int rank);
> > extern void register_vgic_ops(struct domain *d, const struct vgic_ops *ops);
> > int vgic_v2_init(struct domain *d, int *mmio_count);
> > int vgic_v3_init(struct domain *d, int *mmio_count);
> > -- 
> > 2.17.1
> > 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-23  1:24   ` Stefano Stabellini
@ 2021-02-23 10:31     ` Julien Grall
  2021-02-23 20:40       ` Stefano Stabellini
  0 siblings, 1 reply; 8+ messages in thread
From: Julien Grall @ 2021-02-23 10:31 UTC (permalink / raw)
  To: Stefano Stabellini, Bertrand Marquis
  Cc: xen-devel, iwj, Julien Grall, Volodymyr Babchuk, dfaggioli,
	George.Dunlap

Hi Stefano,

On 23/02/2021 01:24, Stefano Stabellini wrote:
> On Mon, 22 Feb 2021, Bertrand Marquis wrote:
>>> On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
> The consequence of this patch is that a guest can cause vcpu_unblock(v),
> hence vcpu_wake(v), to be called for its own vcpu, which basically
> always changes v to RUNSTATE_runnable. However, that alone shouldn't
> allow v to always come up ahead of any other vcpus in the queue, right?
> It should be safe. I just wanted a second opinion on this :-)

vcpu_wake() only tells the scheduler that the vCPU can be run, it is 
then up to the scheduler to decide what to do. AFAIU, for credit{1, 2}, 
each vCPU will have some credit. If you run out of credit, then you vCPU 
will not be descheduled if there is work do it.

> 
> It was possible to trigger interrupts for your own vcpus even before, but
> now the code path is going to be direct for virtual interrupts.

You can already trigger "directly" virtual interrupts using the event 
channels.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-23 10:31     ` Julien Grall
@ 2021-02-23 20:40       ` Stefano Stabellini
  0 siblings, 0 replies; 8+ messages in thread
From: Stefano Stabellini @ 2021-02-23 20:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Bertrand Marquis, xen-devel, iwj,
	Julien Grall, Volodymyr Babchuk, dfaggioli, George.Dunlap

On Tue, 23 Feb 2021, Julien Grall wrote:
> Hi Stefano,
> 
> On 23/02/2021 01:24, Stefano Stabellini wrote:
> > On Mon, 22 Feb 2021, Bertrand Marquis wrote:
> > > > On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
> > The consequence of this patch is that a guest can cause vcpu_unblock(v),
> > hence vcpu_wake(v), to be called for its own vcpu, which basically
> > always changes v to RUNSTATE_runnable. However, that alone shouldn't
> > allow v to always come up ahead of any other vcpus in the queue, right?
> > It should be safe. I just wanted a second opinion on this :-)
> 
> vcpu_wake() only tells the scheduler that the vCPU can be run, it is then up
> to the scheduler to decide what to do. AFAIU, for credit{1, 2}, each vCPU will
> have some credit. If you run out of credit, then you vCPU will not be
> descheduled if there is work do it.

OK, great, it matches my understanding. Thanks for checking.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-22 13:45 ` Bertrand Marquis
  2021-02-23  1:24   ` Stefano Stabellini
@ 2021-02-25 12:22   ` Julien Grall
  2021-02-25 21:27     ` Stefano Stabellini
  1 sibling, 1 reply; 8+ messages in thread
From: Julien Grall @ 2021-02-25 12:22 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: xen-devel, iwj, Julien Grall, Stefano Stabellini, Volodymyr Babchuk



On 22/02/2021 13:45, Bertrand Marquis wrote:
> Hi Julien,
> 
>> On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
>>
>> From: Julien Grall <jgrall@amazon.com>
>>
>> Currently, Xen will send a data abort to a guest trying to write to the
>> ISPENDR.
>>
>> Unfortunately, recent version of Linux (at least 5.9+) will start
>> writing to the register if the interrupt needs to be re-triggered
>> (see the callback irq_retrigger). This can happen when a driver (such as
>> the xgbe network driver on AMD Seattle) re-enable an interrupt:
>>
>> (XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
>> [...]
>> [   25.635837] Unhandled fault at 0xffff80001000522c
>> [...]
>> [   25.818716]  gic_retrigger+0x2c/0x38
>> [   25.822361]  irq_startup+0x78/0x138
>> [   25.825920]  __enable_irq+0x70/0x80
>> [   25.829478]  enable_irq+0x50/0xa0
>> [   25.832864]  xgbe_one_poll+0xc8/0xd8
>> [   25.836509]  net_rx_action+0x110/0x3a8
>> [   25.840328]  __do_softirq+0x124/0x288
>> [   25.844061]  irq_exit+0xe0/0xf0
>> [   25.847272]  __handle_domain_irq+0x68/0xc0
>> [   25.851442]  gic_handle_irq+0xa8/0xe0
>> [   25.855171]  el1_irq+0xb0/0x180
>> [   25.858383]  arch_cpu_idle+0x18/0x28
>> [   25.862028]  default_idle_call+0x24/0x5c
>> [   25.866021]  do_idle+0x204/0x278
>> [   25.869319]  cpu_startup_entry+0x24/0x68
>> [   25.873313]  rest_init+0xd4/0xe4
>> [   25.876611]  arch_call_rest_init+0x10/0x1c
>> [   25.880777]  start_kernel+0x5b8/0x5ec
>>
>> As a consequence, the OS may become unusable.
>>
>> Implementing the write part of ISPENDR is somewhat easy. For
>> virtual interrupt, we only need to inject the interrupt again.
>>
>> For physical interrupt, we need to be more careful as the de-activation
>> of the virtual interrupt will be propagated to the physical distributor.
>> For simplicity, the physical interrupt will be set pending so the
>> workflow will not differ from a "real" interrupt.
>>
>> Longer term, we could possible directly activate the physical interrupt
>> and avoid taking an exception to inject the interrupt to the domain.
>> (This is the approach taken by the new vGIC based on KVM).
>>
>> Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> This is something which will not be done by a guest very often so I think your
> implementation actually makes it simpler and reduce possibilities of race conditions
> so I am not even sure that the XXX comment is needed.

I think the XXX is useful as if someone notice an issue with the code, 
then they know what they could try.

I am open to suggestion how we could keep track of potential improvement.

> But i am ok with it being in or not so:
> 
> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

Thanks!

Cheers,
-- 
Julien Grall


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
  2021-02-25 12:22   ` Julien Grall
@ 2021-02-25 21:27     ` Stefano Stabellini
  0 siblings, 0 replies; 8+ messages in thread
From: Stefano Stabellini @ 2021-02-25 21:27 UTC (permalink / raw)
  To: Julien Grall
  Cc: Bertrand Marquis, xen-devel, iwj, Julien Grall,
	Stefano Stabellini, Volodymyr Babchuk

On Thu, 25 Feb 2021, Julien Grall wrote:
> On 22/02/2021 13:45, Bertrand Marquis wrote:
> > Hi Julien,
> > 
> > > On 20 Feb 2021, at 14:04, Julien Grall <julien@xen.org> wrote:
> > > 
> > > From: Julien Grall <jgrall@amazon.com>
> > > 
> > > Currently, Xen will send a data abort to a guest trying to write to the
> > > ISPENDR.
> > > 
> > > Unfortunately, recent version of Linux (at least 5.9+) will start
> > > writing to the register if the interrupt needs to be re-triggered
> > > (see the callback irq_retrigger). This can happen when a driver (such as
> > > the xgbe network driver on AMD Seattle) re-enable an interrupt:
> > > 
> > > (XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
> > > [...]
> > > [   25.635837] Unhandled fault at 0xffff80001000522c
> > > [...]
> > > [   25.818716]  gic_retrigger+0x2c/0x38
> > > [   25.822361]  irq_startup+0x78/0x138
> > > [   25.825920]  __enable_irq+0x70/0x80
> > > [   25.829478]  enable_irq+0x50/0xa0
> > > [   25.832864]  xgbe_one_poll+0xc8/0xd8
> > > [   25.836509]  net_rx_action+0x110/0x3a8
> > > [   25.840328]  __do_softirq+0x124/0x288
> > > [   25.844061]  irq_exit+0xe0/0xf0
> > > [   25.847272]  __handle_domain_irq+0x68/0xc0
> > > [   25.851442]  gic_handle_irq+0xa8/0xe0
> > > [   25.855171]  el1_irq+0xb0/0x180
> > > [   25.858383]  arch_cpu_idle+0x18/0x28
> > > [   25.862028]  default_idle_call+0x24/0x5c
> > > [   25.866021]  do_idle+0x204/0x278
> > > [   25.869319]  cpu_startup_entry+0x24/0x68
> > > [   25.873313]  rest_init+0xd4/0xe4
> > > [   25.876611]  arch_call_rest_init+0x10/0x1c
> > > [   25.880777]  start_kernel+0x5b8/0x5ec
> > > 
> > > As a consequence, the OS may become unusable.
> > > 
> > > Implementing the write part of ISPENDR is somewhat easy. For
> > > virtual interrupt, we only need to inject the interrupt again.
> > > 
> > > For physical interrupt, we need to be more careful as the de-activation
> > > of the virtual interrupt will be propagated to the physical distributor.
> > > For simplicity, the physical interrupt will be set pending so the
> > > workflow will not differ from a "real" interrupt.
> > > 
> > > Longer term, we could possible directly activate the physical interrupt
> > > and avoid taking an exception to inject the interrupt to the domain.
> > > (This is the approach taken by the new vGIC based on KVM).
> > > 
> > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > 
> > This is something which will not be done by a guest very often so I think
> > your
> > implementation actually makes it simpler and reduce possibilities of race
> > conditions
> > so I am not even sure that the XXX comment is needed.
> 
> I think the XXX is useful as if someone notice an issue with the code, then
> they know what they could try.
> 
> I am open to suggestion how we could keep track of potential improvement.

It is worth capturing it somewhere. Maybe the commit message is better
than as an in-code comment?

Either way it is fine by me and feel free to make that kind of change on
commit.


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-02-25 21:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-20 14:04 [PATCH for-4.15] xen/vgic: Implement write to ISPENDR in vGICv{2, 3} Julien Grall
2021-02-22 11:10 ` Ian Jackson
2021-02-22 13:45 ` Bertrand Marquis
2021-02-23  1:24   ` Stefano Stabellini
2021-02-23 10:31     ` Julien Grall
2021-02-23 20:40       ` Stefano Stabellini
2021-02-25 12:22   ` Julien Grall
2021-02-25 21:27     ` Stefano Stabellini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).