From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: [PATCH v5 4/6] xen/arm: inflight irqs during migration
Date: Wed, 18 Jun 2014 12:04:03 +0100
Message-ID: <1403089443.24051.20.camel@kazak.uk.xensource.com>
References: <1402504032-13267-4-git-send-email-stefano.stabellini@eu.citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1402504032-13267-4-git-send-email-stefano.stabellini@eu.citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Stefano Stabellini
Cc: julien.grall@citrix.com, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Wed, 2014-06-11 at 17:27 +0100, Stefano Stabellini wrote:
> We need to take special care when migrating irqs that are already
> inflight from one vcpu to another. See "The effect of changes to an
> GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> Controller Architecture Specification.
>
> The main issue from the Xen point of view is that the lr_pending and
> inflight lists are per-vcpu. The lock we take to protect them is also
> per-vcpu.
>
> In order to avoid issues, we set a new flag GIC_IRQ_GUEST_MIGRATING, so
> that we can recognize when we receive an irq while the previous one is
> still inflight (given that we are only dealing with hardware interrupts
> here, it just means that its LR hasn't been cleared yet on the old vcpu).
>
> If GIC_IRQ_GUEST_MIGRATING is set, we only set GIC_IRQ_GUEST_QUEUED and
> interrupt the old vcpu. When clearing the LR on the old vcpu, we take
> special care of injecting the interrupt into the new vcpu. To do that we
> need to release the old vcpu lock before taking the new vcpu lock.
>
> Using barriers and test_bit on GIC_IRQ_GUEST_MIGRATING, make sure that
> vgic_migrate_irq can run at the same time as gic_update_one_lr on
> different cpus with consistent results.
>
> Signed-off-by: Stefano Stabellini
>
> ---
>
> Changes in v5:
> - pass unsigned long to find_next_bit for alignment on aarch64;
> - call vgic_get_target_vcpu instead of gic_add_to_lr_pending to add the
>   irq in the right inflight queue;
> - add barrier and bit tests to make sure that vgic_migrate_irq and
>   gic_update_one_lr can run simultaneously on different cpus without
>   issues;
> - rework the loop to identify the new vcpu when ITARGETSR is written;
> - use find_first_bit instead of find_next_bit.
> ---
>  xen/arch/arm/gic.c           | 25 ++++++++++++++--
>  xen/arch/arm/vgic.c          | 66 +++++++++++++++++++++++++++++++++++++-----
>  xen/include/asm-arm/domain.h |  4 +++
>  3 files changed, 85 insertions(+), 10 deletions(-)
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 5e502df..8ed242e 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -677,10 +677,31 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>              clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
>          p->lr = GIC_INVALID_LR;
>          if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
> -             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
> +             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
> +             !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
>              gic_raise_guest_irq(v, irq, p->priority);
> -        else
> +        else {
>              list_del_init(&p->inflight);
> +
> +            /* inflight is cleared before clearing
> +             * GIC_IRQ_GUEST_MIGRATING */
> +            dsb(sy);

Is sy really necessary? This is all about state stored in RAM, not the
actual GIC, right? I think "ish" is probably sufficient. Possibly even a
dmb of some sort.
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index 4d655af..e640de9 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -543,6 +562,8 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>      int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
>      int gicd_reg = REG(offset);
>      uint32_t tr;
> +    unsigned long trl;
> +    int i;
>
>      switch ( gicd_reg )
>      {
> @@ -626,22 +647,45 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>          rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR);
>          if ( rank == NULL) goto write_ignore;
>          /* 8-bit vcpu mask for this domain */
> -        tr = (1 << v->domain->max_vcpus) - 1;
> -        tr = tr | (tr << 8) | (tr << 16) | (tr << 24);
> -        tr &= *r;
> +        trl = (1 << v->domain->max_vcpus) - 1;
> +        if ( dabt.size == 2 )
> +            trl = trl | (trl << 8) | (trl << 16) | (trl << 24);
> +        else
> +            trl = (trl << (8 * (offset & 0x3)));
> +        trl &= *r;

What does trl stand for? Why is it an unsigned long when r and tr are
uint32_t?

>          /* ignore zero writes */
> -        if ( !tr )
> +        if ( !trl )
>              goto write_ignore;
>          if ( dabt.size == 2 &&
> -            !((tr & 0xff) && (tr & (0xff << 8)) &&
> -             (tr & (0xff << 16)) && (tr & (0xff << 24))))
> +            !((trl & 0xff) && (trl & (0xff << 8)) &&
> +             (trl & (0xff << 16)) && (trl & (0xff << 24))))
>              goto write_ignore;

I'm still not sure what this is ;-)

>          vgic_lock_rank(v, rank);
> +        i = 0;
> +        while ( (i = find_next_bit((const unsigned long *)&trl, 32, i)) < 32 )

Unnecessary cast (if trl really should be an unsigned long).

> +        {
> +            unsigned int irq, target, old_target;
> +            unsigned long old_target_mask;
> +            struct vcpu *v_target, *v_old;
> +
> +            target = i % 8;
> +            old_target_mask = byte_read(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)], 0, i/8);
> +            old_target = find_first_bit((const unsigned long *) &old_target_mask, 8);

Another unnecessary cast.
> +
> +            if ( target != old_target )

Given the implementation only supports a single target I'm wondering if
maybe we should just store the configured target. If we weren't required
to read back the actual value then we could even dispense with
rank->itargets completely and just fabricate it.

> +            {
> +                irq = offset + (i / 8);
> +                v_target = v->domain->vcpu[target];
> +                v_old = v->domain->vcpu[old_target];
> +                vgic_migrate_irq(v_old, v_target, irq);
> +            }
> +            i += 8 - target;

I think this whole loop would be less subtle as:

    for ( byte = 0 ; byte < 4 ; byte++ )
    {
        target = find_first_bit((char *)(&tr) + byte, 8);
        if ( target >= 8 )
            continue;
        old_target = ...
    }

The use of find_next_bit is not really semantically what is happening
here.

Ian.