From: Stefano Stabellini
Subject: Re: [PATCH v5 4/6] xen/arm: inflight irqs during migration
Date: Thu, 19 Jun 2014 19:32:01 +0100
In-Reply-To: <1403089443.24051.20.camel@kazak.uk.xensource.com>
References: <1402504032-13267-4-git-send-email-stefano.stabellini@eu.citrix.com>
 <1403089443.24051.20.camel@kazak.uk.xensource.com>
To: Ian Campbell
Cc: julien.grall@citrix.com, xen-devel@lists.xensource.com,
 Stefano Stabellini
List-Id: xen-devel@lists.xenproject.org

On Wed, 18 Jun 2014, Ian Campbell wrote:
> On Wed, 2014-06-11 at 17:27 +0100, Stefano Stabellini wrote:
> > We need to take special care when migrating irqs that are already
> > inflight from one vcpu to another. See "The effect of changes to an
> > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > Controller Architecture Specification.
> > 
> > The main issue from the Xen point of view is that the lr_pending and
> > inflight lists are per-vcpu. The lock we take to protect them is also
> > per-vcpu.
> > 
> > In order to avoid issues, we set a new flag GIC_IRQ_GUEST_MIGRATING, so
> > that we can recognize when we receive an irq while the previous one is
> > still inflight (given that we are only dealing with hardware interrupts
> > here, it just means that its LR hasn't been cleared yet on the old vcpu).
> > 
> > If GIC_IRQ_GUEST_MIGRATING is set, we only set GIC_IRQ_GUEST_QUEUED and
> > interrupt the old vcpu. When clearing the LR on the old vcpu, we take
> > special care of injecting the interrupt into the new vcpu. To do that we
> > need to release the old vcpu lock before taking the new vcpu lock.
> > 
> > Using barriers and test_bit on GIC_IRQ_GUEST_MIGRATING, make sure that
> > vgic_migrate_irq can run at the same time as gic_update_one_lr on
> > different cpus with consistent results.
> > 
> > Signed-off-by: Stefano Stabellini
> > 
> > ---
> > 
> > Changes in v5:
> > - pass unsigned long to find_next_bit for alignment on aarch64;
> > - call vgic_get_target_vcpu instead of gic_add_to_lr_pending to add the
> >   irq in the right inflight queue;
> > - add barrier and bit tests to make sure that vgic_migrate_irq and
> >   gic_update_one_lr can run simultaneously on different cpus without
> >   issues;
> > - rework the loop to identify the new vcpu when ITARGETSR is written;
> > - use find_first_bit instead of find_next_bit.
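
To restate the flow described above in rough pseudo-C (a sketch of the
idea only, not the patch itself; vcpu_kick() stands in for however the
old vcpu actually gets interrupted):

    /* a new hardware irq arrives while the old LR is still live */
    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
    {
        /* only mark it queued and interrupt the old vcpu */
        set_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
        vcpu_kick(v_old);
        return;
    }

    /* later, when the old vcpu clears the LR in gic_update_one_lr(),
     * the irq is removed from the old vcpu's inflight list and injected
     * into the new vcpu; the old vcpu lock is released before the new
     * vcpu lock is taken */
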
> > ---
> >  xen/arch/arm/gic.c           |   25 ++++++++++++++--
> >  xen/arch/arm/vgic.c          |   66 +++++++++++++++++++++++++++++++++++++-----
> >  xen/include/asm-arm/domain.h |    4 +++
> >  3 files changed, 85 insertions(+), 10 deletions(-)
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 5e502df..8ed242e 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -677,10 +677,31 @@ static void gic_update_one_lr(struct vcpu *v, int i)
> >          clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
> >          p->lr = GIC_INVALID_LR;
> >          if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
> > -             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
> > +             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
> > +             !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
> >              gic_raise_guest_irq(v, irq, p->priority);
> > -        else
> > +        else {
> >              list_del_init(&p->inflight);
> > +
> > +            /* inflight is cleared before clearing
> > +             * GIC_IRQ_GUEST_MIGRATING */
> > +            dsb(sy);
> 
> Is sy really necessary? This is all about state stored in RAM, not the
> actual GIC, right? I think "ish" is probably sufficient. Possibly even a
> dmb of some sort.

Yeah, it is only about the state in RAM. It only needs to be visible
across all the cores, not the devices. So I guess the right function to
call would be smp_wmb(), which translates to dmb(ishst)?
That makes me think that I should add smp_rmb() to the corresponding
critical section: vgic_migrate_irq.

> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index 4d655af..e640de9 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -543,6 +562,8 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
> >      int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
> >      int gicd_reg = REG(offset);
> >      uint32_t tr;
> > +    unsigned long trl;
> > +    int i;
> > 
> >      switch ( gicd_reg )
> >      {
> > @@ -626,22 +647,45 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
> >          rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR);
> >          if ( rank == NULL) goto write_ignore;
> >          /* 8-bit vcpu mask for this domain */
> > -        tr = (1 << v->domain->max_vcpus) - 1;
> > -        tr = tr | (tr << 8) | (tr << 16) | (tr << 24);
> > -        tr &= *r;
> > +        trl = (1 << v->domain->max_vcpus) - 1;
> > +        if ( dabt.size == 2 )
> > +            trl = trl | (trl << 8) | (trl << 16) | (trl << 24);
> > +        else
> > +            trl = (trl << (8 * (offset & 0x3)));
> > +        trl &= *r;
> 
> What does trl stand for? Why is it an unsigned long when r and tr are
> uint32_t?

It is an unfortunate name. trl is exactly like tr, only unsigned long
instead of uint32_t. It is unsigned long so that it can be passed to
find_next_bit below; a uint32_t cannot be used because it doesn't have
the right alignment on armv8.

> >          /* ignore zero writes */
> > -        if ( !tr )
> > +        if ( !trl )
> >              goto write_ignore;
> >          if ( dabt.size == 2 &&
> > -            !((tr & 0xff) && (tr & (0xff << 8)) &&
> > -            (tr & (0xff << 16)) && (tr & (0xff << 24))))
> > +            !((trl & 0xff) && (trl & (0xff << 8)) &&
> > +            (trl & (0xff << 16)) && (trl & (0xff << 24))))
> >              goto write_ignore;
> 
> I'm still not sure what this is ;-)

All the bytes in trl need to be non-zero for the write to be valid (we
treat a 0 write to any of the byte fields as invalid).

> >          vgic_lock_rank(v, rank);
> > +        i = 0;
> > +        while ( (i = find_next_bit((const unsigned long *)&trl, 32, i)) < 32 )
> 
> Unnecessary cast (if trl really should be an unsigned long).

The argument to find_next_bit really needs to be unsigned long.
I'll remove the cast.
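
Coming back to the barrier question, the pairing I have in mind is
roughly the following (only a sketch of the intended ordering, with
made-up local variables, not the final code):

    /* gic_update_one_lr(), running on the old vcpu's pcpu */
    list_del_init(&p->inflight);
    smp_wmb();    /* inflight is cleared before clearing MIGRATING */
    clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);

    /* vgic_migrate_irq(), possibly running on another pcpu: check
     * MIGRATING first, then (after the read barrier) look at inflight */
    migrating = test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
    smp_rmb();    /* pairs with the smp_wmb() above */
    still_inflight = !list_empty(&p->inflight);
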
> > +        {
> > +            unsigned int irq, target, old_target;
> > +            unsigned long old_target_mask;
> > +            struct vcpu *v_target, *v_old;
> > +
> > +            target = i % 8;
> > +            old_target_mask = byte_read(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)], 0, i/8);
> > +            old_target = find_first_bit((const unsigned long *) &old_target_mask, 8);
> 
> Another unnecessary cast.

Right.

> > +
> > +            if ( target != old_target )
> 
> Given the implementation only supports a single target I'm wondering if
> maybe we should just store the configured target.
> 
> If we weren't required to read back the actual value then we could even
> dispense with rank->itargets completely and just fabricate it.

Yeah, that's a good point, I am not sure. The spec says that it is valid
to statically configure GICD_ITARGETSR and ignore any writes to it, but
it doesn't say that we can ignore /some/ writes to it.

> > +            {
> > +                irq = offset + (i / 8);
> > +                v_target = v->domain->vcpu[target];
> > +                v_old = v->domain->vcpu[old_target];
> > +                vgic_migrate_irq(v_old, v_target, irq);
> > +            }
> > +            i += 8 - target;
> 
> I think this whole loop would be less subtle as:
> 
>     for ( byte = 0 ; byte < 4 ; byte++ )
>         target = find_first_bit((char *)(&tr)+byte, 8)

That would have alignment issues, wouldn't it?

>         if ( target > 8 ) continue;
>         old_target =...
>     }
> 
> The use of find_next_bit is not really semantically what is happening
> here.
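
FWIW, a per-byte variant that avoids the alignment issue would copy
each byte into a long-sized temporary before calling find_first_bit,
something like the following (rough sketch only, untested, with made-up
variable names):

    for ( byte = 0; byte < 4; byte++ )
    {
        unsigned long target_mask = (trl >> (8 * byte)) & 0xff;

        if ( !target_mask )
            continue;
        target = find_first_bit(&target_mask, 8);
        /* look up old_target for this byte and call vgic_migrate_irq()
         * if the target actually changed, as in the loop above */
    }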