From: Stefano Stabellini <sstabellini@kernel.org>
To: Andre Przywara <andre.przywara@linaro.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Eric Auger <eric.auger@redhat.com>,
	Julien Grall <julien.grall@arm.com>,
	xen-devel@lists.xenproject.org,
	Christoffer Dall <christoffer.dall@linaro.org>
Subject: Re: [RFC] ARM: New (Xen) VGIC design document
Date: Thu, 2 Nov 2017 10:56:53 -0700 (PDT)
Message-ID: <alpine.DEB.2.10.1711021053080.1707@sstabellini-ThinkPad-X260>
In-Reply-To: <57cd8f84-8a56-132f-cb9a-a6889f743e53@linaro.org>

On Thu, 2 Nov 2017, Andre Przywara wrote:
> >>>> (CC:ing some KVM/ARM folks involved in the VGIC)
> >>>>
> >>>> Starting with the addition of ITS support, we were seeing more and
> >>>> more issues with the current implementation of our ARM Generic Interrupt
> >>>> Controller (GIC) emulation, the VGIC.
> >>>> Among other approaches to fix those issues it was proposed to copy the
> >>>> VGIC emulation used in KVM. This one was suffering from very similar
> >>>> issues, and a clean design from scratch led to a very robust and
> >>>> capable re-implementation. Interestingly this implementation is fairly
> >>>> self-contained, so it seems feasible to copy it. Hopefully we only need
> >>>> minor adjustments, possibly we can even copy it verbatim with some
> >>>> additional glue layer code.
> >>>>
> >>>> Stefano asked for a design overview, to assess the feasibility
> >>>> of copying the KVM code without reviewing tons of code in the first
> >>>> place.
> >>>> So to follow Xen rules for new features, this design document below is
> >>>> an attempt to describe the current KVM VGIC design - in a hypervisor
> >>>> agnostic fashion. It is a bit of a retro-fit design description, as it
> >>>> is not strictly forward-looking, but actually describes the
> >>>> existing implementation [1].
> >>>>
> >>>> Please have a look and let me know:
> >>>> 1) if this document has the right scope
> >>>> 2) if this document has the right level of detail
> >>>> 3) if there are points missing from the document
> >>>> 4) if the design in general is a fit
> >>>
> >>> Please read the following statements as genuine questions and concerns.
> >>> Most ideas in this document are good. Some of them I have even suggested
> >>> myself in the context of GIC improvements for Xen. I asked for a
> >>> couple of clarifications.
> >>>
> >>> But I don't see why we cannot implement these ideas on top of the
> >>> existing code, rather than with a separate codebase, ending up with two
> >>> drivers. I would prefer a natural evolution. Specifically, the following
> >>> improvements would be simple and would give us most of the benefits on
> >>> top of the current codebase:
> >>> - adding the irq lock, and the refcount
> >>> - taking both vcpu locks when necessary (in the migration code, for example,
> >>>   it would help a lot), taking the lower vcpu_id's lock first
> >>> - level irq emulation
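
(To make the second point above concrete, here is a minimal sketch of the
"lower vcpu_id first" rule; the function and field names below are made up
for illustration and do not match the actual Xen or KVM structures:)

    /*
     * Take the per-VCPU VGIC locks of two VCPUs in a fixed order, so
     * that two CPUs migrating IRQs in opposite directions cannot
     * deadlock against each other.
     */
    static void vgic_lock_two_vcpus(struct vcpu *a, struct vcpu *b)
    {
        if ( a->vcpu_id < b->vcpu_id )
        {
            spin_lock(&a->arch.vgic.ap_list_lock);
            spin_lock(&b->arch.vgic.ap_list_lock);
        }
        else
        {
            spin_lock(&b->arch.vgic.ap_list_lock);
            spin_lock(&a->arch.vgic.ap_list_lock);
        }
    }
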
> >>
> >> I think some of those points you mentioned are not easily implemented in
> >> the current Xen. For instance I ran into locking order issues with those
> >> *two* inflight and lr_queue lists, when trying to implement the lock and
> >> the refcount.
> >> Also this "put vIRQs into LRs early, but possibly rip them out again" is
> >> really complicating things a lot.
> >>
> >> I believe only the level IRQ emulation could be added in a relatively
> >> straightforward manner.
> >>
> >> So the problem with the evolutionary approach is that it generates a lot
> >> of patches, some of them quite invasive, others creating hard-to-read
> >> diffs, which are both hard to review.
> >> And chances are that the actual result would be pretty close to the KVM
> >> code. To be clear: I hacked the Xen VGIC towards the KVM design in a few
> >> days some months ago, but it took me *weeks* to turn just the first part
> >> of it into sane patches.
> >> And this would not cover all those general, tedious corner cases that
> >> the VGIC comes with. Those would need to be fixed in a painful process,
> >> which we could avoid by "lifting" the KVM code.
> > 
> > I hear you, but the principal cost here is the review time, not the
> > development time. Julien told me that it would be pretty much the same
> > for him in terms of the time it takes to review the changes: it doesn't
> > matter whether it's a new driver or changes to the existing driver. For me,
> > it wouldn't be the same: I think it would take me far less time to
> > review them if they were against the existing codebase.
> 
> I am not so sure about this. The changes are quite dramatic, and those
> changes tend to produce horrible diffs. We could try to mitigate this, but
> that comes at the cost of having *many* patches, which take a while to
> produce.
> But if we instantiate a new VGIC implementation from scratch, we can
> provide very nice-to-review patches, because the patches can focus on
> logical changes and don't need to care about bisectability.

All right


> > However, as I wrote, this is not my foremost concern. I would be willing
> > to commit to reviewing this even if we decide to go for a new
> > driver.
> > 
> > 
> >>> If we do end up with a second separate driver for technical or process
> >>> reasons, I would expect the regular Xen submission/review process to be
> >>> followed. The code style will be different, the hooks into the rest of
> >>> the hypervisor will be different, and things will generally be changed.
> >>> The new V/GIC might be derived from KVM, but it should end up looking
> >>> and feeling like a 100% genuine Xen component. After all, we'll
> >>> maintain it going forward. I don't want a copy of a Linux driver with
> >>> glue code. The Xen community cannot be expected not to review the
> >>> submission, but if we review it, then we'll ask for changes. Once we
> >>> change the code, there will be no point in keeping the Linux code
> >>> separate with glue code. We should fully adapt it to Xen.
> >>
> >> I see your point, and this actually simplifies *my* work, but I am a bit
> >> worried about the effects of having two separate implementations which
> >> then diverge over time.
> >> At the moment we have two separate implementations as well, but they are
> >> quite different, which has the advantage of doing things differently
> >> enough to help find bugs in the other one (something we should
> >> actually exploit in testing, I believe).
> > 
> > It is a matter of ownership and responsibilities. The gic and vgic
> > components are critical to the hypervisor's functionality, and as the Xen
> > Project we need to take ownership of them. It means fixing bugs and
> > maintaining them going forward. It makes sense to have them fully
> > integrated into Xen.
> 
> Yes, I can see that. I have now come to the belief that taking the KVM code
> *verbatim* is not worth the effort: at the moment I am struggling with
> tiny but nasty details that need to be solved.
> If we allow the code to be changed, we get much more freedom.

Sounds good.


> >> So how do you feel about a shared "libvgic"? I understand that
> >> people are not too happy about the extra maintenance cost of having a
> >> separate repository, but I am curious what your, Marc's and
> >> Christoffer's take is on this idea.
> > 
> > I am open to this discussion. It is nice in theory, but it is hard to
> > put into practice. I think neither Julien and I, nor Christoffer and Marc,
> > like the idea of a separate repository. It is a pain and it is ugly. But
> > if we don't have a single repository, how can we share the codebase?
> > 
> > Also keep in mind that Xen and Linux have different release cycles and
> > they go into freeze at different times. It affects when/how fixes can
> > get into the codebase.
> > 
> > Unless you come up with a clever idea on how to make this work, I think
> > we are better off with our own version of the driver.
> 
> Yeah, I agree, it would probably be quite some pain, which is hard to
> justify, especially from the Linux side.

Right


> >>>> ### Virtual IRQ references
> >>>>
> >>>> There is a function `vgic_get_irq()` which returns a reference to a virtual IRQ
> >>>> given its number.
> >>>> For private IRQs and SPIs it is expected that this just indexes a static array.
> >>>> For LPIs (which are dynamically allocated at run time) this is expected to
> >>>> iterate a data structure (like a linked list) to find the right structure.
> >>>> In any case a call to `vgic_get_irq` will increase a refcount, which will
> >>>> prevent LPIs from being de-allocated while another part of the VGIC is still
> >>>> holding a reference. Thus any caller to `vgic_get_irq` shall call
> >>>> `vgic_put_irq()` after it is done with handling this interrupt.
> >>>> An exception would be if the virtual IRQ is eventually injected into a VCPU. In
> >>>> this case the VCPU holds that reference and it is kept as long as the guest
> >>>> sees this virtual IRQ. The refcount is only decreased once the IRQ has
> >>>> been EOIed by the guest and has been removed from the VCPU list.
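
(As an aside, the caller pattern this get/put discipline implies would be
roughly the following; the function names and the pending_latch field are
only illustrative, not necessarily the actual interfaces we would end up
with:)

    /*
     * Every vgic_get_irq() is paired with a vgic_put_irq(), unless the
     * vIRQ ends up on a VCPU's list, in which case that list keeps the
     * reference until the guest has EOIed the interrupt.
     */
    void example_set_irq_pending(struct domain *d, unsigned int intid)
    {
        struct vgic_irq *irq = vgic_get_irq(d, NULL, intid); /* refcount++ */

        if ( !irq )
            return;

        irq->pending_latch = true;

        if ( !vgic_queue_irq(d, irq) )
            vgic_put_irq(d, irq);   /* not queued: drop our reference */
        /*
         * else: the VCPU's list holds the reference now; it is dropped
         * once the guest EOIs the IRQ and it is removed from the list.
         */
    }
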
> >>>
> >>> I understand the idea behind a refcount, and it sounds like a good thing to
> >>> have.
> >>>
> >>> Let me ask you a couple of questions. How does it help with the issue
> >>> that an LPI could be discarded and remapped (MAPTI) from another
> >>> pcpu while it could still be in an LR?
> >>
> >> On DISCARD we remove it from the list of mapped LPIs, but don't free the
> >> structure. So any vgic_get_lpi(lpi_nr) won't find it anymore. But since
> >> the interrupt is in an LR, the VCPU's ap_list still references the
> >> vgic_irq structure, so we can do the whole IRQ life cycle management
> >> just as normal (because being a list member is what counts when it comes
> >> to a "life" interrupt).
> >> Once this LPI is EOIed, we remove it from the VCPU list, which decreases
> >> the refcount and most probably will free the memory, since the value has
> >> become zero by then. Normally, without unmapping it before, the
> >> reference held by the ITS list would make sure the refcount stays
> >> greater than 0.
> >>
> >> Now when there is a MAPTI to the same LPI number meanwhile, this will
> >> allocate a new structure (this is a new interrupt!) and enter it into
> >> the ITS list. So anyone asking for this LPI *number* will get a
> >> reference to the new IRQ. Think: deleting a file and creating a new one
> >> with the same name on a UNIX system, any old users of an already opened
> >> file descriptor will still use the deleted file, but an open() will
> >> return a handle to the new file.
> > 
> > This needs to be captured in the doc.
> > 
> > Are vgic_irq structs dynamically allocated?
> > Is there a reuse
> > scheme to prevent a malicious guest from spamming Xen with LPI requests?
> > Multiple struct vgic_irq instances for the same LPI would cause even more
> > memory allocations.
> 
> Interesting point. I need to think about a neat solution. For normal
> cases I think we might want to stick with the current Xen scheme of
> allocating the vIRQ structs when we map a device, then handing out
> pointers to some array member on vgic_add_lpi(). Maybe we re-point the
> existing vIRQ to some other location, and use the device-provided
> storage.

Yes, I think we need to take some ideas from our long ITS design
sessions to minimize the chances of DOS from a malicious guest. For
example, we have a scheme to reuse pending_irq structs in the ITS that I
think is worth keeping.
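
Purely as a sketch of what I mean (all names are made up, this is not the
actual Xen ITS code): allocate the vgic_irq entries for a device in one go
when the device is mapped, and only hand out and recycle entries from that
array on MAPTI/DISCARD, so a guest cannot force unbounded allocations:

    struct its_device {
        struct list_head dev_list;   /* on the domain's device list */
        uint32_t devid;
        unsigned int nr_events;
        struct vgic_irq *events;     /* nr_events entries, allocated once */
    };

    static int its_map_device(struct domain *d, uint32_t devid,
                              unsigned int nr_events)
    {
        struct its_device *dev = xzalloc(struct its_device);

        if ( !dev )
            return -ENOMEM;

        dev->devid = devid;
        dev->nr_events = nr_events;
        dev->events = xzalloc_array(struct vgic_irq, nr_events);
        if ( !dev->events )
        {
            xfree(dev);
            return -ENOMEM;
        }
        /* ... insert dev into the domain's ITS device list ... */
        return 0;
    }

That way the memory cost is bounded by what the guest is allowed to map,
not by how often it issues MAPTI.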


> > If both the old and the new vgic_irq structs end up being written to LRs,
> > wouldn't it cause problems?
> 
> Can't happen. DISCARD removes the pending state. Since LPIs have no
> active state, this LPI's life cycle has finished once the VCPU exits.
> So we just keep it around as long as it's still in a VCPU, but it
> vanishes as soon as this VCPU exits.

Please add these details to the doc. In fact, let's turn this doc into
useful documentation for the new implementation.
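
Something along these lines (pseudo-code, the helper names are made up and
only meant to capture the DISCARD semantics you described) would already
help in the doc:

    static void its_discard_event(struct domain *d, struct its_device *dev,
                                  uint32_t eventid)
    {
        /* Hypothetical lookup in the ITS translation tables. */
        struct vgic_irq *irq = its_find_event(dev, eventid);

        if ( !irq )
            return;

        /* New vgic_get_irq()/MAPTI lookups no longer find this struct. */
        its_unmap_event(dev, eventid);

        /* DISCARD removes the pending state; LPIs have no active state. */
        irq->pending_latch = false;

        /*
         * Drop the reference held by the ITS tables. If a VCPU list still
         * references the LPI (e.g. it sits in an LR), the struct stays
         * alive until the guest EOIs it; only then is the memory freed.
         */
        vgic_put_irq(d, irq);
    }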

