On Thu, 2021-01-28 at 13:43 +0100, Paolo Bonzini wrote:
> On 11/01/21 20:57, David Woodhouse wrote:
> > From: David Woodhouse
> >
> > It turns out that we can't handle event channels *entirely* in
> > userspace by delivering them as ExtINT, because KVM is a bit picky
> > about when it accepts ExtINT interrupts from a legacy PIC. The
> > in-kernel local APIC has to have LVT0 configured in APIC_MODE_EXTINT
> > and unmasked, which isn't necessarily the case for Xen guests,
> > especially on secondary CPUs.
> >
> > To cope with this, add kvm_xen_get_interrupt() which checks the
> > evtchn_upcall_pending field in the Xen vcpu_info, and delivers the
> > Xen upcall vector (configured by KVM_XEN_ATTR_TYPE_UPCALL_VECTOR) if
> > it's set, regardless of LAPIC LVT0 configuration. This gives us the
> > minimum support we need for a completely userspace-based
> > implementation of event channels.
> >
> > This does mean that vcpu_enter_guest() needs to check for the
> > evtchn_upcall_pending flag being set, because it can't rely on
> > someone having set KVM_REQ_EVENT unless we were to add some way for
> > userspace to do so manually.
> >
> > But actually, I don't quite see how that works reliably for
> > interrupts injected with KVM_INTERRUPT either. In
> > kvm_vcpu_ioctl_interrupt() the KVM_REQ_EVENT request is set once,
> > but that'll get cleared the first time through vcpu_enter_guest().
> > So if the first exit is for something *else* without interrupts
> > being enabled yet, won't the KVM_REQ_EVENT request have been
> > consumed already and just be lost?
>
> If the call is for something else and there is an interrupt,
> inject_pending_event sets *req_immediate_exit to true. This causes an
> immediate vmexit after vmentry, and also schedules another
> KVM_REQ_EVENT processing.
>
> If the call is for an interrupt but you cannot process it yet (IF=0,
> STI/MOVSS window, etc.), inject_pending_event calls
> kvm_x86_ops.enable_irq_window and this will cause KVM_REQ_EVENT to be
> sent again.

Ah, OK. I see it now; thanks.

> How do you inject the interrupt from userspace?

The VMM injects the interrupt purely by setting ->evtchn_upcall_pending
in the vcpu_info. That is actually also part of the Xen guest ABI:
guests can retrigger the vector purely by setting
->evtchn_upcall_pending in their own vcpu_info and doing anything which
triggers a vmexit. Xen checks it and potentially injects the vector
each time it enters the guest, just as I've done here in
vcpu_enter_guest().

> IIRC evtchn_upcall_pending is written by the hypervisor upon
> receiving a hypercall, so wouldn't you need the "dom0" to invoke a
> KVM_INTERRUPT ioctl (e.g. with irq == 256)? That KVM_INTERRUPT will
> set KVM_REQ_EVENT.

Well, right now that would return -EINVAL, so you're suggesting we add
a special case code path to kvm_vcpu_ioctl_interrupt() which just sets
KVM_REQ_EVENT without calling kvm_queue_interrupt(), in the case where
irq->irq == KVM_NR_INTERRUPTS?

Then we'd require that the userspace VMM make that ioctl not only when
it has set ->evtchn_upcall_pending itself, but *also* poll for the
guest having done so?

In fact, not only would the VMM have to do that polling, but we'd
probably also have to do it for any hypercalls we accelerate in the
kernel (as we're planning to do for IPIs, etc.). So it has to live in
the kernel anyway in *some* form.

So requiring KVM_REQ_EVENT to be set manually probably ends up being
more complex than just checking it directly in vcpu_enter_guest() as I
have done here.
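For reference, the hot-path check I'm describing is roughly the
following (a simplified sketch rather than the exact patch; the way the
vcpu_info userspace address is cached, and the xen-state field names,
are placeholders here -- only evtchn_upcall_pending itself is the real
guest ABI field):

    /* Called from vcpu_enter_guest() before event injection.
     * vcpu_info_uaddr stands in for however the vcpu_info address
     * ends up being cached; the real field names will differ.
     */
    static bool kvm_xen_has_interrupt(struct kvm_vcpu *vcpu)
    {
            struct vcpu_info __user *vi = vcpu->arch.xen.vcpu_info_uaddr;
            u8 pending = 0;

            if (!vi || !vcpu->kvm->arch.xen.upcall_vector)
                    return false;

            /* A single byte in the guest-visible vcpu_info; this is
             * the userspace dereference which rarely faults. */
            if (get_user(pending, &vi->evtchn_upcall_pending))
                    return false;

            return pending != 0;
    }

If that returns true, vcpu_enter_guest() delivers the vector configured
via KVM_XEN_ATTR_TYPE_UPCALL_VECTOR, independent of the LVT0 setup.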
Even before the static key improvement you suggest below, it's a fairly
lightweight check in the common case. If the vcpu_info is set and the
memslots didn't change, it's a single dereference of a userspace
pointer which will rarely fault and need any handling.

> If you want to write a testcase without having to write all the
> interrupt stuff in the selftests framework, you could set an IDT that
> has room only for 128 vectors and use interrupt 128 as the upcall
> vector. Then successful delivery of the vector will cause a triple
> fault.

Yeah, my testing of this part so far consists of actually booting Xen
guests; delivery of event channel vectors was the final thing we needed
to make them actually work.

I'm planning to come back and work out how to do a more comprehensive
self-test once I do the in-kernel IPI acceleration and polling support.
I really think I'll have to come up with something better than "I can
make it crash" for those more complex tests, so I haven't bothered with
doing that as an interim step for the basic vector delivery.

> Independent of the answer to the above, this is really the only place
> where you're adding Xen code to a hot path. Can you please use a
> STATIC_KEY_FALSE kvm_has_xen_vcpu (and a static inline function) to
> quickly return from kvm_xen_has_interrupt() if no vCPU has a shared
> info set up?

Ack. I'll do that.
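Presumably something along these lines, using the name from your
suggestion (just the usual jump-label pattern, not the final patch;
exactly where the key gets flipped is still to be worked out):

    #include <linux/jump_label.h>

    /* Enabled (e.g. via static_branch_enable()) the first time any
     * vCPU gets a vcpu_info / shared_info configured. */
    DEFINE_STATIC_KEY_FALSE(kvm_has_xen_vcpu);

    /* __kvm_xen_has_interrupt() being the out-of-line check sketched
     * earlier in this mail. */
    static inline int kvm_xen_has_interrupt(struct kvm_vcpu *vcpu)
    {
            if (static_branch_unlikely(&kvm_has_xen_vcpu))
                    return __kvm_xen_has_interrupt(vcpu);
            return 0;
    }

so that guests which never touch the Xen interfaces only pay for a
patched-out branch on the vcpu_enter_guest() path.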