Avi Kivity wrote:
> Gregory Haskins wrote:
>>> - works with all guests
>>> - supports hotplug/hotunplug, udev, sysfs, module autoloading, ...
>>> - supported in all OSes
>>> - someone else maintains it
>>
>> These points are all valid, and I really struggled with this particular
>> part of the design.  The entire vbus design only requires one IRQ for
>> the entire guest,
>
> Won't this have scaling issues?  One IRQ means one target vcpu.
> Whereas I'd like virtio devices to span multiple queues, each queue
> with its own MSI IRQ.

Hmm.. you know, I hadn't really thought of it that way, but you have a
point.  To clarify, my design actually uses one IRQ per "eventq", and we
can define an arbitrary number of eventqs (note: today I only define one
eventq, however).  An eventq is a shm-ring construct through which I can
pass events up to the guest, such as "device added" or "ring X signaled".
Each individual device-based virtio-ring then aggregates its "signal"
events onto this eventq mechanism to actually inject events into the
guest.  Only the eventq itself injects an actual IRQ to the assigned
vcpu.

My intended use of multiple eventqs was prioritization of different
rings.  For instance, we could define 8 priority levels, each with its
own ring/IRQ.  That way, a virtio-net that supports something like
802.1p could define 8 virtio-rings, one for each priority level.

But this scheme is targeted more at prioritization than at per-vcpu IRQ
balancing.  I suppose the eventq construct I proposed could still be
used in that fashion, since each eventq has its own routable IRQ.
However, I would have to think about that some more, because it is
beyond the current design spec.

The good news is that the decision to use the "eventq+irq" approach is
completely contained in the kvm-host+guest.patch.  We could easily
switch to a 1:1 irq:shm-signal mapping if we wanted to, and the
devices/drivers would work exactly the same without modification.

> Also, the single IRQ handler will need to scan for all potential IRQ
> sources.  Even if implemented carefully, this will cause many
> cacheline bounces.

Well, no, I think this part is covered.  As mentioned above, we use a
queuing technique, so no scanning is needed.  Ultimately I would love to
adapt a similar technique to optionally replace the LAPIC, so that we
can avoid the EOI trap and just consume the next interrupt (if
applicable) straight from the shm-ring.

>
>> so it's conceivable that I could present a simple
>> "dummy" PCI device with some "VBUS" type PCI-ID, just to piggyback on
>> the IRQ routing logic.  Then userspace could simply pass the IRQ
>> routing info down to the kernel with an ioctl, or something similar.
>>
>
> Xen does something similar, I believe.
>
>> I think ultimately I was trying to stay away from PCI in general
>> because I want to support environments that do not have PCI.  However,
>> for the kvm-transport case (at least on x86) this isn't really a
>> constraint.
>>
>>
>
> s/PCI/the native IRQ solution for your platform/.  virtio has the same
> problem; on s390 we use the native (if that word ever applies to s390)
> interrupt and device discovery mechanism.

Yeah, I agree.  We can contain the "exposure" of PCI to just the
platforms within KVM that care about it.

-Greg
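
P.S. To make the eventq idea a bit more concrete, here is a minimal
userspace sketch of the aggregation scheme described above.  The names
(struct eventq, eventq_post(), and so on) are illustrative only and are
not the actual vbus API; the sketch assumes a single producer and
consumer and ignores memory barriers and ring overflow.  It just shows
per-ring "signal" events being coalesced onto one shared ring, with the
single IRQ raised only on the empty-to-non-empty transition and the
handler draining the ring instead of scanning devices.

/*
 * Illustrative sketch only -- not the actual vbus API.  Many per-device
 * rings aggregate their events onto one event ring; an IRQ is injected
 * only when that ring goes from empty to non-empty, and the single IRQ
 * handler drains the ring, so there is no per-device scanning.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EVENTQ_SIZE 64			/* entries; power of two */

enum event_type {
	EVENT_DEVICE_ADD,		/* hotplug-style notification */
	EVENT_RING_SIGNAL,		/* "ring X signaled" */
};

struct event {
	enum event_type type;
	unsigned int	id;		/* device or ring identifier */
};

struct eventq {
	struct event	ring[EVENTQ_SIZE];
	uint32_t	head;		/* producer index */
	uint32_t	tail;		/* consumer index */
	int		irq;		/* the one IRQ backing this queue */
};

/* Stand-in for actually injecting an interrupt into the target vcpu. */
static void inject_irq(int irq)
{
	printf("inject IRQ %d\n", irq);
}

/*
 * Producer side: any ring/device posts its event here.  Only the
 * empty->non-empty transition kicks the IRQ; further events just queue.
 */
static void eventq_post(struct eventq *q, enum event_type type,
			unsigned int id)
{
	bool was_empty = (q->head == q->tail);

	q->ring[q->head % EVENTQ_SIZE] = (struct event){ type, id };
	q->head++;

	if (was_empty)
		inject_irq(q->irq);
}

/* Consumer side: the single IRQ handler just drains the queue. */
static void eventq_drain(struct eventq *q)
{
	while (q->tail != q->head) {
		struct event *ev = &q->ring[q->tail % EVENTQ_SIZE];

		if (ev->type == EVENT_RING_SIGNAL)
			printf("ring %u signaled\n", ev->id);
		else
			printf("device %u added\n", ev->id);

		q->tail++;
	}
}

int main(void)
{
	struct eventq q = { .irq = 42 };

	/* Two different virtio-rings coalesce onto the same eventq ... */
	eventq_post(&q, EVENT_RING_SIGNAL, 0);
	eventq_post(&q, EVENT_RING_SIGNAL, 3);

	/* ... and the (single) IRQ handler consumes them without scanning. */
	eventq_drain(&q);
	return 0;
}

In the real design the ring would of course live in guest/host shared
memory and the "IRQ" would be an actual injection into the assigned
vcpu; this is only meant to illustrate why one IRQ does not imply
scanning every potential source.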