On Mon, 2020-11-30 at 12:17 +0000, Joao Martins wrote:
> On 11/30/20 9:41 AM, David Woodhouse wrote:
> > On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote:
> > > userspace registers a @port to an @eventfd, that is bound to a
> > >  @vcpu. This information is then used when the guest does an
> > > EVTCHNOP_send with a port registered with the kernel.
> > 
> > Why do I want this part?
> > 
> 
> It is unnecessary churn to support eventfd at this point.
> 
> The patch subject/name is also a tad misleading, as
> it implements the event channel port offloading with the optional fd
> being just a small detail in addition.

Right, I'd got that the commit title overstated its importance, but was
wondering why the optional fd existed at all.

I looked through the later xen_shim parts, half expecting it to be used
there... but no, they add their own special case next to the place
where eventfd_signal() gets called, instead of hooking the shim up via
an eventfd.

> > > EVTCHNOP_send short-circuiting happens by marking the event as pending
> > > in the shared info and vcpu info pages and doing the upcall. For IPIs
> > > and interdomain event channels, we do the upcall on the assigned vcpu.
> > 
> > This part I understand, 'peeking' at the EVTCHNOP_send hypercall so
> > that we can short-circuit IPI delivery without it having to bounce
> > through userspace.
> > 
> > But why would I then want then short-circuit the short-circuit,
> > providing an eventfd for it to signal... so that I can then just
> > receive the event in userspace in a *different* form to the original
> > hypercall exit I would have got?
> > 
> 
> One thing I didn't quite do at the time, is the whitelisting of unregistered
> ports to userspace. Right now, it's a blacklist i.e. if it's not handled in
> the kernel (IPIs, timer vIRQ, etc) it goes back to userspace. When the only
> ones which go to userspace should be explicitly requested as such
> and otherwise return -ENOENT in the hypercall.

Hm, why would -ENOENT be a fast path which needs to be handled in the
kernel?

> Perhaps eventfd could be a way to express this? Like if you register
> without an eventfd it's offloaded, otherwise it's assigned to userspace,
> or if neither it's then returned an error without bothering the VMM.

I much prefer the simple model where the *only* event channels that the
kernel knows about are the ones it's expected to handle.

For any others, the bypass doesn't kick in, and userspace gets the
KVM_EXIT_HYPERCALL exit.

> But still, eventfd is probably unnecessary complexity when another @type
> (XEN_EVTCHN_TYPE_USER) would serve, and then just exiting to userspace
> and let it route its evtchn port handling to the its own I/O handling thread.

Hmm... so the benefit of the eventfd is that we can wake the I/O thread
directly instead of bouncing out to userspace on the vCPU thread only
for it to send a signal and return to the guest? Did you ever use that,
and it is worth the additional in-kernel code?

Is there any reason we'd want that for IPI or VIRQ event channels, or
can it be only for INTERDOM/UNBOUND event channels which come later?

I'm tempted to leave it out of the first patch, and *maybe* add it back
in a later patch, putting it in the union alongside .virq.type.


                struct kvm_xen_eventfd {
 
 #define XEN_EVTCHN_TYPE_VIRQ 0
 #define XEN_EVTCHN_TYPE_IPI  1
                        __u32 type;
                        __u32 port;
                        __u32 vcpu;
-                       __s32 fd;
 
 #define KVM_XEN_EVENTFD_DEASSIGN       (1 << 0)
 #define KVM_XEN_EVENTFD_UPDATE         (1 << 1)
                        __u32 flags;
                        union {
                                struct {
                                        __u8 type;
                                } virq;
+                              struct {
+                                      __s32 eventfd;
+                              } interdom; /* including unbound */
                                __u32 padding[2];
                        };
               } evtchn;

Does that make sense to you?