On Mon, 2020-11-30 at 12:17 +0000, Joao Martins wrote: > On 11/30/20 9:41 AM, David Woodhouse wrote: > > On Wed, 2019-02-20 at 20:15 +0000, Joao Martins wrote: > > > userspace registers a @port to an @eventfd, that is bound to a > > > @vcpu. This information is then used when the guest does an > > > EVTCHNOP_send with a port registered with the kernel. > > > > Why do I want this part? > > > > It is unnecessary churn to support eventfd at this point. > > The patch subject/name is also a tad misleading, as > it implements the event channel port offloading with the optional fd > being just a small detail in addition. Right, I'd got that the commit title overstated its importance, but was wondering why the optional fd existed at all. I looked through the later xen_shim parts, half expecting it to be used there... but no, they add their own special case next to the place where eventfd_signal() gets called, instead of hooking the shim up via an eventfd. > > > EVTCHNOP_send short-circuiting happens by marking the event as pending > > > in the shared info and vcpu info pages and doing the upcall. For IPIs > > > and interdomain event channels, we do the upcall on the assigned vcpu. > > > > This part I understand, 'peeking' at the EVTCHNOP_send hypercall so > > that we can short-circuit IPI delivery without it having to bounce > > through userspace. > > > > But why would I then want then short-circuit the short-circuit, > > providing an eventfd for it to signal... so that I can then just > > receive the event in userspace in a *different* form to the original > > hypercall exit I would have got? > > > > One thing I didn't quite do at the time, is the whitelisting of unregistered > ports to userspace. Right now, it's a blacklist i.e. if it's not handled in > the kernel (IPIs, timer vIRQ, etc) it goes back to userspace. When the only > ones which go to userspace should be explicitly requested as such > and otherwise return -ENOENT in the hypercall. Hm, why would -ENOENT be a fast path which needs to be handled in the kernel? > Perhaps eventfd could be a way to express this? Like if you register > without an eventfd it's offloaded, otherwise it's assigned to userspace, > or if neither it's then returned an error without bothering the VMM. I much prefer the simple model where the *only* event channels that the kernel knows about are the ones it's expected to handle. For any others, the bypass doesn't kick in, and userspace gets the KVM_EXIT_HYPERCALL exit. > But still, eventfd is probably unnecessary complexity when another @type > (XEN_EVTCHN_TYPE_USER) would serve, and then just exiting to userspace > and let it route its evtchn port handling to the its own I/O handling thread. Hmm... so the benefit of the eventfd is that we can wake the I/O thread directly instead of bouncing out to userspace on the vCPU thread only for it to send a signal and return to the guest? Did you ever use that, and it is worth the additional in-kernel code? Is there any reason we'd want that for IPI or VIRQ event channels, or can it be only for INTERDOM/UNBOUND event channels which come later? I'm tempted to leave it out of the first patch, and *maybe* add it back in a later patch, putting it in the union alongside .virq.type. struct kvm_xen_eventfd { #define XEN_EVTCHN_TYPE_VIRQ 0 #define XEN_EVTCHN_TYPE_IPI 1 __u32 type; __u32 port; __u32 vcpu; - __s32 fd; #define KVM_XEN_EVENTFD_DEASSIGN (1 << 0) #define KVM_XEN_EVENTFD_UPDATE (1 << 1) __u32 flags; union { struct { __u8 type; } virq; + struct { + __s32 eventfd; + } interdom; /* including unbound */ __u32 padding[2]; }; } evtchn; Does that make sense to you?