On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev
<stratos-dev@op-lists.linaro.org> wrote:

> Alex,
>
> On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
> >
> > AKASHI Takahiro writes:
> >
> > > Alex,
> > >
> > > On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
> > >>
> > >> Stefan Hajnoczi writes:
> > >>
> > >> > [[PGP Signed Part:Undecided]]
> > >> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > >> >> > Could we consider the kernel internally converting IOREQ
> > >> >> > messages from the Xen hypervisor to eventfd events? Would this
> > >> >> > scale with other kernel hypercall interfaces?
> > >> >> >
> > >> >> > So any thoughts on what directions are worth experimenting with?
> > >> >>
> > >> >> One option we should consider is for each backend to connect to
> > >> >> Xen via the IOREQ interface. We could generalize the IOREQ
> > >> >> interface and make it hypervisor agnostic. The interface is really
> > >> >> trivial and easy to add. The only Xen-specific part is the
> > >> >> notification mechanism, which is an event channel. If we replaced
> > >> >> the event channel with something else the interface would be
> > >> >> generic. See:
> > >> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > >> >
> > >> > There have been experiments with something kind of similar in KVM
> > >> > recently (see struct ioregionfd_cmd):
> > >> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> > >>
> > >> Reading the cover letter was very useful in showing how this provides
> > >> a separate channel for signalling IO events to userspace instead of
> > >> using the normal type-2 vmexit type event. I wonder how deeply tied
> > >> the userspace-facing side of this is to KVM? Could it provide a
> > >> common FD-type interface to IOREQ?
> > >
> > > Why do you stick to an "FD"-type interface?
> >
> > I mean most user space interfaces on POSIX start with a file descriptor
> > and the usual read/write semantics or a series of ioctls.
>
> Who do you assume is responsible for implementing this kind of
> fd semantics, the OS on the BE side or the hypervisor itself?
>
> I think such interfaces can only be easily implemented on type-2
> hypervisors.
>
> # In this sense, I don't think rust-vmm, as it is, can be
> # a general solution.
>
> > >> As I understand IOREQ this is currently a direct communication
> > >> between userspace and the hypervisor using the existing Xen message
> > >> bus. My
> > >
> > > With an IOREQ server, IO event occurrences are notified to the BE via
> > > Xen's event channel, while the actual contexts of the IO events (see
> > > struct ioreq in ioreq.h) are put in a queue on a single shared memory
> > > page which is to be assigned beforehand with the
> > > xenforeignmemory_map_resource hypervisor call.
> >
> > If we abstracted the IOREQ via the kernel interface you would probably
> > just want to put the ioreq structure on a queue rather than expose the
> > shared page to userspace.
>
> Where is that queue?
>
> > >> worry would be that by adding knowledge of what the underlying
> > >> hypervisor is we'd end up with excess complexity in the kernel. For
> > >> one thing we certainly wouldn't want an API version dependency on the
> > >> kernel to understand which version of the Xen hypervisor it was
> > >> running on.
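For concreteness, a minimal sketch of how the FD-style IOREQ delivery being
discussed above might look from a backend's side, with the kernel queueing
request records rather than exposing the shared page. The /dev/ioreq0 node,
the plain read/write framing and the generic_ioreq layout are assumptions
invented for this sketch; they are not an existing kernel, KVM or Xen API,
and the fields are only loosely modelled on the Xen struct ioreq linked above.

/*
 * Hypothetical, illustration only: a backend loop against an assumed
 * hypervisor-agnostic /dev/ioreq0 device that queues IO requests.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct generic_ioreq {
    uint64_t addr;   /* guest-physical address of the access */
    uint64_t data;   /* value written by the guest, or value to return */
    uint32_t size;   /* access width in bytes */
    uint8_t  dir;    /* 0 = guest write, 1 = guest read */
    uint8_t  pad[3];
};

int main(void)
{
    int fd = open("/dev/ioreq0", O_RDWR);   /* hypothetical device node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct generic_ioreq req;

    /* Each read() dequeues one pending IO event; the backend never maps
     * the shared ioreq page itself. */
    while (read(fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (req.dir)
            req.data = 0;   /* backend would supply the emulated read value */
        /* else: backend would apply the guest's write of req.data */

        /* Writing the record back completes the request so the guest
         * vCPU can resume. */
        if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
            break;
    }

    close(fd);
    return 0;
}

The only point of the sketch is that the backend binary contains no Xen- or
KVM-specific calls; all hypervisor knowledge sits behind the file descriptor.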
> > >
> > > That's exactly what virtio-proxy in my proposal[1] does; all the
> > > hypervisor-specific details of IO event handling are contained in
> > > virtio-proxy, and the virtio BE will communicate with virtio-proxy
> > > through a virtqueue (yes, virtio-proxy is seen as yet another virtio
> > > device on the BE) and will get IO event-related *RPC* callbacks,
> > > either MMIO read or write, from virtio-proxy.
> > >
> > > See page 8 (protocol flow) and 10 (interfaces) in [1].
> >
> > There are two areas of concern with the proxy approach at the moment.
> > The first is how the bootstrap of the virtio-proxy channel happens and
>
> As I said, from the BE point of view, virtio-proxy would be seen as yet
> another virtio device by which the BE could talk to the "virtio proxy"
> VM or whatever else.
>
> This way we guarantee the BE's hypervisor-agnosticism instead of having
> "common" hypervisor interfaces. That is the basis of my idea.
>
> > the second is how many context switches are involved in a transaction.
> > Of course with all things there is a trade off. Things involving the
> > very tightest latency would probably opt for a bare metal backend which
> > I think would imply hypervisor knowledge in the backend binary.
>
> In the configuration phase of a virtio device, the latency won't be a
> big matter. In device operations (i.e. read/write to block devices), if
> we can resolve the 'mmap' issue, as Oleksandr is proposing right now,
> the only issue is how efficiently we can deliver notifications to the
> opposite side. Right? And this is a very common problem whatever
> approach we take.
>
> Anyhow, if we do care about latency in my approach, most of the
> virtio-proxy-related code can be re-implemented just as a stub (or
> shim?) library since the protocols are defined as RPCs.
> In this case, however, we would lose the benefit of providing a
> "single binary" BE.
> (I know this is an arguable requirement, though.)
>
> # Would it be better to first discuss what "hypervisor-agnosticism" means?

Is there a call that you could recommend that we join to discuss this and
the topics of this thread? There is definitely interest in pursuing a new
interface for Argo that can be implemented in other hypervisors and enable
guest binary portability between them, at least on the same hardware
architecture, with VirtIO transport as a primary use case.

The notes from the Xen Summit Design Session on the VirtIO Cross-Project BoF
for Xen and Guest OS, which include context about the several separate
approaches to VirtIO on Xen, have now been posted here:
https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html

Christopher

> -Takahiro Akashi
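To make the virtio-proxy RPC idea discussed above a little more concrete,
an illustrative framing of the MMIO read/write callback messages that the
proxy would deliver to the BE over a virtqueue. The command values and
field names are assumptions for illustration only, not the layout defined
in [1].

/*
 * Illustrative only: one possible framing for the MMIO read/write RPC
 * callbacks from virtio-proxy to the BE.  Names and encoding are
 * assumptions, not taken from [1].
 */
#include <stdint.h>

enum vproxy_cmd {
    VPROXY_MMIO_READ  = 1,   /* guest read of a device register; BE fills in data */
    VPROXY_MMIO_WRITE = 2,   /* guest write; BE applies data */
};

struct vproxy_rpc {
    uint32_t cmd;      /* enum vproxy_cmd */
    uint32_t size;     /* access width in bytes: 1, 2, 4 or 8 */
    uint64_t offset;   /* offset into the device's MMIO region */
    uint64_t data;     /* write payload, or space for the read reply */
};

Because the exchange is defined purely as messages on a virtqueue, the BE
needs no hypervisor-specific code, and the same message definitions could
equally sit behind a stub/shim library if latency turns out to matter, as
noted above.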