On Sun, Sep 5, 2021 at 7:24 PM AKASHI Takahiro via Stratos-dev
<stratos-dev@op-lists.linaro.org> wrote:

> Alex,
>
> On Fri, Sep 03, 2021 at 10:28:06AM +0100, Alex Bennée wrote:
> >
> > AKASHI Takahiro writes:
> >
> > > Alex,
> > >
> > > On Wed, Sep 01, 2021 at 01:53:34PM +0100, Alex Bennée wrote:
> > >>
> > >> Stefan Hajnoczi writes:
> > >>
> > >> > [[PGP Signed Part:Undecided]]
> > >> > On Wed, Aug 04, 2021 at 12:20:01PM -0700, Stefano Stabellini wrote:
> > >> >> > Could we consider the kernel internally converting IOREQ
> > >> >> > messages from the Xen hypervisor to eventfd events? Would this
> > >> >> > scale with other kernel hypercall interfaces?
> > >> >> >
> > >> >> > So any thoughts on what directions are worth experimenting with?
> > >> >>
> > >> >> One option we should consider is for each backend to connect to
> > >> >> Xen via the IOREQ interface. We could generalize the IOREQ
> > >> >> interface and make it hypervisor agnostic. The interface is really
> > >> >> trivial and easy to add. The only Xen-specific part is the
> > >> >> notification mechanism, which is an event channel. If we replaced
> > >> >> the event channel with something else the interface would be
> > >> >> generic. See:
> > >> >> https://gitlab.com/xen-project/xen/-/blob/staging/xen/include/public/hvm/ioreq.h#L52
> > >> >
> > >> > There have been experiments with something kind of similar in KVM
> > >> > recently (see struct ioregionfd_cmd):
> > >> > https://lore.kernel.org/kvm/dad3d025bcf15ece11d9df0ff685e8ab0a4f2edd.1613828727.git.eafanasova@gmail.com/
> > >>
> > >> Reading the cover letter was very useful in showing how this provides
> > >> a separate channel for signalling IO events to userspace instead of
> > >> using the normal type-2 vmexit type event. I wonder how deeply tied
> > >> the userspace-facing side of this is to KVM? Could it provide a
> > >> common FD-type interface to IOREQ?
> > >
> > > Why do you stick to an "FD"-type interface?
> >
> > I mean most user space interfaces on POSIX start with a file descriptor
> > and the usual read/write semantics or a series of ioctls.
>
> Who do you assume is responsible for implementing this kind of
> fd semantics, the OS on the BE side or the hypervisor itself?
>
> I think such interfaces can only be easily implemented on type-2
> hypervisors.
>
> # In this sense, I don't think rust-vmm, as it is, can be
> # a general solution.
>
> > >> As I understand IOREQ this is currently a direct communication
> > >> between userspace and the hypervisor using the existing Xen message
> > >> bus. My
> > >
> > > With an IOREQ server, IO event occurrences are notified to the BE via
> > > Xen's event channel, while the actual contexts of the IO events (see
> > > struct ioreq in ioreq.h) are put in a queue on a single shared memory
> > > page which is to be assigned beforehand with the
> > > xenforeignmemory_map_resource hypervisor call.
> >
> > If we abstracted the IOREQ via the kernel interface you would probably
> > just want to put the ioreq structure on a queue rather than expose the
> > shared page to userspace.
>
> Where is that queue?
>
> > >> worry would be that by adding knowledge of what the underlying
> > >> hypervisor is we'd end up with excess complexity in the kernel. For
> > >> one thing we certainly wouldn't want an API version dependency on the
> > >> kernel to understand which version of the Xen hypervisor it was
> > >> running on.
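For concreteness, a minimal sketch of how the FD-style IOREQ delivery being
discussed above might look from a backend's side, with the kernel queueing
request records rather than exposing the shared page. The /dev/ioreq0 node,
the plain read/write framing and the generic_ioreq layout are assumptions
invented for this sketch; they are not an existing kernel, KVM or Xen API,
and the fields are only loosely modelled on the Xen struct ioreq linked above.

/*
 * Hypothetical, illustration only: a backend loop against an assumed
 * hypervisor-agnostic /dev/ioreq0 device that queues IO requests.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct generic_ioreq {
    uint64_t addr;   /* guest-physical address of the access */
    uint64_t data;   /* value written by the guest, or value to return */
    uint32_t size;   /* access width in bytes */
    uint8_t  dir;    /* 0 = guest write, 1 = guest read */
    uint8_t  pad[3];
};

int main(void)
{
    int fd = open("/dev/ioreq0", O_RDWR);   /* hypothetical device node */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct generic_ioreq req;

    /* Each read() dequeues one pending IO event; the backend never maps
     * the shared ioreq page itself. */
    while (read(fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (req.dir)
            req.data = 0;   /* backend would supply the emulated read value */
        /* else: backend would apply the guest's write of req.data */

        /* Writing the record back completes the request so the guest
         * vCPU can resume. */
        if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req))
            break;
    }

    close(fd);
    return 0;
}

The only point of the sketch is that the backend binary contains no Xen- or
KVM-specific calls; all hypervisor knowledge sits behind the file descriptor.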
> > >
> > > That's exactly what virtio-proxy in my proposal[1] does; all the
> > > hypervisor-specific details of IO event handling are contained in
> > > virtio-proxy, and the virtio BE will communicate with virtio-proxy
> > > through a virtqueue (yes, virtio-proxy is seen as yet another virtio
> > > device on the BE) and will get IO event-related *RPC* callbacks,
> > > either MMIO read or write, from virtio-proxy.
> > >
> > > See page 8 (protocol flow) and 10 (interfaces) in [1].
> >
> > There are two areas of concern with the proxy approach at the moment.
> > The first is how the bootstrap of the virtio-proxy channel happens and
>
> As I said, from the BE point of view, virtio-proxy would be seen as yet
> another virtio device by which the BE could talk to the "virtio proxy"
> VM or whatever else.
>
> This way we guarantee the BE's hypervisor-agnosticism instead of having
> "common" hypervisor interfaces. That is the basis of my idea.
>
> > the second is how many context switches are involved in a transaction.
> > Of course with all things there is a trade off. Things involving the
> > very tightest latency would probably opt for a bare metal backend which
> > I think would imply hypervisor knowledge in the backend binary.
>
> In the configuration phase of a virtio device, the latency won't be a
> big matter. In device operations (i.e. read/write to block devices), if
> we can resolve the 'mmap' issue, as Oleksandr is proposing right now,
> the only issue is how efficiently we can deliver notifications to the
> opposite side. Right? And this is a very common problem whatever
> approach we take.
>
> Anyhow, if we do care about latency in my approach, most of the
> virtio-proxy-related code can be re-implemented just as a stub (or
> shim?) library since the protocols are defined as RPCs.
> In this case, however, we would lose the benefit of providing a
> "single binary" BE.
> (I know this is an arguable requirement, though.)
>
> # Would it be better to first discuss what "hypervisor-agnosticism" means?

Is there a call that you could recommend that we join to discuss this and
the topics of this thread? There is definitely interest in pursuing a new
interface for Argo that can be implemented in other hypervisors and enable
guest binary portability between them, at least on the same hardware
architecture, with VirtIO transport as a primary use case.

The notes from the Xen Summit Design Session on the VirtIO Cross-Project BoF
for Xen and Guest OS, which include context about the several separate
approaches to VirtIO on Xen, have now been posted here:
https://lists.xenproject.org/archives/html/xen-devel/2021-09/msg00472.html

Christopher

> -Takahiro Akashi
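To make the virtio-proxy RPC idea discussed above a little more concrete,
an illustrative framing of the MMIO read/write callback messages that the
proxy would deliver to the BE over a virtqueue. The command values and
field names are assumptions for illustration only, not the layout defined
in [1].

/*
 * Illustrative only: one possible framing for the MMIO read/write RPC
 * callbacks from virtio-proxy to the BE.  Names and encoding are
 * assumptions, not taken from [1].
 */
#include <stdint.h>

enum vproxy_cmd {
    VPROXY_MMIO_READ  = 1,   /* guest read of a device register; BE fills in data */
    VPROXY_MMIO_WRITE = 2,   /* guest write; BE applies data */
};

struct vproxy_rpc {
    uint32_t cmd;      /* enum vproxy_cmd */
    uint32_t size;     /* access width in bytes: 1, 2, 4 or 8 */
    uint64_t offset;   /* offset into the device's MMIO region */
    uint64_t data;     /* write payload, or space for the read reply */
};

Because the exchange is defined purely as messages on a virtqueue, the BE
needs no hypervisor-specific code, and the same message definitions could
equally sit behind a stub/shim library if latency turns out to matter, as
noted above.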