From: Frank Yang <lfy@google.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Roman Kiryanov <rkir@google.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	virtio-dev@lists.oasis-open.org,
	Greg Hartman <ghartman@google.com>
Subject: Re: [virtio-dev] Memory sharing device
Date: Tue, 12 Feb 2019 20:59:10 -0800	[thread overview]
Message-ID: <CAEkmjvU2_so0u1xKEQgxBXHLuocGB3ODQYdqiaMGD+QqSGBn6w@mail.gmail.com> (raw)
In-Reply-To: <20190212221738-mutt-send-email-mst@kernel.org>


On Tue, Feb 12, 2019 at 8:02 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, Feb 12, 2019 at 06:50:29PM -0800, Frank Yang wrote:
> >
> >
> > On Tue, Feb 12, 2019 at 11:06 AM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> >
> >     On Tue, Feb 12, 2019 at 09:26:10AM -0800, Frank Yang wrote:
> >     > BTW, the other unique aspect is that the ping messages allow a
> _host_
> >     pointer
> >     > to serve as the lump of shared memory;
> >     > then there is no need to track buffers in the guest kernel and the
> device
> >     > implementation can perform specialized buffer space management.
> >     > Because it is also host pointer shared memory, it is also
> physically
> >     contiguous
> >     > and there is no scatterlist needed to process the traffic.
> >
> >     Yes at the moment virtio descriptors all pass addresses guest to
> host.
> >
> >     Ability to reverse that was part of the vhost-pci proposal a while
> ago.
> >     BTW that also at least originally had ability to tunnel
> >     multiple devices over a single connection.
> >
> >
> >
> > Can there be a similar proposal for virtio-pci without vhost?
> >
> >     There was nothing wrong with the proposals I think, they
> >     just had to be polished a bit before making it into the spec.
> >     And that tunneling was dropped but I think it can be brought back
> >     if desired, we just didn't see a use for it.
> >
> >
> > Thinking about it more, I think vhost-pci might be too much for us due
> to the
> > vhost requirement (sockets and IPC while we desire a highly process local
> > solution)
>
> I agree because the patches try to document a bunch of stuff.
> But I really just mean taking the host/guest interface
> part from there.
>
So, are you referring to the new ideas that vhost-pci introduces minus
socket IPC/inter-VM communication, and the vhost server being in the same
process as qemu?
That sounds like we could build something for qemu (Stefan?) that talks to
a virtio-pci-user (?) backend with a similar set of command line arguments.

>
> > But there's nothing preventing us from having the same reversals for
> virtio-pci
> > devices without vhost, right?
>
> Right. I think that if you build something such that vhost pci
> can be an instance of it on top, then it would have
> a lot of value.
>
I'd be very eager to chase this down. The more interop with existing
virtual PCI concepts the better.

>
> > That's kind of what's being proposed with the shared memory stuff at the
> > moment, though it is not a device type by itself yet (Arguably, it
> should be).
> >
> >
> >     How about that? That sounds close to what you were looking for,
> >     does it not? That would be something to look into -
> >     if your ideas can be used to implement a virtio device
> >     backend by code running within a VM, that would be very interesting.
> >
> >
> > What about a device type, say, virtio-metapci,
>
> I have to say I really dislike that name. It's basically just saying I'm
> not telling you what it is.  Let's try to figure it out.  Looks like
> although it's not a vsock device it's also trying to support creating
> channels with support for passing two types of messages (data and
> control) as well as some shared memory access.  And it also has
> enumeration so opened channels can be tied to what? strings?  PCI Device
> IDs?
>
I think we can build this relying on PCI device ids, assuming there are
still device IDs readily available.
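(For illustration only: the virtio-pci transport already derives the PCI
device ID from the virtio device ID, so a new device type mostly needs a
fresh virtio ID from the TC. The ID value below is made up.)

#include <stdint.h>

#define VIRTIO_PCI_VENDOR_ID   0x1af4 /* existing virtio PCI vendor ID */
#define VIRTIO_PCI_MODERN_BASE 0x1040 /* modern virtio-pci device ID base */
#define VIRTIO_ID_HOSTMEM      42     /* hypothetical; a real ID would need TC assignment */

/* Modern virtio-pci devices expose PCI device ID 0x1040 + virtio device ID. */
static inline uint16_t virtio_pci_device_id(uint16_t virtio_id)
{
    return VIRTIO_PCI_MODERN_BASE + virtio_id;
}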

> Then the vsock device was designed for this problem space.  It might not
> be a good fit for you e.g. because of some vsock baggage it has. But one
> of the complex issues it does address is controlling host resource usage
> so guest socket can't DOS host or starve other sockets by throwing data
> at host. Things might slow down but progress will be made. If you are
> building a generic kind of message exchange you could do worse than copy
> that protocol part.
>
That's a good point and I should make sure we've captured that.
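For reference, virtio-vsock handles this with credit-based flow control:
each side advertises its receive buffer size (buf_alloc) and how much it
has consumed (fwd_cnt), and the sender never puts more bytes in flight than
the peer's advertised free space. A minimal sketch of that accounting (the
struct and function names here are mine, purely illustrative):

#include <stdint.h>

/* Credit accounting in the style of virtio-vsock (illustrative sketch).
 * The peer advertises buf_alloc (its total receive buffer) and fwd_cnt
 * (bytes it has consumed so far); tx_cnt counts bytes we have sent. */
struct credit_state {
    uint32_t peer_buf_alloc; /* peer's advertised receive buffer size */
    uint32_t peer_fwd_cnt;   /* bytes the peer reports having consumed */
    uint32_t tx_cnt;         /* bytes we have transmitted to the peer */
};

/* Bytes we may still send without overrunning the peer's buffer;
 * unsigned arithmetic handles counter wrap-around. */
static inline uint32_t tx_credit(const struct credit_state *s)
{
    uint32_t in_flight = s->tx_cnt - s->peer_fwd_cnt;
    return s->peer_buf_alloc - in_flight;
}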

> I don't think the question of why not vsock generally was addressed all
> that well. There's discussion of sockets and poll, but that has nothing
> to do with virtio which is a host/guest interface. If you are basically
> happy with the host/guest interface but want to bind a different driver
> to it, with some minor tweaks, we could create a virtio-wsock which is
> just like virtio-vsock but has a different id, and use that as a
> starting point.  Go wild build a different driver for it.
>
The virtio-wsock notion also sounds good, though (also from Stefan's
comments) I'd want to clarify how we would define such a device type:
one that is pure in terms of the host/guest interface (i.e., not assuming
sockets in either the guest or the host), but that also doesn't, at the
implementation level, require the existing v(w)sock implementation to
change to accommodate non-socket-based guest/host interfaces.
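In guest terms that would just be another virtio driver binding to a new
device ID, something like the sketch below (VIRTIO_ID_WSOCK and everything
in the probe are hypothetical; nothing here exists today):

#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_ids.h>

#define VIRTIO_ID_WSOCK 43 /* hypothetical ID; a real one would need TC assignment */

static const struct virtio_device_id id_table[] = {
    { VIRTIO_ID_WSOCK, VIRTIO_DEV_ANY_ID },
    { 0 },
};

static int virtio_wsock_probe(struct virtio_device *vdev)
{
    /* A non-socket-based guest interface would be set up here. */
    return 0;
}

static void virtio_wsock_remove(struct virtio_device *vdev)
{
}

static struct virtio_driver virtio_wsock_driver = {
    .driver.name = "virtio_wsock",
    .id_table    = id_table,
    .probe       = virtio_wsock_probe,
    .remove      = virtio_wsock_remove,
};

module_virtio_driver(virtio_wsock_driver);
MODULE_DEVICE_TABLE(virtio, id_table);
MODULE_LICENSE("GPL");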

>
> > that relies on virtio-pci for
> > device enumeration and shared memory handling
> > (assuming it's going to be compatible with the host pointer shared memory
> > implementation),
> > so there's no duplication of the concept of device enumeration nor shared
> > memory operations.
> > But, it works in terms of the ping / event virtqueues, and relies on the
> host
> > hypervisor to dispatch to device implementation callbacks.
>
> All the talk about dispatch and device implementation is just adding to
> confusion.  This isn't something that belongs in virtio spec anyway, and
> e.g. qemu is unlikely to add an in-process plugin support just for this.
>
We think a plugin system of some type would be quite valuable for
decoupling device functionality from QEMU. Staying in the same process is
also attractive because it avoids IPC. If there is a lightweight,
cross-platform way to do IPC through function-pointer-like mechanisms,
perhaps via dlsym or LoadLibrary against another process, that could work
too, though yes, it would need to be integrated into qemu.
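As a sketch of the in-process direction (the struct, the init symbol name,
and the callbacks are all made up here for illustration; this is not an
existing qemu interface):

#include <dlfcn.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical plugin entry points; names are illustrative only. */
struct device_plugin_ops {
    void *(*create_instance)(void *config);
    void  (*on_ping)(void *instance, uint64_t offset, uint64_t size);
    void  (*destroy_instance)(void *instance);
};

typedef const struct device_plugin_ops *(*plugin_init_fn)(void);

static const struct device_plugin_ops *load_device_plugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    /* Resolve a (hypothetical) well-known init symbol exported by the plugin. */
    plugin_init_fn init = (plugin_init_fn)dlsym(handle, "virtio_dev_plugin_init");
    if (!init) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return NULL;
    }
    return init();
}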

>
> > A potential issue is that such a metapci device shares the same device id
> > namespace as other virtio-pci devices...but maybe that's OK?
>
> That's a vague question.
> Same device and vendor id needs to imply same driver works.
> I think you're using terminology that doesn't match virtio.
> The words device and driver have a specific meaning, and that
> doesn't include things like implementation callbacks.
>
>
> > If this can build on virtio-pci, I might be able to come up with a spec
> that
> > assumes virtio-pci as the transport,
> > and assumes (via the WIP host memory sharing work) that host memory can
> be used
> > as buffer storage.
> > The difference is that it will not contain most of the config virtqueue
> stuff
> > (except maybe for create/destroy instance),
> > and it should also work with the existing ecosystem around virtio-pci.
> >
>
> I still can't say from above whether it's in scope for virtio or not.
> All the talk about blobs and controlling both host and guest sounds
> out of scope. But it could be that there are pieces that are
> in scope, and you would use them for whatever vendor-specific
> thing you need.
>
> And I spent a lot of time on this by now.
>
> So could you maybe try to extract specifically the host/guest interface
> things that you miss? I got the part where you want to take a buffer
> within BAR and pass it to guest.  But beyond that I didn't get a lot. E.g.
> who is sending most data? host? guest? both? There are control messages;
> are these coming from the guest?  Do you want to know when guest is done
> with the buffer host allocated? Can you take a buffer away from guest?
>
Thanks for taking the time to evaluate; we very much appreciate it
and want to resolve the issues you're having!

To answer the other questions:
- We expect most data to be sent by the guest, except in the case of image
readback, where the host will write a lot of data.
- The control messages (ping) are driven by the guest only; at most, the
guest can asynchronously wait on a long host operation that the guest
itself triggered. Hence, the events argument in the spec is guest-driven,
and revents is populated by the host (a rough sketch of this layout
follows below).
- It is out of scope to know when the guest is done with the host buffer.
- Buffers will be owned by the host, and the guest will not own any
buffers under the current scheme, because it is up to the host-side
implementation to decide how to back new memory allocations from the
guest.
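To make that concrete, here is a rough, illustrative sketch of what a
guest-initiated ping carrying guest-set events and host-filled revents
might look like; all field names here are placeholders, not part of any
agreed spec:

#include <stdint.h>

/* Illustrative only: a guest-initiated ping message. Field names and the
 * exact layout are placeholders, not an agreed-upon spec. */
struct ping_msg {
    uint64_t offset;  /* offset into the host-owned shared memory region */
    uint64_t size;    /* size of the sub-region this ping refers to */
    uint32_t opcode;  /* device-specific operation, interpreted by the host backend */
    uint32_t events;  /* set by the guest driver: completions it wants to wait on */
    uint32_t revents; /* filled in by the host when the operation completes */
};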


> OTOH all the callback discussion is really irrelevant for the virtio tc.
> If things can't be described without this they are out of scope
> for virtio.
>
We can describe the current proposal for virtio without explicitly naming
callbacks on the host, but that would push that kind of implementation
burden onto the host qemu, so I thought it would be a good idea to lay
things out end to end in a way that is concretely implementable.

> >
> >     --
> >     MST
> >
>


Thread overview: 72+ messages
2019-02-01 20:34 [virtio-dev] Memory sharing device Roman Kiryanov
2019-02-04  5:40 ` Stefan Hajnoczi
2019-02-04 10:13   ` Gerd Hoffmann
2019-02-04 10:18     ` Roman Kiryanov
2019-02-05  7:42     ` Roman Kiryanov
2019-02-05 10:04       ` Dr. David Alan Gilbert
2019-02-05 15:17         ` Frank Yang
2019-02-05 15:21           ` Frank Yang
2019-02-05 21:06         ` Roman Kiryanov
2019-02-06  7:03           ` Gerd Hoffmann
2019-02-06 15:09             ` Frank Yang
2019-02-06 15:11               ` Frank Yang
2019-02-08  7:57               ` Stefan Hajnoczi
2019-02-08 14:46                 ` Frank Yang
2019-02-06 20:14           ` Dr. David Alan Gilbert
2019-02-06 20:27             ` Frank Yang
2019-02-07 12:10               ` Cornelia Huck
2019-02-11 14:49       ` Michael S. Tsirkin
2019-02-11 15:14         ` Frank Yang
2019-02-11 15:25           ` Frank Yang
2019-02-12 13:01             ` Michael S. Tsirkin
2019-02-12 13:16             ` Dr. David Alan Gilbert
2019-02-12 13:27               ` Michael S. Tsirkin
2019-02-12 16:17                 ` Frank Yang
2019-02-19  7:17                   ` Gerd Hoffmann
2019-02-19 15:59                     ` Frank Yang
2019-02-20  6:51                       ` Gerd Hoffmann
2019-02-20 15:31                         ` Frank Yang
2019-02-21  6:55                           ` Gerd Hoffmann
2019-02-19  7:12             ` Gerd Hoffmann
2019-02-19 16:02               ` Frank Yang
2019-02-20  7:02                 ` Gerd Hoffmann
2019-02-20 15:32                   ` Frank Yang
2019-02-21  7:29                     ` Gerd Hoffmann
2019-02-21  9:24                       ` Dr. David Alan Gilbert
2019-02-21  9:59                         ` Gerd Hoffmann
2019-02-21 10:03                           ` Dr. David Alan Gilbert
2019-02-22  6:15                           ` Michael S. Tsirkin
2019-02-22  6:42                             ` Gerd Hoffmann
2019-02-11 16:57           ` Michael S. Tsirkin
2019-02-12  8:27         ` Roman Kiryanov
2019-02-12 11:25           ` Dr. David Alan Gilbert
2019-02-12 13:47             ` Cornelia Huck
2019-02-12 14:03               ` Michael S. Tsirkin
2019-02-12 15:56                 ` Frank Yang
2019-02-12 16:46                   ` Dr. David Alan Gilbert
2019-02-12 17:20                     ` Frank Yang
2019-02-12 17:26                       ` Frank Yang
2019-02-12 19:06                         ` Michael S. Tsirkin
2019-02-13  2:50                           ` Frank Yang
2019-02-13  4:02                             ` Michael S. Tsirkin
2019-02-13  4:19                               ` Michael S. Tsirkin
2019-02-13  4:59                                 ` Frank Yang
2019-02-13 18:18                                   ` Frank Yang
2019-02-14  7:15                                     ` Frank Yang
2019-02-22 22:05                                       ` Michael S. Tsirkin
2019-02-24 21:19                                         ` Frank Yang
2019-02-13  4:59                               ` Frank Yang [this message]
2019-02-19  7:54                       ` Gerd Hoffmann
2019-02-19 15:54                         ` Frank Yang
2019-02-20  3:46                           ` Michael S. Tsirkin
2019-02-20 15:24                             ` Frank Yang
2019-02-20 19:29                               ` Michael S. Tsirkin
2019-02-20  6:25                           ` Gerd Hoffmann
2019-02-20 15:30                             ` Frank Yang
2019-02-20 15:35                               ` Frank Yang
2019-02-21  6:44                               ` Gerd Hoffmann
2019-02-12 18:22                   ` Michael S. Tsirkin
2019-02-12 19:01                     ` Frank Yang
2019-02-12 19:15                       ` Michael S. Tsirkin
2019-02-12 20:15                         ` Frank Yang
2019-02-12 13:00           ` Michael S. Tsirkin
