Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] libvhost-user: implement VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS
Date: Sun, 8 Sep 2019 09:13:50 -0400
Message-ID: <20190908091207-mutt-send-email-mst@kernel.org>
In-Reply-To: <fe0f3f7bfa730088454790dc2d863285c4461134.camel@sipsolutions.net>

On Fri, Sep 06, 2019 at 05:32:02PM +0200, Johannes Berg wrote:
> Hi,
> 
> > Oh. Apparently qemu mailman chose this time to kick me out
> > of list subscription (too many bounces or something?)
> > so I didn't see it.
> 
> D'oh. Well, it's really my mistake, I should've CC'ed you.
> 
> > What worries me is the load this places on the socket.
> > ATM if socket buffer is full qemu locks up, so we
> > need to be careful not to send too many messages.
> 
> Right, sure. I really don't think you ever want to use this extension in
> a "normal VM" use case. :-)
> 
> I think the only use for this extension would be for simulation
> purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
> extensions, i.e. you explicitly *want* your virtual machine to lock up /
> wait for a response to the KICK command (and respectively, the device to
> wait for a response to the CALL command).

OK, so when combined with these, I think it's OK.
Do we want to enforce this restriction in code then, maybe?
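
A minimal sketch of what such a check could look like, in Python rather
than the C of libvhost-user and purely for illustration. The REPLY_ACK and
SLAVE_REQ bit numbers are as I recall them from the vhost-user spec; the
KICK_CALL_MSGS bit number and the function name are hypothetical
placeholders, not anything taken from the RFC.

```python
# Bit numbers for REPLY_ACK and SLAVE_REQ as recalled from the vhost-user spec.
VHOST_USER_PROTOCOL_F_REPLY_ACK = 3
VHOST_USER_PROTOCOL_F_SLAVE_REQ = 5
# Hypothetical placeholder: NOT an assigned bit number.
VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS = 63


def bit(n: int) -> int:
    """Return the feature mask with only bit n set."""
    return 1 << n


# Features that must be negotiated for KICK_CALL_MSGS to make sense.
KICK_CALL_DEPS = (bit(VHOST_USER_PROTOCOL_F_REPLY_ACK)
                  | bit(VHOST_USER_PROTOCOL_F_SLAVE_REQ))


def validate_protocol_features(features: int) -> int:
    """Clear KICK_CALL_MSGS unless REPLY_ACK and SLAVE_REQ are both set."""
    kick_call = bit(VHOST_USER_PROTOCOL_F_KICK_CALL_MSGS)
    if features & kick_call and (features & KICK_CALL_DEPS) != KICK_CALL_DEPS:
        features &= ~kick_call
    return features
```

Silently masking the bit is only one option; refusing the negotiation
outright would be another.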

> Note that this is basically its sole purpose: ensuring exactly this
> synchronisation! Yes, it's bad for speed, but it's needed in simulation
> when time isn't "real".
> 
> Let me try to explain again; most likely my previous explanation was too
> long-winded. WLOG, I'll focus on the "kick" use case, the "call" is the
> same, just the other way around. I'm sure you know that the call is
> asynchronous, i.e. the VM will increment the eventfd counter, and
> "eventually" it becomes readable to the device. Now the device does
> something (as fast as it can, presumably) and returns the buffer to the
> VM.
> 
> Now, imagine you're running in simulation time, i.e. "time travel" mode.
> Briefly, this hacks the idle loop of the (UML) VM to just skip forward
> when there's nothing to do, i.e. if you have a timer firing in 100ms and
> get to idle, time is immediately incremented by 100ms and the timer
> fires. For a single VM/device this is already implemented in UML, and
> while it's already very useful that's only half the story to me.
> 
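The idle-loop skip described above can be sketched as a toy clock. This is
an illustration only, not the UML implementation; the class and method
names are made up for the example.

```python
import heapq


class TimeTravelClock:
    """Toy single-participant clock: idling jumps straight to the next timer."""

    def __init__(self):
        self.now_ms = 0
        self._timers = []  # min-heap of (expiry_ms, seq, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def add_timer(self, delay_ms, callback):
        """Arm a timer that fires delay_ms of simulated time from now."""
        heapq.heappush(self._timers,
                       (self.now_ms + delay_ms, self._seq, callback))
        self._seq += 1

    def idle(self):
        """Nothing to do: skip simulated time to the next timer and fire it.

        Returns False if no timer is pending, True otherwise.
        """
        if not self._timers:
            return False
        expiry_ms, _, callback = heapq.heappop(self._timers)
        self.now_ms = expiry_ms  # no real waiting, time simply jumps
        callback()
        return True
```

Arming a 100 ms timer and then calling idle() moves now_ms from 0 to 100
without any wall-clock delay.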
> Once you have multiple devices and/or VMs, you basically have to keep a
> "simulation calendar" where each participant (VM/device) can put an
> entry, and then whenever they become idle they don't immediately move
> time forward, but instead ask the calendar what's next, and the calendar
> determines who runs.
> 
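A minimal sketch of such a calendar, assuming a single shared simulated
clock and string participant names. All names here are hypothetical and
only illustrate the idea.

```python
import heapq


class SimulationCalendar:
    """Shared calendar: participants add entries, the calendar picks who runs."""

    def __init__(self):
        self.now_ms = 0
        self._entries = []  # min-heap of (at_ms, seq, participant)
        self._seq = 0       # keeps insertion order for equal times

    def add_entry(self, participant, at_ms):
        """Request that participant be run at simulated time at_ms."""
        if at_ms < self.now_ms:
            raise ValueError("cannot schedule in the simulated past")
        heapq.heappush(self._entries, (at_ms, self._seq, participant))
        self._seq += 1

    def next_to_run(self):
        """Called when the running participant goes idle.

        Advances simulated time to the earliest entry and returns the
        participant that should run next, or None if the calendar is empty.
        """
        if not self._entries:
            return None
        at_ms, _, participant = heapq.heappop(self._entries)
        self.now_ms = at_ms
        return participant
```

With entries for "vm" at 300 ms and "device" at 100 ms, the device runs
first at 100 ms and the VM afterwards at 300 ms.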
> Now, for these simulation cases, consider vhost-user again. It's
> absolutely necessary that the calendar is updated all the time, and the
> asynchronous nature of the call breaks that - the device cannot update
> the calendar to put an event there to process the call message.
> 
> With this extension, the device would work in the following way. Assume
> that the device is idle, and waiting for the simulation calendar to tell
> it to run. Now,
> 
>  1) it has an incoming call (message) from VM (which waits for reply)
>  2) the device will now put a new event on the simulation calendar for
>     a time slot to process the message
>  3) return reply to VM
>  4) device goes back to sleep - this stuff was asynchronously handled
>     outside of the simulation basically.
> 
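Steps 1) to 4) can be sketched as a transport-level handler that schedules
the work before it replies. This is a toy model under my own assumptions
(names, the "ACK" return value and the list-based calendar are invented),
not libvhost-user code.

```python
import heapq


class Calendar:
    """Toy calendar: a min-heap of (time_ms, participant) entries."""

    def __init__(self):
        self.entries = []

    def add(self, at_ms, who):
        heapq.heappush(self.entries, (at_ms, who))


class Device:
    def __init__(self, calendar):
        self.calendar = calendar
        self.pending = []  # queue indexes waiting for the device's time slot
        self.log = []

    def on_message(self, queue_index, now_ms):
        """Transport handler; runs outside simulated device time."""
        # 1) incoming message from the VM, which is blocked until we reply
        self.log.append("received")
        # 2) put an event on the calendar for a slot to process the message
        self.pending.append(queue_index)
        self.calendar.add(now_ms, "device")
        self.log.append("scheduled")
        # 3) only now return the reply, so the VM cannot run ahead of us
        self.log.append("acked")
        return "ACK"
        # 4) the caller goes back to waiting for its calendar slot

    def run_time_slot(self):
        """Simulated device time: process everything that was queued."""
        processed, self.pending = self.pending, []
        return processed
```

The ordering is the point: the calendar entry exists before the reply
leaves, so the VM never resumes with the calendar out of date.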
> In a sense, the code that just ran isn't considered part of the
> simulated device, it's just the transport protocol and part of the
> simulation environment.
> 
> At this point, the device is still waiting for its calendar event to be
> triggered, but now it has a new one to process the message. Now, once
> the VM goes to sleep, the scheduler will check the calendar and
> presumably tell the device to run, which runs and processes the message.
> This repeats for as long as the simulation runs, going both ways (or
> multiple ways if there are more than 2 participants).
> 
> 
> Now, what if you didn't have this synchronisation, i.e. we don't have
> this extension or we don't have REPLY_ACK or whatnot?
> 
> In that case, after the step 1 above, the VM will immediately continue
> running. Let's say it'll wait for a response from the device for a few
> hundred milliseconds (of now simulated time). However, depending on the
> scheduling, the device has quite likely not yet put the new event on the
> simulation calendar (that happens in step 2 above). This means that the
> VM's calendar event to wake it up after a few hundred milliseconds will
> immediately trigger, and the simulation ends with the driver getting a
> timeout from the device.
> 
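The difference between the two cases can be shown with a deterministic toy
model. It assumes the worst-case host scheduling for the unacknowledged
case (the device has not run at all before the VM goes idle) and a 300 ms
driver timeout; both are assumptions for the example, as are all names.

```python
import heapq


def simulate(message_is_acknowledged):
    """Return (simulated_time_ms, outcome) for one VM-to-device notification."""
    calendar = []  # min-heap of (time_ms, event)
    now_ms = 0

    # The VM notifies the device at t=0.
    if message_is_acknowledged:
        # The VM blocks until the device replies, and the device replies
        # only after adding its own calendar entry.
        heapq.heappush(calendar, (now_ms, "device_processes_message"))
    # Otherwise the VM continues immediately; in the worst case the device
    # has not been scheduled by the host yet, so the calendar has no entry.

    # The VM driver arms its timeout and the VM goes idle.
    heapq.heappush(calendar, (now_ms + 300, "vm_driver_timeout"))

    # The scheduler advances simulated time to the earliest entry.
    now_ms, event = heapq.heappop(calendar)
    if event == "device_processes_message":
        return now_ms, "response"
    return now_ms, "timeout"
```

simulate(True) yields a response at 0 ms, while simulate(False) jumps
straight to the driver timeout at 300 ms.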
> 
> So - yes, while I understand your concern, I basically think this is not
> something anyone will want to use outside of such simulations. OTOH,
> there are various use cases (I'm doing device simulation, others are
> doing network simulation) that use such a behaviour, and it might be
> nice to support it in a more standard way, rather than everyone having
> their own local hacks for everything, like e.g. the VMSimInt paper(**).
> 
> 
> But again, like I said, no hard feelings if you think such simulation
> has no place in upstream vhost-user.
> 
> 
> (**) I put a copy of their qemu changes on top of 1.6.0 here:
>      https://p.sipsolutions.net/af9a68ded948c07e.txt
> 
> johannes