On Wed, Nov 13, 2019 at 11:01:07AM -0500, Jag Raman wrote:
> 
> 
> On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> > > +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> > > +{
> > > +    PCIProxyDev *entry;
> > > +    unsigned int pid;
> > > +    int wait;
> > > +
> > > +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> > > +        if (need_reply) {
> > > +            wait = eventfd(0, EFD_NONBLOCK);
> > > +            msg->num_fds = 1;
> > > +            msg->fds[0] = wait;
> > > +        }
> > > +
> > > +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> > > +        if (need_reply) {
> > > +            pid = (uint32_t)wait_for_remote(wait);
> > 
> > Sometimes QEMU really needs to wait for the remote process before it can
> > make progress.  I think this is not one of those cases though.
> > 
> > Since QEMU is event-driven it's problematic to invoke blocking system
> > calls.  The remote process might not respond for a significant amount of
> > time.  Other QEMU threads will be held up waiting for the QEMU global
> > mutex in the meantime (because we hold it!).
> 
> There are places where we wait synchronously for the remote process.
> However, these synchronous waits carry a timeout to prevent the hang
> situation you described above.
> 
> We will add error recovery in the future. That is, we will respawn
> the remote process if QEMU times out waiting for it.

Even with a timeout, the event loop is blocked until the remote process
replies or the timeout expires.  In the meantime timers will fire late,
the monitor will be unresponsive, etc.

> > 
> > Please implement heartbeat/ping asynchronously.  The wait eventfd should
> > be read by an event loop fd handler instead.  That way QEMU can continue
> > with running the VM while waiting for the remote process.
> 
> In the current implementation, the heartbeat/ping is asynchronous.
> start_heartbeat_timer() sets up a timer to perform the ping.

The timer only schedules the ping.  The ping itself is still synchronous
because broadcast_msg() blocks in wait_for_remote() until the remote
process replies.
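
To illustrate what I mean by reading the eventfd from an fd handler,
here is a rough standalone sketch.  It is not QEMU code: PingState,
ping_reply_ready(), event_loop_iteration() and ping_demo() are names
made up for this example, and a plain poll(2) call stands in for the
main loop.  In QEMU the handler would presumably be registered with
qemu_set_fd_handler() instead.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-device ping state; not part of the patch. */
typedef struct {
    int wait_fd;      /* eventfd the remote process signals on */
    int replied;      /* set once a reply has been read */
    uint64_t value;   /* counter value read from the eventfd */
} PingState;

/* fd handler: only invoked once wait_fd is readable, so it never blocks. */
static void ping_reply_ready(PingState *s)
{
    uint64_t val;

    if (read(s->wait_fd, &val, sizeof(val)) == (ssize_t)sizeof(val)) {
        s->replied = 1;
        s->value = val;
    }
}

/* Stand-in for one main loop iteration: dispatch the handler if ready. */
static int event_loop_iteration(PingState *s, int timeout_ms)
{
    struct pollfd pfd = { .fd = s->wait_fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n > 0 && (pfd.revents & POLLIN)) {
        ping_reply_ready(s);
    }
    return n;
}

/* Returns 0 if the reply was picked up without ever blocking. */
int ping_demo(void)
{
    PingState s = { .wait_fd = eventfd(0, EFD_NONBLOCK) };
    uint64_t pid = 1234;   /* made-up value the "remote" writes back */
    int ok;

    if (s.wait_fd < 0) {
        return -1;
    }

    /* No reply yet: the loop returns immediately and can do other work. */
    event_loop_iteration(&s, 0);
    ok = !s.replied;

    /* Simulate the remote process signalling the eventfd. */
    if (write(s.wait_fd, &pid, sizeof(pid)) != (ssize_t)sizeof(pid)) {
        ok = 0;
    }

    /* The next iteration sees POLLIN and runs the handler. */
    event_loop_iteration(&s, 0);
    ok = ok && s.replied && s.value == pid;

    close(s.wait_fd);
    return ok ? 0 : -1;
}
```

The point is that the sender returns right after sending the message
and nothing waits on the eventfd.  A missing reply can then be detected
by a separate timer rather than by a blocking read with a timeout.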

Stefan