On Wed, Nov 13, 2019 at 11:01:07AM -0500, Jag Raman wrote:
>
>
> On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> > > +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> > > +{
> > > +    PCIProxyDev *entry;
> > > +    unsigned int pid;
> > > +    int wait;
> > > +
> > > +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> > > +        if (need_reply) {
> > > +            wait = eventfd(0, EFD_NONBLOCK);
> > > +            msg->num_fds = 1;
> > > +            msg->fds[0] = wait;
> > > +        }
> > > +
> > > +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> > > +        if (need_reply) {
> > > +            pid = (uint32_t)wait_for_remote(wait);
> >
> > Sometimes QEMU really needs to wait for the remote process before it can
> > make progress.  I think this is not one of those cases though.
> >
> > Since QEMU is event-driven it's problematic to invoke blocking system
> > calls.  The remote process might not respond for a significant amount of
> > time.  Other QEMU threads will be held up waiting for the QEMU global
> > mutex in the meantime (because we hold it!).
>
> There are places where we wait synchronously for the remote process.
> However, these synchronous waits carry a timeout to prevent the hang
> situation you described above.
>
> We will add an error recovery in the future. That is, we will respawn
> the remote process if the QEMU times out waiting for it.

Even with a timeout, in the meantime the event loop is blocked.  That
means timers will be delayed by a large amount, the monitor will be
unresponsive, etc.

> >
> > Please implement heartbeat/ping asynchronously.  The wait eventfd should
> > be read by an event loop fd handler instead.  That way QEMU can continue
> > with running the VM while waiting for the remote process.
>
> In the current implementation, the heartbeat/ping is asynchronous.
> start_heartbeat_timer() sets up a timer to perform the ping.

The heartbeat/ping is synchronous because broadcast_msg() blocks.

Stefan
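
For illustration, here is a minimal sketch of the fd-handler approach
suggested above.  It is not code from the patch series: the Heartbeat
struct and the heartbeat_start(), heartbeat_reply_ready() and
event_loop_iterate() names are hypothetical, and a plain poll() call
stands in for QEMU's event loop.  The point is only that the send path
returns immediately and the eventfd is read when it becomes readable,
not in a blocking wait.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-device heartbeat state (not a type from the patch). */
typedef struct {
    int wait_fd;        /* eventfd the remote process signals on reply */
    int reply_pending;  /* 1 while a ping is outstanding */
    uint64_t reply;     /* last value read from the eventfd */
} Heartbeat;

/*
 * Send side: create the eventfd and return right away.  A real
 * implementation would pass wait_fd to the remote process here;
 * this sketch omits the message send entirely.
 */
static int heartbeat_start(Heartbeat *hb)
{
    hb->wait_fd = eventfd(0, EFD_NONBLOCK);
    if (hb->wait_fd < 0) {
        return -1;
    }
    hb->reply_pending = 1;
    hb->reply = 0;
    return 0;
}

/* fd handler: only called once the event loop sees wait_fd readable. */
static void heartbeat_reply_ready(Heartbeat *hb)
{
    uint64_t val;

    assert(hb->wait_fd >= 0);
    if (read(hb->wait_fd, &val, sizeof(val)) == (ssize_t)sizeof(val)) {
        hb->reply = val;
        hb->reply_pending = 0;
        close(hb->wait_fd);
        hb->wait_fd = -1;
    }
}

/*
 * Stand-in for one event loop iteration.  Returns 1 if the handler
 * ran, 0 if nothing was ready within timeout_ms.
 */
static int event_loop_iterate(Heartbeat *hb, int timeout_ms)
{
    struct pollfd pfd = { .fd = hb->wait_fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n > 0 && (pfd.revents & POLLIN)) {
        heartbeat_reply_ready(hb);
        return 1;
    }
    return 0;
}
```

In QEMU itself the poll() stand-in would presumably be replaced by
registering the handler with the main loop, for example through
qemu_set_fd_handler(), so other timers and the monitor keep running
while the reply is outstanding.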