On Fri, Aug 14, 2020 at 04:01:47PM -0700, Elena Ufimtseva wrote:
> On Tue, Aug 11, 2020 at 03:41:30PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jul 31, 2020 at 02:20:24PM -0400, Jagannathan Raman wrote:
> > > @@ -343,3 +349,49 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
> > >          }
> > >      }
> > >  }
> > > +
> > > +static void hb_msg(PCIProxyDev *dev)
> > > +{
> > > +    DeviceState *ds = DEVICE(dev);
> > > +    Error *local_err = NULL;
> > > +    MPQemuMsg msg = { 0 };
> > > +
> > > +    msg.cmd = PROXY_PING;
> > > +    msg.bytestream = 0;
> > > +    msg.size = 0;
> > > +
> > > +    (void)mpqemu_msg_send_and_await_reply(&msg, dev->ioc, &local_err);
> > > +    if (local_err) {
> > > +        error_report_err(local_err);
> > > +        qio_channel_close(dev->ioc, &local_err);
> > > +        error_setg(&error_fatal, "Lost contact with device %s", ds->id);
> > > +    }
> > > +}
> >
> > Here is my feedback from the last revision. Was this addressed?
> >
>
> Hi Stefan,
>
> Thank you for reviewing the patchset. In this version we decided to
> shut down the guest when the heartbeat does not get a reply from the
> remote, by setting error_fatal.
> Should we approach it differently, or do you prefer us to get rid of the
> heartbeat in this form?

I think the only case that this patch handles is when the mpqemu channel
is closed. The VM still hangs when the channel is open but the remote is
unresponsive. (mpqemu_msg_send_and_await_reply() calls aio_poll() with
the global mutex held, so vCPUs cannot make progress.)

The heartbeat mechanism needs to handle the case where the other side
isn't responding. It can't hang QEMU.

I suggest dropping this patch. It can be done later.

Stefan
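For illustration, here is a minimal sketch in plain POSIX C (not QEMU's QIOChannel or mpqemu APIs) of a heartbeat that separates the two cases discussed above: a closed channel and an open channel whose peer has stopped replying. The names `hb_ping`, `hb_status`, and the one-byte `HB_PING`/`HB_PONG` messages are hypothetical and are not part of the mpqemu protocol. The property that matters is the bounded wait: `poll()` returns 0 once `timeout_ms` elapses, so the caller is never stuck waiting forever. It is an assumption here that in QEMU such a wait would also have to run without the global mutex held (for example driven by a timer and an fd handler); the sketch does not model that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for one heartbeat round trip. */
typedef enum {
    HB_ALIVE,        /* peer answered within the deadline */
    HB_UNRESPONSIVE, /* channel still open, but no reply before the deadline */
    HB_CLOSED,       /* peer closed the channel, or an I/O error occurred */
} hb_status;

/* Hypothetical one-byte messages; NOT the real MPQemuMsg wire format. */
enum { HB_PING = 0x01, HB_PONG = 0x02 };

/*
 * Send a ping on fd and wait at most timeout_ms for a pong.
 * Assumption: the caller holds no lock that other threads (e.g. vCPUs)
 * need while this function waits.
 */
static hb_status hb_ping(int fd, int timeout_ms)
{
    const uint8_t ping = HB_PING;
    uint8_t reply = 0;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    assert(timeout_ms >= 0);

    /* MSG_NOSIGNAL: get EPIPE instead of SIGPIPE if the peer is gone. */
    if (send(fd, &ping, 1, MSG_NOSIGNAL) != 1) {
        return HB_CLOSED;
    }

    /*
     * Bounded wait: poll() returns 0 when timeout_ms passes with no data.
     * Retrying on EINTR restarts the full timeout, so the worst case can
     * exceed timeout_ms if signals keep arriving.
     */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret == 0) {
        return HB_UNRESPONSIVE;  /* channel open, peer silent */
    }
    if (ret < 0) {
        return HB_CLOSED;        /* treat other poll() failures as fatal */
    }

    /* Readable or hung up: EOF, a read error, or an unexpected byte
     * all mean the channel is no longer usable. */
    if (read(fd, &reply, 1) == 1 && reply == HB_PONG) {
        return HB_ALIVE;
    }
    return HB_CLOSED;
}
```

With a `socketpair()`, a peer that never writes yields `HB_UNRESPONSIVE` after the timeout instead of blocking, and closing the peer end yields `HB_CLOSED`. What the caller does with `HB_UNRESPONSIVE` (retry, report, or shut down the guest) is a policy decision this sketch leaves open.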