From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: fam@euphon.net, john.g.johnson@oracle.com,
swapnil.ingle@nutanix.com, mst@redhat.com, qemu-devel@nongnu.org,
kraxel@redhat.com, Jagannathan Raman <jag.raman@oracle.com>,
quintela@redhat.com, armbru@redhat.com,
kanth.ghatraju@oracle.com, felipe@nutanix.com, thuth@redhat.com,
ehabkost@redhat.com, konrad.wilk@oracle.com, dgilbert@redhat.com,
alex.williamson@redhat.com, thanos.makatos@nutanix.com,
rth@twiddle.net, kwolf@redhat.com, berrange@redhat.com,
mreitz@redhat.com, ross.lagerwall@citrix.com,
marcandre.lureau@gmail.com, pbonzini@redhat.com
Subject: Re: [PATCH v8 17/20] multi-process: heartbeat messages to remote
Date: Fri, 14 Aug 2020 16:01:47 -0700
Message-ID: <20200814230147.GA177362@heatpipe>
In-Reply-To: <20200811144130.GC18223@stefanha-x1.localdomain>
On Tue, Aug 11, 2020 at 03:41:30PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 31, 2020 at 02:20:24PM -0400, Jagannathan Raman wrote:
> > @@ -343,3 +349,49 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
> > }
> > }
> > }
> > +
> > +static void hb_msg(PCIProxyDev *dev)
> > +{
> > +    DeviceState *ds = DEVICE(dev);
> > +    Error *local_err = NULL;
> > +    MPQemuMsg msg = { 0 };
> > +
> > +    msg.cmd = PROXY_PING;
> > +    msg.bytestream = 0;
> > +    msg.size = 0;
> > +
> > +    (void)mpqemu_msg_send_and_await_reply(&msg, dev->ioc, &local_err);
> > +    if (local_err) {
> > +        error_report_err(local_err);
> > +        qio_channel_close(dev->ioc, &local_err);
> > +        error_setg(&error_fatal, "Lost contact with device %s", ds->id);
> > +    }
> > +}
>
> Here is my feedback from the last revision. Was this addressed?
>
Hi Stefan,

Thank you for reviewing the patchset. In this version we decided to
shut down the guest when the heartbeat does not get a reply from the
remote, by setting an error on error_fatal.

Should we approach it differently, or would you prefer that we drop the
heartbeat in its current form?

Thank you,
Elena
> This patch seems incomplete since no action is taken when the device
> fails to respond. vCPU threads that access the device will still get
> stuck.
>
> The simplest way to make this useful is to close the connection when a
> timeout occurs. Then the G_IO_HUP handler for the UNIX domain socket
> should perform connection cleanup. At that point there are a few
> choices:
>
> 1. Stop guest execution and wait for the host admin to restore the
> mplink so execution can resume. This is similar to how -drive
> rerror=stop pauses the guest when a disk I/O error is encountered.
>
> 2. Stop guest execution but defer it until this stale device is actually
> accessed. This maximizes guest uptime. Guests that rarely access the
> device may not notice at all.
>
> 3. Return 0 from MemoryRegion read operations and ignore writes. The
> guest continues executing but the device is broken. This is risky
> because device drivers inside the guest may not be ready to deal with
> this. The result could be data loss or corruption.
>
> 4. Raise a bus-level event. Maybe PCI error reporting can be used to
> offline the device.
>
> 5. Terminate the guest with an error message.
>
> 6. ?
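
To make the difference between options 2 and 3 concrete, here is a
minimal sketch in plain C. It is not QEMU code: struct proxy_link, the
policy enum, and every function name are hypothetical, and a single
register stands in for state that the remote process would hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policies, named after options 2 and 3 in the list above. */
enum link_down_policy {
    POLICY_STOP_ON_ACCESS,  /* option 2: request a guest stop on first access */
    POLICY_READ_ZERO,       /* option 3: reads return 0, writes are ignored */
};

/* Hypothetical stand-in for the proxy's view of the link to the remote. */
struct proxy_link {
    bool connected;         /* cleared once the heartbeat times out */
    bool stop_requested;    /* set when the guest should be paused */
    enum link_down_policy policy;
    uint64_t reg;           /* stands in for device state in the remote */
};

/*
 * Heartbeat timeout. Real code would also close the channel here so that
 * the hang-up handler can perform connection cleanup.
 */
static void proxy_link_timeout(struct proxy_link *link)
{
    assert(link != NULL);
    link->connected = false;
}

/* Device read: forwarded while connected, otherwise governed by the policy. */
static uint64_t proxy_read(struct proxy_link *link)
{
    if (link->connected) {
        return link->reg;
    }
    if (link->policy == POLICY_STOP_ON_ACCESS) {
        link->stop_requested = true;
    }
    return 0;
}

/* Device write: forwarded while connected, otherwise dropped. */
static void proxy_write(struct proxy_link *link, uint64_t val)
{
    if (link->connected) {
        link->reg = val;
        return;
    }
    if (link->policy == POLICY_STOP_ON_ACCESS) {
        link->stop_requested = true;
    }
}
```

With POLICY_STOP_ON_ACCESS the timeout alone does not set
stop_requested; only the first read or write after it does, which is
why a guest that rarely touches the device might not notice.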
>
> Until the heartbeat is fully implemented and tested I suggest dropping
> it from this patch series. Remember the G_IO_HUP will happen anyway if
> the remote device process terminates.
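
The point about the hang-up arriving anyway can be shown outside QEMU
with a small POSIX sketch. GLib's G_IO_HUP corresponds to poll()'s
POLLHUP; the helper names below are hypothetical, and closing one end of
a socketpair() stands in for the remote device process terminating.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns true if the peer of fd has gone away: either poll() reports
 * POLLHUP, or the socket is readable and a peeking recv() returns 0 (EOF).
 */
static bool peer_hung_up(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char byte;

    assert(fd >= 0);
    if (poll(&pfd, 1, timeout_ms) <= 0) {
        return false;   /* timeout or poll error: no hang-up observed */
    }
    if (pfd.revents & POLLHUP) {
        return true;
    }
    if (pfd.revents & POLLIN) {
        return recv(fd, &byte, 1, MSG_PEEK) == 0;
    }
    return false;
}

/* Closes one end of a UNIX socket pair and checks that the other end notices. */
static bool hangup_seen_after_peer_close(void)
{
    int sv[2];
    bool seen;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return false;
    }
    close(sv[1]);       /* stands in for the remote process exiting */
    seen = peer_hung_up(sv[0], 1000);
    close(sv[0]);
    return seen;
}
```

No heartbeat message is exchanged here: the kernel reports the closed
peer on its own, so a heartbeat only adds value for a remote that is
still running but has stopped responding.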