qemu-devel.nongnu.org archive mirror
From: Elena Ufimtseva <elena.ufimtseva@oracle.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: fam@euphon.net, john.g.johnson@oracle.com,
	swapnil.ingle@nutanix.com, mst@redhat.com, qemu-devel@nongnu.org,
	kraxel@redhat.com, Jagannathan Raman <jag.raman@oracle.com>,
	quintela@redhat.com, armbru@redhat.com,
	kanth.ghatraju@oracle.com, felipe@nutanix.com, thuth@redhat.com,
	ehabkost@redhat.com, konrad.wilk@oracle.com, dgilbert@redhat.com,
	alex.williamson@redhat.com, thanos.makatos@nutanix.com,
	rth@twiddle.net, kwolf@redhat.com, berrange@redhat.com,
	mreitz@redhat.com, ross.lagerwall@citrix.com,
	marcandre.lureau@gmail.com, pbonzini@redhat.com
Subject: Re: [PATCH v8 17/20] multi-process: heartbeat messages to remote
Date: Fri, 14 Aug 2020 16:01:47 -0700
Message-ID: <20200814230147.GA177362@heatpipe>
In-Reply-To: <20200811144130.GC18223@stefanha-x1.localdomain>

On Tue, Aug 11, 2020 at 03:41:30PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jul 31, 2020 at 02:20:24PM -0400, Jagannathan Raman wrote:
> > @@ -343,3 +349,49 @@ static void probe_pci_info(PCIDevice *dev, Error **errp)
> >          }
> >      }
> >  }
> > +
> > +static void hb_msg(PCIProxyDev *dev)
> > +{
> > +    DeviceState *ds = DEVICE(dev);
> > +    Error *local_err = NULL;
> > +    MPQemuMsg msg = { 0 };
> > +
> > +    msg.cmd = PROXY_PING;
> > +    msg.bytestream = 0;
> > +    msg.size = 0;
> > +
> > +    (void)mpqemu_msg_send_and_await_reply(&msg, dev->ioc, &local_err);
> > +    if (local_err) {
> > +        error_report_err(local_err);
> > +        qio_channel_close(dev->ioc, &local_err);
> > +        error_setg(&error_fatal, "Lost contact with device %s", ds->id);
> > +    }
> > +}
> 
> Here is my feedback from the last revision. Was this addressed?
>

Hi Stefan,

Thank you for reviewing the patchset. In this version we decided to
shut down the guest, by setting error_fatal, when a heartbeat message
gets no reply from the remote process.
Should we approach this differently, or would you prefer that we drop
the heartbeat in its current form?

Thank you,
Elena

>   This patch seems incomplete since no action is taken when the device
>   fails to respond. vCPU threads that access the device will still get
>   stuck.
> 
>   The simplest way to make this useful is to close the connection when a
>   timeout occurs. Then the G_IO_HUP handler for the UNIX domain socket
>   should perform connection cleanup. At that point there are a few
>   choices:
> 
>   1. Stop guest execution and wait for the host admin to restore the
>      mplink so execution can resume. This is similar to how -drive
>      rerror=stop pauses the guest when a disk I/O error is encountered.
> 
>   2. Stop guest execution but defer it until this stale device is actually
>      accessed. This maximizes guest uptime. Guests that rarely access the
>      device may not notice at all.
> 
>   3. Return 0 from MemoryRegion read operations and ignore writes. The
>      guest continues executing but the device is broken. This is risky
>      because device drivers inside the guest may not be ready to deal with
>      this. The result could be data loss or corruption.
> 
>   4. Raise a bus-level event. Maybe PCI error reporting can be used to
>      offline the device.
> 
>   5. Terminate the guest with an error message.
> 
>   6. ?
> 
>   Until the heartbeat is fully implemented and tested I suggest dropping
>   it from this patch series. Remember the G_IO_HUP will happen anyway if
>   the remote device process terminates.
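For illustration only, the "close the connection when a timeout occurs"
suggestion could look roughly like the minimal POSIX sketch below. It
assumes a hypothetical one-byte PING/PONG exchange (HB_PING, HB_PONG and
hb_ping() are made-up names) and uses plain poll()/shutdown() rather than
QEMU's QIOChannel, MPQemuMsg and GLib watch APIs. On timeout it shuts the
socket down instead of setting error_fatal, so the regular hang-up
handling performs cleanup and the choice among options 1-5 above is made
in a single place.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical one-byte heartbeat codes; not the MPQemuMsg wire format. */
enum { HB_PING = 0x01, HB_PONG = 0x02 };

/*
 * Send a PING on fd and wait at most timeout_ms for a PONG.
 * On timeout, I/O error, or an unexpected reply, shut the socket down in
 * both directions instead of failing fatally. Both ends then read EOF,
 * the same condition a crashed remote process produces, so one
 * hang-up/cleanup path can handle both cases. The caller still owns fd
 * and closes it from that cleanup path.
 *
 * Simplifications: SIGPIPE is not handled if the remote has already
 * exited (a real implementation would ignore SIGPIPE or use MSG_NOSIGNAL
 * where available), and an EINTR retry restarts the full timeout.
 */
static bool hb_ping(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    unsigned char b = HB_PING;
    int n;

    assert(fd >= 0 && timeout_ms >= 0);

    if (write(fd, &b, 1) != 1) {
        goto lost;
    }

    do {
        n = poll(&pfd, 1, timeout_ms);   /* 0: the timeout expired */
    } while (n < 0 && errno == EINTR);
    if (n <= 0 || !(pfd.revents & POLLIN)) {
        goto lost;                       /* timeout, poll error, or HUP */
    }

    if (read(fd, &b, 1) != 1 || b != HB_PONG) {
        goto lost;                       /* EOF, read error, or bad reply */
    }
    return true;

lost:
    shutdown(fd, SHUT_RDWR);
    return false;
}
```

With socketpair(AF_UNIX, SOCK_STREAM, 0, sv), a peer that has queued a
PONG makes hb_ping(sv[0], 100) return true. A peer that stays silent
makes it return false once the timeout expires; read() then returns 0 on
sv[0] and, after the unanswered PINGs are drained, on sv[1] as well.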





Thread overview: 32+ messages
2020-07-31 18:20 [PATCH v8 00/20] Initial support for multi-process qemu Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 01/20] memory: alloc RAM from file at offset Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 02/20] multi-process: Add config option for multi-process QEMU Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 03/20] multi-process: setup PCI host bridge for remote device Jagannathan Raman
2020-08-04 10:47   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 04/20] multi-process: setup a machine object for remote device process Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 05/20] multi-process: add qio channel function to transmit Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 06/20] multi-process: define MPQemuMsg format and transmission functions Jagannathan Raman
2020-08-04 12:49   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 07/20] multi-process: add co-routines to communicate with remote Jagannathan Raman
2020-08-10 16:02   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 08/20] multi-process: Initialize message handler in remote device Jagannathan Raman
2020-08-04 12:58   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 09/20] multi-process: Associate fd of a PCIDevice with its object Jagannathan Raman
2020-08-07 16:02   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 10/20] multi-process: setup memory manager for remote device Jagannathan Raman
2020-08-10 15:27   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 11/20] multi-process: introduce proxy object Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 12/20] multi-process: Forward PCI config space acceses to the remote process Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 13/20] multi-process: PCI BAR read/write handling for proxy & remote endpoints Jagannathan Raman
2020-08-11 14:04   ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 14/20] multi-process: Synchronize remote memory Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 15/20] multi-process: create IOHUB object to handle irq Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 16/20] multi-process: Retrieve PCI info from remote process Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 17/20] multi-process: heartbeat messages to remote Jagannathan Raman
2020-08-11 14:41   ` Stefan Hajnoczi
2020-08-14 23:01     ` Elena Ufimtseva [this message]
2020-08-19  8:00       ` Stefan Hajnoczi
2020-07-31 18:20 ` [PATCH v8 18/20] multi-process: perform device reset in the remote process Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 19/20] multi-process: add the concept description to docs/devel/qemu-multiprocess Jagannathan Raman
2020-07-31 18:20 ` [PATCH v8 20/20] multi-process: add configure and usage information Jagannathan Raman
2020-08-11 14:56 ` [PATCH v8 00/20] Initial support for multi-process qemu Stefan Hajnoczi
