From: Roman Kagan <rvkagan@yandex-team.ru>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Konstantin Khlebnikov" <khlebnikov@yandex-team.ru>,
	qemu-devel@nongnu.org, yc-core@yandex-team.ru,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eduardo Habkost" <ehabkost@gmail.com>,
	"Eric Blake" <eblake@redhat.com>
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Mon, 20 Jun 2022 16:49:06 +0300
Message-ID: <YrB60nlxNeelb6r0@rvkaganb>
In-Reply-To: <YpTdAPAo8RGD735z@rvkaganb>

On Mon, May 30, 2022 at 06:04:32PM +0300, Roman Kagan wrote:
> On Mon, May 30, 2022 at 01:28:17PM +0200, Markus Armbruster wrote:
> > Roman Kagan <rvkagan@yandex-team.ru> writes:
> > 
> > > On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> > >> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> > >> 
> > >> > This event reports device runtime errors, giving the time and the
> > >> > reason why a device is broken.
> > >> 
> > >> Can you give one or more examples of the "device runtime errors" you
> > >> have in mind?
> > >
> > > Initially we wanted to address a situation where a vhost device
> > > discovered an inconsistency during virtqueue processing and silently
> > > stopped the virtqueue.  This resulted in a device stall (partial for
> > > multiqueue devices), and we were the last to notice it.
> > >
> > > The solution appeared to be to employ errfd and, upon receiving a
> > > notification through it, to emit a QMP event which is actionable in the
> > > management layer or further up the stack.
> > >
> > > Then we observed that virtio (non-vhost) devices suffer from the same
> > > issue: they only log the error but don't signal it to the management
> > > layer.  The case was very similar so we thought it would make sense to
> > > share the infrastructure and the QMP event between virtio and vhost.
> > >
> > > Then Konstantin went a bit further and generalized the concept into a
> > > generic "device runtime error".  I'm personally not completely convinced
> > > this generalization is appropriate here; we'd appreciate the community's
> > > opinions on the matter.
> > 
> > "Device emulation sending an event on entering certain error states, so
> > that a management application can do something about it" feels
> > reasonable enough to me as a general concept.
> > 
> > The key point is of course "can do something": the event needs to be
> > actionable.  Can you describe possible actions for the cases you
> > implement?
> 
> The first one that we had in mind was informational, like triggering an
> alert in the monitoring system and/or painting the VM as malfunctioning
> in the owner's UI.
> 
> There can be more advanced scenarios like autorecovery by resetting the
> faulty VM, or fencing it if it's a cluster member.
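
To illustrate what "actionable" could mean on the management side, here is a minimal, hypothetical Python sketch (not part of the series) that maps a single QMP message to an action. It assumes the event arrives as DEVICE_RUNTIME_ERROR with "device" and "reason" payload fields; those field names are assumptions, and the real schema is whatever patch 1/4 defines. Only `system_reset` is an existing QMP command; the alert dict stands in for whatever monitoring hook the management layer uses, and fencing is left out.

```python
import json
from typing import Optional

# Event name proposed by this patch series.
RUNTIME_ERROR_EVENT = "DEVICE_RUNTIME_ERROR"


def handle_qmp_message(line: str, auto_reset: bool = False) -> Optional[dict]:
    """Map one QMP message (a JSON text line) to a management-layer action.

    Returns None for greetings, command replies and unrelated events.
    """
    msg = json.loads(line)
    if msg.get("event") != RUNTIME_ERROR_EVENT:
        return None
    data = msg.get("data", {})
    # ASSUMPTION: "device" and "reason" are hypothetical payload fields.
    device = data.get("device", "<unknown>")
    reason = data.get("reason", "")
    if auto_reset:
        # Autorecovery: system_reset is an existing QMP command.
        return {"execute": "system_reset"}
    # Default policy is informational: raise an alert / mark the VM unhealthy.
    return {"alert": f"device {device} failed: {reason}"}
```

Keeping the policy (alert vs. reset) outside QEMU is the point of emitting an event: QEMU only reports the condition, and the management layer decides what to do.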

The discussion kind of stalled here.  Do you think the approach makes
sense or not?  Should we try and resubmit the series with a proper cover
letter and possibly other improvements or is it a dead end?

Thanks,
Roman.



Thread overview: 14+ messages
2022-05-19 14:19 [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 2/4] virtio: forward errors into qdev_report_runtime_error() Konstantin Khlebnikov
2022-05-24 19:25   ` Vladimir Sementsov-Ogievskiy
2022-05-19 14:19 ` [PATCH 3/4] vhost: add method vhost_set_vring_err Konstantin Khlebnikov
2022-05-19 14:19 ` [PATCH 4/4] vhost: forward vring errors into virtio device Konstantin Khlebnikov
2022-05-24 19:04 ` [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event Vladimir Sementsov-Ogievskiy
2022-05-25  8:26   ` Konstantin Khlebnikov
2022-05-25 10:54 ` Markus Armbruster
2022-05-27 12:49   ` Roman Kagan
2022-05-30 11:28     ` Markus Armbruster
2022-05-30 15:04       ` Roman Kagan
2022-06-20 13:49         ` Roman Kagan [this message]
2022-06-21 11:55           ` Markus Armbruster
2022-06-21 12:02             ` Roman Kagan