From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:35818)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cornelia.huck@de.ibm.com>) id 1YkUAv-00020L-CM
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:09:02 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <cornelia.huck@de.ibm.com>) id 1YkUAo-0003EJ-Ix
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:09:01 -0400
Received: from e06smtp10.uk.ibm.com ([195.75.94.106]:47075)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <cornelia.huck@de.ibm.com>) id 1YkUAo-0003DU-Au
	for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:08:54 -0400
Received: from /spool/local
	by e06smtp10.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <cornelia.huck@de.ibm.com>;
	Tue, 21 Apr 2015 10:08:51 +0100
Received: from b06cxnps3075.portsmouth.uk.ibm.com
	(d06relay10.portsmouth.uk.ibm.com [9.149.109.195])
	by d06dlp01.portsmouth.uk.ibm.com (Postfix) with ESMTP id 50CA417D805A
	for <qemu-devel@nongnu.org>; Tue, 21 Apr 2015 10:09:27 +0100 (BST)
Received: from d06av08.portsmouth.uk.ibm.com (d06av08.portsmouth.uk.ibm.com
	[9.149.37.249])
	by b06cxnps3075.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with
	ESMTP id t3L98mtO11141512
	for <qemu-devel@nongnu.org>; Tue, 21 Apr 2015 09:08:48 GMT
Received: from d06av08.portsmouth.uk.ibm.com (localhost [127.0.0.1])
	by d06av08.portsmouth.uk.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with
	ESMTP id t3L98mCf024814
	for <qemu-devel@nongnu.org>; Tue, 21 Apr 2015 03:08:48 -0600
Date: Tue, 21 Apr 2015 11:08:47 +0200
From: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-ID: <20150421110847.20329c65.cornelia.huck@de.ibm.com>
In-Reply-To: <20150421083831.GH21030@fam-t430.nay.redhat.com>
References: <1429257573-7359-1-git-send-email-famz@redhat.com>
	<20150420171330.45514f68.cornelia.huck@de.ibm.com>
	<20150421074402.GG21030@fam-t430.nay.redhat.com>
	<20150421100447.2254d1d6.cornelia.huck@de.ibm.com>
	<20150421083831.GH21030@fam-t430.nay.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support
 "VIRTIO_CONFIG_S_NEEDS_RESET"
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Fam Zheng <famz@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, "Michael S.
	Tsirkin" <mst@redhat.com>, qemu-devel@nongnu.org, "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>, Stefan Hajnoczi <stefanha@redhat.com>, Amit Shah <amit.shah@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>

On Tue, 21 Apr 2015 16:38:31 +0800
Fam Zheng <famz@redhat.com> wrote:

> On Tue, 04/21 10:04, Cornelia Huck wrote:
> > On Tue, 21 Apr 2015 15:44:02 +0800
> > Fam Zheng <famz@redhat.com> wrote:
> > 
> > > On Mon, 04/20 17:13, Cornelia Huck wrote:
> > > > On Fri, 17 Apr 2015 15:59:15 +0800
> > > > Fam Zheng <famz@redhat.com> wrote:
> > > > 
> > > > > Currently, virtio code chooses to kill QEMU if the guest passes any invalid
> > > > > data with vring. That has drawbacks such as losing unsaved data (e.g. when
> > > > > guest user is writing a very long email), or possible denial of service in
> > > > > a nested vm use case where virtio device is passed through.
> > > > > 
> > > > > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used to
> > > > > improve this by communicating the error state between virtio devices and
> > > > > drivers. The device notifies guest upon setting the bit, then the guest driver
> > > > > should detect this bit and report to userspace, or recover the device by
> > > > > resetting it.
> > > > > 
> > > > > This series makes necessary changes in virtio core code, based on which
> > > > > virtio-blk is converted. Other devices now keep the existing behavior by
> > > > > passing in "error_abort". They will be converted in following series. The Linux
> > > > > driver part will also be worked on.
> > > > > 
> > > > > One concern with this behavior change is that it's now harder to notice the
> > > > > actual driver bug that caused the error, as the guest continues to run.  To
> > > > > address that, we could probably add a new error action option to virtio
> > > > > devices,  similar to the "read/write werror" in block layer, so the vm could be
> > > > > paused and the management will get an event in QMP like pvpanic.  This work can
> > > > > be done on top.
> > > > 
> > > > In principle, this looks nice; I'm not sure however how this affects
> > > > non-virtio-1 devices.
> > > > 
> > > > If a device is operating in virtio-1 mode, everything is clearly
> > > > specified: The guest is notified and if it is aware of the NEEDS_RESET
> > > > bit, it can react accordingly.
> > > > 
> > > > But what about legacy devices? Even if they are notified, they don't
> > > > know to check for NEEDS_RESET - and I'm not sure if the undefined
> > > > behaviour after NEEDS_RESET might lead to bigger trouble than killing
> > > > off the guest.
> > > > 
> > > 
> > > The device should become unresponsive to VQ output until guest issues a reset
> > > through bus commands.  Do you have an example of "big trouble" in mind?
> > 
> > I'm not sure what's supposed to happen if NEEDS_RESET is set but not
> > everything is fenced off. The guest may see that queues have become
> > unresponsive, but if we don't stop ioeventfds and fence off
> > notifications, it may easily get into an undefined state internally.
> 
> Yeah, disabling ioeventfds and notifications is a good idea.
> 
> > And if it is connected to other guests via networking, having it limp
> > on may be worse than just killing it off. (Which parts of the data have
> > been cleanly written to disk and which haven't?
> 
> Well, we don't know that even without this series, do we?

We know it hasn't, as the guest is dead :)

> 
> > How is it going to get
> > out of that pickle if it has no good idea of what is wrong?
> 
> If it's virtio-1 compatible, it can reset the device or mark the device
> unusable, either way guest gets a chance to save the work.

My problem is not with virtio-1 devices, although data certainly can't
be written if the device has become unusable.

> 
> If it's not, it's merely an unresponsive device, and guest user can
> reboot/shutdown.

But how does any management software know? If I'm logged into a system
and I notice that saving my data doesn't complete, I can trigger an
action (although reboot/shutdown may not work anymore if too many
threads are waiting on writeback), but how can an automation system
know? It is probably more useful for those setups to have a hard stop
if recovery is not possible - and for legacy systems, that means
killing the guest afaics.
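
As a rough illustration of that policy (a sketch only: the struct, enum
and function names below are hypothetical and are not QEMU code; only
the two constants come from the virtio-1 spec), the device side could
pick its reaction based on whether the driver negotiated
VIRTIO_F_VERSION_1:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40 /* status bit defined by virtio-1 */
#define VIRTIO_F_VERSION_1          32   /* feature bit: virtio-1 mode */

/* Hypothetical, stripped-down device state; not QEMU's VirtIODevice. */
struct sketch_vdev {
    uint64_t guest_features;  /* feature bits acknowledged by the driver */
    uint8_t  status;          /* device status byte */
    bool     kicks_enabled;   /* stands in for ioeventfd handling */
    bool     notify_enabled;  /* stands in for virtqueue notifications */
};

enum sketch_action {
    SKETCH_MARK_BROKEN,  /* NEEDS_RESET set, guest keeps running */
    SKETCH_HARD_STOP     /* legacy driver: caller should stop the guest */
};

/* Decide what to do when the guest handed us invalid vring data. */
enum sketch_action sketch_handle_bad_vring(struct sketch_vdev *vdev)
{
    /* A legacy driver does not know NEEDS_RESET; recovery is not possible. */
    if (!(vdev->guest_features & (1ULL << VIRTIO_F_VERSION_1))) {
        return SKETCH_HARD_STOP;
    }

    /* Fence everything off before flagging the device as broken. */
    vdev->kicks_enabled = false;
    vdev->notify_enabled = false;
    vdev->status |= VIRTIO_CONFIG_S_NEEDS_RESET;

    /* The caller would still send one configuration-change notification. */
    return SKETCH_MARK_BROKEN;
}
```

Whether SKETCH_HARD_STOP maps to exit(1), abort() or pausing the vm is
left to the caller; the sketch only shows where the legacy/virtio-1
split would sit.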

> 
> > 
> > If I have to debug a non-working guest, I prefer a crashed one with a
> > clean state over one that has continued running after the error
> > occurred.
> 
> For debugging purpose, crashing is definitely fine (even better :), but only
> because we won't have critical applications in guest. 

I would argue even for critical applications. They should have a second
guest as backup :)

> It makes sense for the user to
> avoid the overkill "exit(1)"s in QEMU. They don't even generate a core file.

Let's keep dying, but use abort? Would that help?

> And even if they do, it would be much more painful to recover an unsaved
> libreoffice document from a memory core.

See my reply above.

My concern is mainly about legacy setups that aren't used interactively.