From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57055) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YkUy1-00053O-Qu for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:59:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YkUxw-0006nM-JH for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:59:45 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57488) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YkUxw-0006n4-CB for qemu-devel@nongnu.org; Tue, 21 Apr 2015 05:59:40 -0400 Date: Tue, 21 Apr 2015 11:59:30 +0200 From: "Michael S. Tsirkin" Message-ID: <20150421115819-mutt-send-email-mst@redhat.com> References: <1429257573-7359-1-git-send-email-famz@redhat.com> <20150420171330.45514f68.cornelia.huck@de.ibm.com> <20150421074402.GG21030@fam-t430.nay.redhat.com> <20150421100447.2254d1d6.cornelia.huck@de.ibm.com> <20150421083831.GH21030@fam-t430.nay.redhat.com> <20150421110847.20329c65.cornelia.huck@de.ibm.com> <20150421091653.GI21030@fam-t430.nay.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150421091653.GI21030@fam-t430.nay.redhat.com> Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: Kevin Wolf , qemu-devel@nongnu.org, Amit Shah , "Aneesh Kumar K.V" , Stefan Hajnoczi , Cornelia Huck , Paolo Bonzini On Tue, Apr 21, 2015 at 05:16:53PM +0800, Fam Zheng wrote: > On Tue, 04/21 11:08, Cornelia Huck wrote: > > On Tue, 21 Apr 2015 16:38:31 +0800 > > Fam Zheng wrote: > > > > > On Tue, 04/21 10:04, Cornelia Huck wrote: > > > > On Tue, 21 Apr 2015 15:44:02 +0800 > > > > Fam Zheng wrote: > > > > > > > > > On Mon, 04/20 17:13, Cornelia Huck wrote: > > > > > > On Fri, 17 Apr 2015 15:59:15 +0800 > > > > > > Fam Zheng wrote: > > > > > > > > > > > > > Currently, virtio code chooses to kill QEMU if the guest passes any invalid > > > > > > > data with vring. That has drawbacks such as losing unsaved data (e.g. when > > > > > > > guest user is writing a very long email), or possible denial of service in > > > > > > > a nested vm use case where virtio device is passed through. > > > > > > > > > > > > > > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used to > > > > > > > improve this by communicating the error state between virtio devices and > > > > > > > drivers. The device notifies guest upon setting the bit, then the guest driver > > > > > > > should detect this bit and report to userspace, or recover the device by > > > > > > > resetting it. > > > > > > > > > > > > > > This series makes necessary changes in virtio core code, based on which > > > > > > > virtio-blk is converted. Other devices now keep the existing behavior by > > > > > > > passing in "error_abort". They will be converted in following series. The Linux > > > > > > > driver part will also be worked on. > > > > > > > > > > > > > > One concern with this behavior change is that it's now harder to notice the > > > > > > > actual driver bug that caused the error, as the guest continues to run. To > > > > > > > address that, we could probably add a new error action option to virtio > > > > > > > devices, similar to the "read/write werror" in block layer, so the vm could be > > > > > > > paused and the management will get an event in QMP like pvpanic. This work can > > > > > > > be done on top. > > > > > > > > > > > > In principle, this looks nice; I'm not sure however how this affects > > > > > > non-virtio-1 devices. > > > > > > > > > > > > If a device is operating in virtio-1 mode, everything is clearly > > > > > > specified: The guest is notified and if it is aware of the NEEDS_RESET > > > > > > bit, it can react accordingly. > > > > > > > > > > > > But what about legacy devices? Even if they are notified, they don't > > > > > > know to check for NEEDS_RESET - and I'm not sure if the undefined > > > > > > behaviour after NEEDS_RESET might lead to bigger trouble than killing > > > > > > off the guest. > > > > > > > > > > > > > > > > The device should become unresponsive to VQ output until guest issues a reset > > > > > through bus commands. Do you have an example of "big trouble" in mind? > > > > > > > > I'm not sure what's supposed to happen if NEEDS_RESET is set but not > > > > everything is fenced off. The guest may see that queues have become > > > > unresponsive, but if we don't stop ioeventfds and fence off > > > > notifications, it may easily get into an undefined state internally. > > > > > > Yeah, disabling ioeventfds and notifications is a good idea. > > > > > > > And if it is connected to other guests via networking, having it limp > > > > on may be worse than just killing it off. (Which parts of the data have > > > > been cleanly written to disk and which haven't? > > > > > > Well, we don't know that even without this series, do we? > > > > We know it hasn't, as the guest is dead :) > > > > > > > > > How is it going to get > > > > out of that pickle if it has no good idea of what is wrong? > > > > > > If it's virtio-1 compatible, it can reset the device or mark the device > > > ususable, either way guest gets a chance to save the work. > > > > My problem is not with virtio-1 devices; although data certainly can't > > be written if the device has become unusable. > > > > > > > > If it's not, it's merely an unresponsive device, and guest user can > > > reboot/shutdown. > > > > But how does any management software know? If I'm logged into a system > > and I notice that saving my data doesn't complete, I can trigger an > > action (although reboot/shutdown may not work anymore if too many > > threads are waiting on writeback), but how can an automation system > > know? It is probably more useful for those setups to have a hard stop > > if recovery is not possible - and for legacy systems, that means > > killing the guest afaics. > > > > > > > > > > > > > If I have to debug a non-working guest, I prefer a crashed one with a > > > > clean state over one that has continued running after the error > > > > occurred. > > > > > > For debugging purpose, crashing is definitely fine (even better :), but only > > > because we won't have critical applications in guest. > > > > I would argue even for critical applications. They should have a second > > guest as backup :) > > > > > It makes sense to user to > > > avoid the overkiller "exit(1)"'s in QEMU. They don't even generate a core file. > > > > Let's keep dying, but use abort? Would that help? > > > > > And even if they do, it would be much more painful to recover an unsaved > > > libreoffice document from a memory core. > > > > See my reply above. > > > > My concern is mainly about legacy setups that aren't used interactively. > > > > How about pausing guest and generating an QMP event? > > Fam This can be an option, default should be backeards-compatible though. -- MST