From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:50386) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YkRcm-0002Ma-PU for qemu-devel@nongnu.org; Tue, 21 Apr 2015 02:25:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YkRNa-00024Y-9h for qemu-devel@nongnu.org; Tue, 21 Apr 2015 02:09:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36000) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YkRNa-00024O-2G for qemu-devel@nongnu.org; Tue, 21 Apr 2015 02:09:54 -0400 Date: Tue, 21 Apr 2015 08:09:46 +0200 From: "Michael S. Tsirkin" Message-ID: <20150421060946.GA31186@redhat.com> References: <1429257573-7359-1-git-send-email-famz@redhat.com> <20150420175905-mutt-send-email-mst@redhat.com> <20150421023700.GC8048@fam-t430.nay.redhat.com> <20150421070941-mutt-send-email-mst@redhat.com> <20150421055033.GB21030@fam-t430.nay.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150421055033.GB21030@fam-t430.nay.redhat.com> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: Kevin Wolf , Rusty Russell , qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, "Aneesh Kumar K.V" , Stefan Hajnoczi , Amit Shah , Paolo Bonzini On Tue, Apr 21, 2015 at 01:50:33PM +0800, Fam Zheng wrote: > On Tue, 04/21 07:22, Michael S. Tsirkin wrote: > > On Tue, Apr 21, 2015 at 10:37:00AM +0800, Fam Zheng wrote: > > > On Mon, 04/20 19:36, Michael S. Tsirkin wrote: > > > > On Fri, Apr 17, 2015 at 03:59:15PM +0800, Fam Zheng wrote: > > > > > Currently, virtio code chooses to kill QEMU if the guest passes= any invalid > > > > > data with vring. > > > > > That has drawbacks such as losing unsaved data (e.g. when > > > > > guest user is writing a very long email), or possible denial of= service in > > > > > a nested vm use case where virtio device is passed through. > > > > >=20 > > > > > virtio-1 has introduced a new status bit "NEEDS RESET" which co= uld be used to > > > > > improve this by communicating the error state between virtio de= vices and > > > > > drivers. The device notifies guest upon setting the bit, then t= he guest driver > > > > > should detect this bit and report to userspace, or recover the = device by > > > > > resetting it. > > > >=20 > > > > Unfortunately, virtio 1 spec does not have a conformance statemen= t > > > > that requires driver to recover. We merely have a non-normative l= ooking > > > > text: > > > > Note: For example, the driver can=E2=80=99t assume requests in f= light > > > > will be completed if DEVICE_NEEDS_RESET is set, nor can it assum= e that > > > > they have not been completed. A good implementation will try to = recover > > > > by issuing a reset. > > > >=20 > > > > Implementing this reset for all devices in a race-free manner mig= ht also > > > > be far from trivial. I think we'd need a feature bit for this. > > > > OTOH as long as we make this a new feature, would an ability to > > > > reset a single VQ be a better match for what you are trying to > > > > achieve? > > >=20 > > > I think that is too complicated as a recovery measure, a device lev= el resetting > > > will be better to get to a deterministic state, at least. > >=20 > > Question would be, how hard is it to stop host from using all queues, > > retrieve all host OS state and re-program it into the device. > > If we need to shadow all OS state within the driver, then that's a lo= t > > of not well tested code with a possibility of introducing more bugs. >=20 > I don't understand the question. In this series the virtio-blk device w= ill not > pop any more requests, and as long as the reset is properly handled, bo= th guest > and host should go back to a good state. > >=20 > > > >=20 > > > > > This series makes necessary changes in virtio core code, based = on which > > > > > virtio-blk is converted. Other devices now keep the existing be= havior by > > > > > passing in "error_abort". They will be converted in following s= eries. The Linux > > > > > driver part will also be worked on. > > > > >=20 > > > > > One concern with this behavior change is that it's now harder t= o notice the > > > > > actual driver bug that caused the error, as the guest continues= to run. To > > > > > address that, we could probably add a new error action option t= o virtio > > > > > devices, similar to the "read/write werror" in block layer, so= the vm could be > > > > > paused and the management will get an event in QMP like pvpanic= . This work can > > > > > be done on top. > > > >=20 > > > > At the architectural level, that's only one concern. Others would= be > > > > - workloads such as openstack handle guest crash better than > > > > a guest that's e.g. slow because of a memory leak > > >=20 > > > What memory leak are you referring to? > >=20 > > That was just an example. If host detects a malformed ring, it will > > crash. But often it doesn't, result is buffers not being used, so gu= est > > can't free them up. > >=20 > > > > - it's easier for guests to probe host for security issues > > > > if guest isn't killed > > > > - guest can flood host log with guest-triggered errors > > >=20 > > > We can still abort() if guest is triggering error too quickly. > >=20 > >=20 > > Absolutely, and if it looked like I'm against error detection and > > recovery, this was not my intent. > >=20 > > I am merely saying we can't apply this patchset as is, deferring > > addressing the issues to patches on top. > >=20 > > But I have an idea: refactor the code to use error_abort.=20 >=20 > That is patch 1-9 of this series. Or do you mean also refactor and pass > error_abort to the memory core? >=20 > Fam So if you like just patches 1-9 applied, this sounds reasonable. I'll provide review comments on the individual patches. > >This way we > > can apply the patchset without making functional changes, and you can > > make progress to complete this, on top. From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [Qemu-devel] [PATCH 00/18] virtio-blk: Support "VIRTIO_CONFIG_S_NEEDS_RESET" Date: Tue, 21 Apr 2015 08:09:46 +0200 Message-ID: <20150421060946.GA31186@redhat.com> References: <1429257573-7359-1-git-send-email-famz@redhat.com> <20150420175905-mutt-send-email-mst@redhat.com> <20150421023700.GC8048@fam-t430.nay.redhat.com> <20150421070941-mutt-send-email-mst@redhat.com> <20150421055033.GB21030@fam-t430.nay.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Return-path: Content-Disposition: inline In-Reply-To: <20150421055033.GB21030@fam-t430.nay.redhat.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: virtualization-bounces@lists.linux-foundation.org Errors-To: virtualization-bounces@lists.linux-foundation.org To: Fam Zheng Cc: qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org, "Aneesh Kumar K.V" , Stefan Hajnoczi , Amit Shah , Paolo Bonzini List-Id: virtualization@lists.linuxfoundation.org T24gVHVlLCBBcHIgMjEsIDIwMTUgYXQgMDE6NTA6MzNQTSArMDgwMCwgRmFtIFpoZW5nIHdyb3Rl Ogo+IE9uIFR1ZSwgMDQvMjEgMDc6MjIsIE1pY2hhZWwgUy4gVHNpcmtpbiB3cm90ZToKPiA+IE9u IFR1ZSwgQXByIDIxLCAyMDE1IGF0IDEwOjM3OjAwQU0gKzA4MDAsIEZhbSBaaGVuZyB3cm90ZToK PiA+ID4gT24gTW9uLCAwNC8yMCAxOTozNiwgTWljaGFlbCBTLiBUc2lya2luIHdyb3RlOgo+ID4g PiA+IE9uIEZyaSwgQXByIDE3LCAyMDE1IGF0IDAzOjU5OjE1UE0gKzA4MDAsIEZhbSBaaGVuZyB3 cm90ZToKPiA+ID4gPiA+IEN1cnJlbnRseSwgdmlydGlvIGNvZGUgY2hvb3NlcyB0byBraWxsIFFF TVUgaWYgdGhlIGd1ZXN0IHBhc3NlcyBhbnkgaW52YWxpZAo+ID4gPiA+ID4gZGF0YSB3aXRoIHZy aW5nLgo+ID4gPiA+ID4gVGhhdCBoYXMgZHJhd2JhY2tzIHN1Y2ggYXMgbG9zaW5nIHVuc2F2ZWQg ZGF0YSAoZS5nLiB3aGVuCj4gPiA+ID4gPiBndWVzdCB1c2VyIGlzIHdyaXRpbmcgYSB2ZXJ5IGxv bmcgZW1haWwpLCBvciBwb3NzaWJsZSBkZW5pYWwgb2Ygc2VydmljZSBpbgo+ID4gPiA+ID4gYSBu ZXN0ZWQgdm0gdXNlIGNhc2Ugd2hlcmUgdmlydGlvIGRldmljZSBpcyBwYXNzZWQgdGhyb3VnaC4K PiA+ID4gPiA+IAo+ID4gPiA+ID4gdmlydGlvLTEgaGFzIGludHJvZHVjZWQgYSBuZXcgc3RhdHVz IGJpdCAiTkVFRFMgUkVTRVQiIHdoaWNoIGNvdWxkIGJlIHVzZWQgdG8KPiA+ID4gPiA+IGltcHJv dmUgdGhpcyBieSBjb21tdW5pY2F0aW5nIHRoZSBlcnJvciBzdGF0ZSBiZXR3ZWVuIHZpcnRpbyBk ZXZpY2VzIGFuZAo+ID4gPiA+ID4gZHJpdmVycy4gVGhlIGRldmljZSBub3RpZmllcyBndWVzdCB1 cG9uIHNldHRpbmcgdGhlIGJpdCwgdGhlbiB0aGUgZ3Vlc3QgZHJpdmVyCj4gPiA+ID4gPiBzaG91 bGQgZGV0ZWN0IHRoaXMgYml0IGFuZCByZXBvcnQgdG8gdXNlcnNwYWNlLCBvciByZWNvdmVyIHRo ZSBkZXZpY2UgYnkKPiA+ID4gPiA+IHJlc2V0dGluZyBpdC4KPiA+ID4gPiAKPiA+ID4gPiBVbmZv cnR1bmF0ZWx5LCB2aXJ0aW8gMSBzcGVjIGRvZXMgbm90IGhhdmUgYSBjb25mb3JtYW5jZSBzdGF0 ZW1lbnQKPiA+ID4gPiB0aGF0IHJlcXVpcmVzIGRyaXZlciB0byByZWNvdmVyLiBXZSBtZXJlbHkg aGF2ZSBhIG5vbi1ub3JtYXRpdmUgbG9va2luZwo+ID4gPiA+IHRleHQ6Cj4gPiA+ID4gCU5vdGU6 IEZvciBleGFtcGxlLCB0aGUgZHJpdmVyIGNhbuKAmXQgYXNzdW1lIHJlcXVlc3RzIGluIGZsaWdo dAo+ID4gPiA+IAl3aWxsIGJlIGNvbXBsZXRlZCBpZiBERVZJQ0VfTkVFRFNfUkVTRVQgaXMgc2V0 LCBub3IgY2FuIGl0IGFzc3VtZSB0aGF0Cj4gPiA+ID4gCXRoZXkgaGF2ZSBub3QgYmVlbiBjb21w bGV0ZWQuIEEgZ29vZCBpbXBsZW1lbnRhdGlvbiB3aWxsIHRyeSB0byByZWNvdmVyCj4gPiA+ID4g CWJ5IGlzc3VpbmcgYSByZXNldC4KPiA+ID4gPiAKPiA+ID4gPiBJbXBsZW1lbnRpbmcgdGhpcyBy ZXNldCBmb3IgYWxsIGRldmljZXMgaW4gYSByYWNlLWZyZWUgbWFubmVyIG1pZ2h0IGFsc28KPiA+ ID4gPiBiZSBmYXIgZnJvbSB0cml2aWFsLiAgSSB0aGluayB3ZSdkIG5lZWQgYSBmZWF0dXJlIGJp dCBmb3IgdGhpcy4KPiA+ID4gPiBPVE9IIGFzIGxvbmcgYXMgd2UgbWFrZSB0aGlzIGEgbmV3IGZl YXR1cmUsIHdvdWxkIGFuIGFiaWxpdHkgdG8KPiA+ID4gPiByZXNldCBhIHNpbmdsZSBWUSBiZSBh IGJldHRlciBtYXRjaCBmb3Igd2hhdCB5b3UgYXJlIHRyeWluZyB0bwo+ID4gPiA+IGFjaGlldmU/ Cj4gPiA+IAo+ID4gPiBJIHRoaW5rIHRoYXQgaXMgdG9vIGNvbXBsaWNhdGVkIGFzIGEgcmVjb3Zl cnkgbWVhc3VyZSwgYSBkZXZpY2UgbGV2ZWwgcmVzZXR0aW5nCj4gPiA+IHdpbGwgYmUgYmV0dGVy IHRvIGdldCB0byBhIGRldGVybWluaXN0aWMgc3RhdGUsIGF0IGxlYXN0Lgo+ID4gCj4gPiBRdWVz dGlvbiB3b3VsZCBiZSwgaG93IGhhcmQgaXMgaXQgdG8gc3RvcCBob3N0IGZyb20gdXNpbmcgYWxs IHF1ZXVlcywKPiA+IHJldHJpZXZlIGFsbCBob3N0IE9TIHN0YXRlIGFuZCByZS1wcm9ncmFtIGl0 IGludG8gdGhlIGRldmljZS4KPiA+IElmIHdlIG5lZWQgdG8gc2hhZG93IGFsbCBPUyBzdGF0ZSB3 aXRoaW4gdGhlIGRyaXZlciwgdGhlbiB0aGF0J3MgYSBsb3QKPiA+IG9mIG5vdCB3ZWxsIHRlc3Rl ZCBjb2RlIHdpdGggYSBwb3NzaWJpbGl0eSBvZiBpbnRyb2R1Y2luZyBtb3JlIGJ1Z3MuCj4gCj4g SSBkb24ndCB1bmRlcnN0YW5kIHRoZSBxdWVzdGlvbi4gSW4gdGhpcyBzZXJpZXMgdGhlIHZpcnRp by1ibGsgZGV2aWNlIHdpbGwgbm90Cj4gcG9wIGFueSBtb3JlIHJlcXVlc3RzLCBhbmQgYXMgbG9u ZyBhcyB0aGUgcmVzZXQgaXMgcHJvcGVybHkgaGFuZGxlZCwgYm90aCBndWVzdAo+IGFuZCBob3N0 IHNob3VsZCBnbyBiYWNrIHRvIGEgZ29vZCBzdGF0ZS4KPiA+IAo+ID4gPiA+IAo+ID4gPiA+ID4g VGhpcyBzZXJpZXMgbWFrZXMgbmVjZXNzYXJ5IGNoYW5nZXMgaW4gdmlydGlvIGNvcmUgY29kZSwg YmFzZWQgb24gd2hpY2gKPiA+ID4gPiA+IHZpcnRpby1ibGsgaXMgY29udmVydGVkLiBPdGhlciBk ZXZpY2VzIG5vdyBrZWVwIHRoZSBleGlzdGluZyBiZWhhdmlvciBieQo+ID4gPiA+ID4gcGFzc2lu ZyBpbiAiZXJyb3JfYWJvcnQiLiBUaGV5IHdpbGwgYmUgY29udmVydGVkIGluIGZvbGxvd2luZyBz ZXJpZXMuIFRoZSBMaW51eAo+ID4gPiA+ID4gZHJpdmVyIHBhcnQgd2lsbCBhbHNvIGJlIHdvcmtl ZCBvbi4KPiA+ID4gPiA+IAo+ID4gPiA+ID4gT25lIGNvbmNlcm4gd2l0aCB0aGlzIGJlaGF2aW9y IGNoYW5nZSBpcyB0aGF0IGl0J3Mgbm93IGhhcmRlciB0byBub3RpY2UgdGhlCj4gPiA+ID4gPiBh Y3R1YWwgZHJpdmVyIGJ1ZyB0aGF0IGNhdXNlZCB0aGUgZXJyb3IsIGFzIHRoZSBndWVzdCBjb250 aW51ZXMgdG8gcnVuLiAgVG8KPiA+ID4gPiA+IGFkZHJlc3MgdGhhdCwgd2UgY291bGQgcHJvYmFi bHkgYWRkIGEgbmV3IGVycm9yIGFjdGlvbiBvcHRpb24gdG8gdmlydGlvCj4gPiA+ID4gPiBkZXZp Y2VzLCAgc2ltaWxhciB0byB0aGUgInJlYWQvd3JpdGUgd2Vycm9yIiBpbiBibG9jayBsYXllciwg c28gdGhlIHZtIGNvdWxkIGJlCj4gPiA+ID4gPiBwYXVzZWQgYW5kIHRoZSBtYW5hZ2VtZW50IHdp bGwgZ2V0IGFuIGV2ZW50IGluIFFNUCBsaWtlIHB2cGFuaWMuICBUaGlzIHdvcmsgY2FuCj4gPiA+ ID4gPiBiZSBkb25lIG9uIHRvcC4KPiA+ID4gPiAKPiA+ID4gPiBBdCB0aGUgYXJjaGl0ZWN0dXJh bCBsZXZlbCwgdGhhdCdzIG9ubHkgb25lIGNvbmNlcm4uIE90aGVycyB3b3VsZCBiZQo+ID4gPiA+ IC0gd29ya2xvYWRzIHN1Y2ggYXMgb3BlbnN0YWNrIGhhbmRsZSBndWVzdCBjcmFzaCBiZXR0ZXIg dGhhbgo+ID4gPiA+ICAgYSBndWVzdCB0aGF0J3MgZS5nLiBzbG93IGJlY2F1c2Ugb2YgYSBtZW1v cnkgbGVhawo+ID4gPiAKPiA+ID4gV2hhdCBtZW1vcnkgbGVhayBhcmUgeW91IHJlZmVycmluZyB0 bz8KPiA+IAo+ID4gVGhhdCB3YXMganVzdCBhbiBleGFtcGxlLiAgSWYgaG9zdCBkZXRlY3RzIGEg bWFsZm9ybWVkIHJpbmcsIGl0IHdpbGwKPiA+IGNyYXNoLiAgQnV0IG9mdGVuIGl0IGRvZXNuJ3Qs IHJlc3VsdCBpcyBidWZmZXJzIG5vdCBiZWluZyB1c2VkLCBzbyBndWVzdAo+ID4gY2FuJ3QgZnJl ZSB0aGVtIHVwLgo+ID4gCj4gPiA+ID4gLSBpdCdzIGVhc2llciBmb3IgZ3Vlc3RzIHRvIHByb2Jl IGhvc3QgZm9yIHNlY3VyaXR5IGlzc3Vlcwo+ID4gPiA+ICAgaWYgZ3Vlc3QgaXNuJ3Qga2lsbGVk Cj4gPiA+ID4gLSBndWVzdCBjYW4gZmxvb2QgaG9zdCBsb2cgd2l0aCBndWVzdC10cmlnZ2VyZWQg ZXJyb3JzCj4gPiA+IAo+ID4gPiBXZSBjYW4gc3RpbGwgYWJvcnQoKSBpZiBndWVzdCBpcyB0cmln Z2VyaW5nIGVycm9yIHRvbyBxdWlja2x5Lgo+ID4gCj4gPiAKPiA+IEFic29sdXRlbHksIGFuZCBp ZiBpdCBsb29rZWQgbGlrZSBJJ20gYWdhaW5zdCBlcnJvciBkZXRlY3Rpb24gYW5kCj4gPiByZWNv dmVyeSwgdGhpcyB3YXMgbm90IG15IGludGVudC4KPiA+IAo+ID4gSSBhbSBtZXJlbHkgc2F5aW5n IHdlIGNhbid0IGFwcGx5IHRoaXMgcGF0Y2hzZXQgYXMgaXMsIGRlZmVycmluZwo+ID4gYWRkcmVz c2luZyB0aGUgaXNzdWVzIHRvIHBhdGNoZXMgb24gdG9wLgo+ID4gCj4gPiBCdXQgSSBoYXZlIGFu IGlkZWE6IHJlZmFjdG9yIHRoZSBjb2RlIHRvIHVzZSBlcnJvcl9hYm9ydC4gCj4gCj4gVGhhdCBp cyBwYXRjaCAxLTkgb2YgdGhpcyBzZXJpZXMuIE9yIGRvIHlvdSBtZWFuIGFsc28gcmVmYWN0b3Ig YW5kIHBhc3MKPiBlcnJvcl9hYm9ydCB0byB0aGUgbWVtb3J5IGNvcmU/Cj4gCj4gRmFtCgpTbyBp ZiB5b3UgbGlrZSBqdXN0IHBhdGNoZXMgMS05IGFwcGxpZWQsIHRoaXMgc291bmRzCnJlYXNvbmFi bGUuIEknbGwgcHJvdmlkZSByZXZpZXcgY29tbWVudHMgb24gdGhlIGluZGl2aWR1YWwgcGF0Y2hl cy4KCgoKPiA+VGhpcyB3YXkgd2UKPiA+IGNhbiBhcHBseSB0aGUgcGF0Y2hzZXQgd2l0aG91dCBt YWtpbmcgZnVuY3Rpb25hbCBjaGFuZ2VzLCBhbmQgeW91IGNhbgo+ID4gbWFrZSBwcm9ncmVzcyB0 byBjb21wbGV0ZSB0aGlzLCBvbiB0b3AuCl9fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fClZpcnR1YWxpemF0aW9uIG1haWxpbmcgbGlzdApWaXJ0dWFsaXphdGlv bkBsaXN0cy5saW51eC1mb3VuZGF0aW9uLm9yZwpodHRwczovL2xpc3RzLmxpbnV4Zm91bmRhdGlv bi5vcmcvbWFpbG1hbi9saXN0aW5mby92aXJ0dWFsaXphdGlvbg==