All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
@ 2011-06-01 21:12 Luiz Capitulino
  2011-06-01 21:35 ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-01 21:12 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, Markus Armbruster

Hi there,

There are people who want to use QMP for thin provisioning. That's, the VM is
started with a small storage and when a no space error is triggered, more space
is allocated and the VM is put to run again.

QMP has two limitations that prevent people from doing this today:

1. The BLOCK_IO_ERROR doesn't contain error information

2. Considering we solve item 1, we still have to provide a way for clients
   to query why a VM stopped. This is needed because clients may miss the
   BLOCK_IO_ERROR event or may connect to the VM while it's already stopped

A proposal to solve both problems follow.

A. BLOCK_IO_ERROR information
-----------------------------

We already have discussed this a lot, but didn't reach a consensus. My solution
is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
for example (see the "reason" key):

{ "event": "BLOCK_IO_ERROR",
   "data": { "device": "ide0-hd1",
             "operation": "write",
             "action": "stop",
             "reason": "enospc", }
   "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Valid error reasons could be: "enospc", "eio", etc.

B. query-stop-reason
--------------------

I also have a simple solution for item 2. The vm_stop() accepts a reason
argument, so we could store it somewhere and return it as a string, like:

-> { "execute": "query-stop-reason" }
<- { "return": { "reason": "user" } }

Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
"migrate".

Also note that we have a STOP event. It should be extended with the
stop reason too, for completeness.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-01 21:12 [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason Luiz Capitulino
@ 2011-06-01 21:35 ` Anthony Liguori
  2011-06-02  9:06   ` Daniel P. Berrange
                     ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-01 21:35 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
> Hi there,
>
> There are people who want to use QMP for thin provisioning. That's, the VM is
> started with a small storage and when a no space error is triggered, more space
> is allocated and the VM is put to run again.
>
> QMP has two limitations that prevent people from doing this today:
>
> 1. The BLOCK_IO_ERROR doesn't contain error information
>
> 2. Considering we solve item 1, we still have to provide a way for clients
>     to query why a VM stopped. This is needed because clients may miss the
>     BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>
> A proposal to solve both problems follow.
>
> A. BLOCK_IO_ERROR information
> -----------------------------
>
> We already have discussed this a lot, but didn't reach a consensus. My solution
> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
> for example (see the "reason" key):
>
> { "event": "BLOCK_IO_ERROR",
>     "data": { "device": "ide0-hd1",
>               "operation": "write",
>               "action": "stop",
>               "reason": "enospc", }

you can call the reason whatever you want, but don't call it stringfied 
errno name :-)

In fact, just make reason "no space".

>     "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>
> Valid error reasons could be: "enospc", "eio", etc.

No etc :-)  Error reasons should we be well known and well documented.

> B. query-stop-reason
> --------------------
>
> I also have a simple solution for item 2. The vm_stop() accepts a reason
> argument, so we could store it somewhere and return it as a string, like:
>
> ->  { "execute": "query-stop-reason" }
> <- { "return": { "reason": "user" } }
>
> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> "migrate".
>
> Also note that we have a STOP event. It should be extended with the
> stop reason too, for completeness.


Can we just extend query-block?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-01 21:35 ` Anthony Liguori
@ 2011-06-02  9:06   ` Daniel P. Berrange
  2011-06-02 13:08     ` Anthony Liguori
  2011-06-02 17:57   ` Luiz Capitulino
  2011-06-06 11:13   ` Markus Armbruster
  2 siblings, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-02  9:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	jdenemar, Luiz Capitulino

On Wed, Jun 01, 2011 at 04:35:03PM -0500, Anthony Liguori wrote:
> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
> >Hi there,
> >
> >There are people who want to use QMP for thin provisioning. That's, the VM is
> >started with a small storage and when a no space error is triggered, more space
> >is allocated and the VM is put to run again.
> >
> >QMP has two limitations that prevent people from doing this today:
> >
> >1. The BLOCK_IO_ERROR doesn't contain error information
> >
> >2. Considering we solve item 1, we still have to provide a way for clients
> >    to query why a VM stopped. This is needed because clients may miss the
> >    BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
> >
> >A proposal to solve both problems follow.
> >
> >A. BLOCK_IO_ERROR information
> >-----------------------------
> >
> >We already have discussed this a lot, but didn't reach a consensus. My solution
> >is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
> >for example (see the "reason" key):
> >
> >{ "event": "BLOCK_IO_ERROR",
> >    "data": { "device": "ide0-hd1",
> >              "operation": "write",
> >              "action": "stop",
> >              "reason": "enospc", }
> 
> you can call the reason whatever you want, but don't call it
> stringfied errno name :-)
> 
> In fact, just make reason "no space".
> 
> >    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
> >
> >Valid error reasons could be: "enospc", "eio", etc.
> 
> No etc :-)  Error reasons should we be well known and well documented.
> 
> >B. query-stop-reason
> >--------------------
> >
> >I also have a simple solution for item 2. The vm_stop() accepts a reason
> >argument, so we could store it somewhere and return it as a string, like:
> >
> >->  { "execute": "query-stop-reason" }
> ><- { "return": { "reason": "user" } }
> >
> >Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >"migrate".
> >
> >Also note that we have a STOP event. It should be extended with the
> >stop reason too, for completeness.
> 
> 
> Can we just extend query-block?

Primarily we want 'query-stop-reason' to tell us what caused the VM
CPUs to stop. If that reason was 'ioerror', then 'query-block' could
be used to find out which particular block device(s) caused the IO
error to occurr & get the "reason" that was in the BLOCK_IO_ERROR
event.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02  9:06   ` Daniel P. Berrange
@ 2011-06-02 13:08     ` Anthony Liguori
  2011-06-02 13:24       ` Jiri Denemark
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 13:08 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	jdenemar, Markus Armbruster

On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> On Wed, Jun 01, 2011 at 04:35:03PM -0500, Anthony Liguori wrote:
>>>     "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>>>
>>> Valid error reasons could be: "enospc", "eio", etc.
>>
>> No etc :-)  Error reasons should we be well known and well documented.
>>
>>> B. query-stop-reason
>>> --------------------
>>>
>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>> argument, so we could store it somewhere and return it as a string, like:
>>>
>>> ->   { "execute": "query-stop-reason" }
>>> <- { "return": { "reason": "user" } }
>>>
>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>> "migrate".
>>>
>>> Also note that we have a STOP event. It should be extended with the
>>> stop reason too, for completeness.
>>
>>
>> Can we just extend query-block?
>
> Primarily we want 'query-stop-reason' to tell us what caused the VM
> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> be used to find out which particular block device(s) caused the IO
> error to occurr&  get the "reason" that was in the BLOCK_IO_ERROR
> event.

My concern is that we're over abstracting here.  We're not going to add 
additional stop reasons in the future.

Maybe just add an 'io-error': True to query-state.

Regards,

Anthony Liguori

>
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 13:08     ` Anthony Liguori
@ 2011-06-02 13:24       ` Jiri Denemark
  2011-06-02 14:02         ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Jiri Denemark @ 2011-06-02 13:24 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	Markus Armbruster

On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> >>> B. query-stop-reason
> >>> --------------------
> >>>
> >>> I also have a simple solution for item 2. The vm_stop() accepts a reason
> >>> argument, so we could store it somewhere and return it as a string, like:
> >>>
> >>> ->   { "execute": "query-stop-reason" }
> >>> <- { "return": { "reason": "user" } }
> >>>
> >>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >>> "migrate".
> >>>
> >>> Also note that we have a STOP event. It should be extended with the
> >>> stop reason too, for completeness.
> >>
> >>
> >> Can we just extend query-block?
> >
> > Primarily we want 'query-stop-reason' to tell us what caused the VM
> > CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> > be used to find out which particular block device(s) caused the IO
> > error to occurr&  get the "reason" that was in the BLOCK_IO_ERROR
> > event.
> 
> My concern is that we're over abstracting here.  We're not going to add 
> additional stop reasons in the future.
> 
> Maybe just add an 'io-error': True to query-state.

Sure, adding a new field to query-state response would work as well. And it
seems like a good idea to me since one already needs to call query-status to
check if CPUs are stopped or not so it makes sense to incorporate the
additional information there as well. And if you want to be safe for the
future, the new field doesn't have to be boolean 'io-error' but it can be the
string 'reason' which Luiz suggested above.

Jirka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 13:24       ` Jiri Denemark
@ 2011-06-02 14:02         ` Anthony Liguori
  2011-06-02 18:01           ` Luiz Capitulino
  2011-06-06 11:21           ` Markus Armbruster
  0 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 14:02 UTC (permalink / raw)
  To: Jiri Denemark
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Luiz Capitulino

On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>> B. query-stop-reason
>>>>> --------------------
>>>>>
>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>
>>>>> ->    { "execute": "query-stop-reason" }
>>>>> <- { "return": { "reason": "user" } }
>>>>>
>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>> "migrate".
>>>>>
>>>>> Also note that we have a STOP event. It should be extended with the
>>>>> stop reason too, for completeness.
>>>>
>>>>
>>>> Can we just extend query-block?
>>>
>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>> be used to find out which particular block device(s) caused the IO
>>> error to occurr&   get the "reason" that was in the BLOCK_IO_ERROR
>>> event.
>>
>> My concern is that we're over abstracting here.  We're not going to add
>> additional stop reasons in the future.
>>
>> Maybe just add an 'io-error': True to query-state.
>
> Sure, adding a new field to query-state response would work as well. And it
> seems like a good idea to me since one already needs to call query-status to
> check if CPUs are stopped or not so it makes sense to incorporate the
> additional information there as well. And if you want to be safe for the
> future, the new field doesn't have to be boolean 'io-error' but it can be the
> string 'reason' which Luiz suggested above.


String enumerations are a Bad Thing.  It's impossible to figure out what 
strings are valid and it lacks type safety.

Adding more booleans provides better type safety, and when we move to 
QAPI with a queryable schema, provides a way to figure out exactly what 
combinations are supported by QEMU.

Regards,

Anthony Liguori

>
> Jirka
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-01 21:35 ` Anthony Liguori
  2011-06-02  9:06   ` Daniel P. Berrange
@ 2011-06-02 17:57   ` Luiz Capitulino
  2011-06-02 18:00     ` Anthony Liguori
  2011-06-06 11:13   ` Markus Armbruster
  2 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-02 17:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Wed, 01 Jun 2011 16:35:03 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
> > Hi there,
> >
> > There are people who want to use QMP for thin provisioning. That's, the VM is
> > started with a small storage and when a no space error is triggered, more space
> > is allocated and the VM is put to run again.
> >
> > QMP has two limitations that prevent people from doing this today:
> >
> > 1. The BLOCK_IO_ERROR doesn't contain error information
> >
> > 2. Considering we solve item 1, we still have to provide a way for clients
> >     to query why a VM stopped. This is needed because clients may miss the
> >     BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
> >
> > A proposal to solve both problems follow.
> >
> > A. BLOCK_IO_ERROR information
> > -----------------------------
> >
> > We already have discussed this a lot, but didn't reach a consensus. My solution
> > is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
> > for example (see the "reason" key):
> >
> > { "event": "BLOCK_IO_ERROR",
> >     "data": { "device": "ide0-hd1",
> >               "operation": "write",
> >               "action": "stop",
> >               "reason": "enospc", }
> 
> you can call the reason whatever you want, but don't call it stringfied 
> errno name :-)
> 
> In fact, just make reason "no space".

You mean, we should do:

  "reason": "no space"

Or that we should make it a boolean, like:

 "no space": true

I'm ok with either way. But in case you meant the second one, I guess
we should make "reason" a dictionary so that we can group related
information when we extend the field, for example:

 "reason": { "no space": false, "no permission": true }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 17:57   ` Luiz Capitulino
@ 2011-06-02 18:00     ` Anthony Liguori
  2011-06-02 18:09       ` Luiz Capitulino
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 18:00 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
> On Wed, 01 Jun 2011 16:35:03 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>> Hi there,
>>>
>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>> started with a small storage and when a no space error is triggered, more space
>>> is allocated and the VM is put to run again.
>>>
>>> QMP has two limitations that prevent people from doing this today:
>>>
>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>
>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>      to query why a VM stopped. This is needed because clients may miss the
>>>      BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>
>>> A proposal to solve both problems follow.
>>>
>>> A. BLOCK_IO_ERROR information
>>> -----------------------------
>>>
>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>> for example (see the "reason" key):
>>>
>>> { "event": "BLOCK_IO_ERROR",
>>>      "data": { "device": "ide0-hd1",
>>>                "operation": "write",
>>>                "action": "stop",
>>>                "reason": "enospc", }
>>
>> you can call the reason whatever you want, but don't call it stringfied
>> errno name :-)
>>
>> In fact, just make reason "no space".
>
> You mean, we should do:
>
>    "reason": "no space"
>
> Or that we should make it a boolean, like:
>
>   "no space": true


Do we need reason in BLOCK_IO_ERROR if query-block returns this information?

>
> I'm ok with either way. But in case you meant the second one, I guess
> we should make "reason" a dictionary so that we can group related
> information when we extend the field, for example:
>
>   "reason": { "no space": false, "no permission": true }

Why would we ever have "no permission"?

Part of my argument for not having reason is I don't think we actually 
need to be this generic.  I think we're over abstracting.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 14:02         ` Anthony Liguori
@ 2011-06-02 18:01           ` Luiz Capitulino
  2011-06-02 18:32             ` Anthony Liguori
  2011-06-03  9:26             ` Daniel P. Berrange
  2011-06-06 11:21           ` Markus Armbruster
  1 sibling, 2 replies; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-02 18:01 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel,
	Markus Armbruster

On Thu, 02 Jun 2011 09:02:30 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> > On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> >> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> >>>>> B. query-stop-reason
> >>>>> --------------------
> >>>>>
> >>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
> >>>>> argument, so we could store it somewhere and return it as a string, like:
> >>>>>
> >>>>> ->    { "execute": "query-stop-reason" }
> >>>>> <- { "return": { "reason": "user" } }
> >>>>>
> >>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >>>>> "migrate".
> >>>>>
> >>>>> Also note that we have a STOP event. It should be extended with the
> >>>>> stop reason too, for completeness.
> >>>>
> >>>>
> >>>> Can we just extend query-block?
> >>>
> >>> Primarily we want 'query-stop-reason' to tell us what caused the VM
> >>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> >>> be used to find out which particular block device(s) caused the IO
> >>> error to occurr&   get the "reason" that was in the BLOCK_IO_ERROR
> >>> event.
> >>
> >> My concern is that we're over abstracting here.  We're not going to add
> >> additional stop reasons in the future.
> >>
> >> Maybe just add an 'io-error': True to query-state.
> >
> > Sure, adding a new field to query-state response would work as well. And it
> > seems like a good idea to me since one already needs to call query-status to
> > check if CPUs are stopped or not so it makes sense to incorporate the
> > additional information there as well. And if you want to be safe for the
> > future, the new field doesn't have to be boolean 'io-error' but it can be the
> > string 'reason' which Luiz suggested above.
> 
> 
> String enumerations are a Bad Thing.  It's impossible to figure out what 
> strings are valid and it lacks type safety.
> 
> Adding more booleans provides better type safety, and when we move to 
> QAPI with a queryable schema, provides a way to figure out exactly what 
> combinations are supported by QEMU.

To summarize:

 1. Add a 'io-error' field to query-status (which is only present if
    field 'running' is false)

 2. Extend query-block to contain error information associated with the
    device. This is interesting, because this information will be available
    even if the error didn't cause the VM to stop

Seems good enough to me, comments?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:00     ` Anthony Liguori
@ 2011-06-02 18:09       ` Luiz Capitulino
  2011-06-02 18:33         ` Anthony Liguori
  2011-06-06  9:25         ` [Qemu-devel] " Kevin Wolf
  0 siblings, 2 replies; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-02 18:09 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Thu, 02 Jun 2011 13:00:04 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
> > On Wed, 01 Jun 2011 16:35:03 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
> >>> Hi there,
> >>>
> >>> There are people who want to use QMP for thin provisioning. That's, the VM is
> >>> started with a small storage and when a no space error is triggered, more space
> >>> is allocated and the VM is put to run again.
> >>>
> >>> QMP has two limitations that prevent people from doing this today:
> >>>
> >>> 1. The BLOCK_IO_ERROR doesn't contain error information
> >>>
> >>> 2. Considering we solve item 1, we still have to provide a way for clients
> >>>      to query why a VM stopped. This is needed because clients may miss the
> >>>      BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
> >>>
> >>> A proposal to solve both problems follow.
> >>>
> >>> A. BLOCK_IO_ERROR information
> >>> -----------------------------
> >>>
> >>> We already have discussed this a lot, but didn't reach a consensus. My solution
> >>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
> >>> for example (see the "reason" key):
> >>>
> >>> { "event": "BLOCK_IO_ERROR",
> >>>      "data": { "device": "ide0-hd1",
> >>>                "operation": "write",
> >>>                "action": "stop",
> >>>                "reason": "enospc", }
> >>
> >> you can call the reason whatever you want, but don't call it stringfied
> >> errno name :-)
> >>
> >> In fact, just make reason "no space".
> >
> > You mean, we should do:
> >
> >    "reason": "no space"
> >
> > Or that we should make it a boolean, like:
> >
> >   "no space": true
> 
> 
> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?

True, no.

> > I'm ok with either way. But in case you meant the second one, I guess
> > we should make "reason" a dictionary so that we can group related
> > information when we extend the field, for example:
> >
> >   "reason": { "no space": false, "no permission": true }
> 
> Why would we ever have "no permission"?

It's an I/O error. I have a report from a developer who was getting
the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
it turned out to be no permission.

> Part of my argument for not having reason is I don't think we actually 
> need to be this generic.  I think we're over abstracting.

I'm quite sure we'll want to add new errors reasons in the near future.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:01           ` Luiz Capitulino
@ 2011-06-02 18:32             ` Anthony Liguori
  2011-06-02 18:57               ` Luiz Capitulino
  2011-06-03  9:26             ` Daniel P. Berrange
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 18:32 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel,
	Markus Armbruster

On 06/02/2011 01:01 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 09:02:30 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>>> B. query-stop-reason
>>>>>>> --------------------
>>>>>>>
>>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>>
>>>>>>> ->     { "execute": "query-stop-reason" }
>>>>>>> <- { "return": { "reason": "user" } }
>>>>>>>
>>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>>> "migrate".
>>>>>>>
>>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>>> stop reason too, for completeness.
>>>>>>
>>>>>>
>>>>>> Can we just extend query-block?
>>>>>
>>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>>> be used to find out which particular block device(s) caused the IO
>>>>> error to occurr&    get the "reason" that was in the BLOCK_IO_ERROR
>>>>> event.
>>>>
>>>> My concern is that we're over abstracting here.  We're not going to add
>>>> additional stop reasons in the future.
>>>>
>>>> Maybe just add an 'io-error': True to query-state.
>>>
>>> Sure, adding a new field to query-state response would work as well. And it
>>> seems like a good idea to me since one already needs to call query-status to
>>> check if CPUs are stopped or not so it makes sense to incorporate the
>>> additional information there as well. And if you want to be safe for the
>>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>>> string 'reason' which Luiz suggested above.
>>
>>
>> String enumerations are a Bad Thing.  It's impossible to figure out what
>> strings are valid and it lacks type safety.
>>
>> Adding more booleans provides better type safety, and when we move to
>> QAPI with a queryable schema, provides a way to figure out exactly what
>> combinations are supported by QEMU.
>
> To summarize:
>
>   1. Add a 'io-error' field to query-status (which is only present if
>      field 'running' is false)

It may or may not be present.  Lack of presence does not tell you anything.

It is only true when running is false AND the guest was stopped because 
of an io error.

>
>   2. Extend query-block to contain error information associated with the
>      device. This is interesting, because this information will be available
>      even if the error didn't cause the VM to stop

Well we need at least some way to indicate that a block device is in a 
failed state.  For instance, if you have two block device, but you miss 
the IO_ERROR event, you need to figure out which of the two devices is 
giving errors.

But I was thinking of something that had the semantics of, last_iop_failed.

Regards,

Anthony Liguori

> Seems good enough to me, comments?
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:09       ` Luiz Capitulino
@ 2011-06-02 18:33         ` Anthony Liguori
  2011-06-02 19:13           ` Luiz Capitulino
  2011-06-06  9:25         ` [Qemu-devel] " Kevin Wolf
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 18:33 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/02/2011 01:09 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 13:00:04 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>
>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>> Hi there,
>>>>>
>>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>>>> started with a small storage and when a no space error is triggered, more space
>>>>> is allocated and the VM is put to run again.
>>>>>
>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>
>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>
>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>       to query why a VM stopped. This is needed because clients may miss the
>>>>>       BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>
>>>>> A proposal to solve both problems follow.
>>>>>
>>>>> A. BLOCK_IO_ERROR information
>>>>> -----------------------------
>>>>>
>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>> for example (see the "reason" key):
>>>>>
>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>       "data": { "device": "ide0-hd1",
>>>>>                 "operation": "write",
>>>>>                 "action": "stop",
>>>>>                 "reason": "enospc", }
>>>>
>>>> you can call the reason whatever you want, but don't call it stringfied
>>>> errno name :-)
>>>>
>>>> In fact, just make reason "no space".
>>>
>>> You mean, we should do:
>>>
>>>     "reason": "no space"
>>>
>>> Or that we should make it a boolean, like:
>>>
>>>    "no space": true
>>
>>
>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
>
> True, no.
>
>>> I'm ok with either way. But in case you meant the second one, I guess
>>> we should make "reason" a dictionary so that we can group related
>>> information when we extend the field, for example:
>>>
>>>    "reason": { "no space": false, "no permission": true }
>>
>> Why would we ever have "no permission"?

Why did it happen?  It's not clear to me when read/write would return 
EPERM.  open() should fail.  In fact, EPERM is not mentioned in man 2 read.

Regards,

Anthony Liguori

>
> It's an I/O error. I have a report from a developer who was getting
> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
> it turned out to be no permission.
>
>> Part of my argument for not having reason is I don't think we actually
>> need to be this generic.  I think we're over abstracting.
>
> I'm quite sure we'll want to add new errors reasons in the near future.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:32             ` Anthony Liguori
@ 2011-06-02 18:57               ` Luiz Capitulino
  2011-06-02 19:35                 ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-02 18:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel,
	Markus Armbruster

On Thu, 02 Jun 2011 13:32:25 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/02/2011 01:01 PM, Luiz Capitulino wrote:
> > On Thu, 02 Jun 2011 09:02:30 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> >>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> >>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> >>>>>>> B. query-stop-reason
> >>>>>>> --------------------
> >>>>>>>
> >>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
> >>>>>>> argument, so we could store it somewhere and return it as a string, like:
> >>>>>>>
> >>>>>>> ->     { "execute": "query-stop-reason" }
> >>>>>>> <- { "return": { "reason": "user" } }
> >>>>>>>
> >>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >>>>>>> "migrate".
> >>>>>>>
> >>>>>>> Also note that we have a STOP event. It should be extended with the
> >>>>>>> stop reason too, for completeness.
> >>>>>>
> >>>>>>
> >>>>>> Can we just extend query-block?
> >>>>>
> >>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
> >>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> >>>>> be used to find out which particular block device(s) caused the IO
> >>>>> error to occurr&    get the "reason" that was in the BLOCK_IO_ERROR
> >>>>> event.
> >>>>
> >>>> My concern is that we're over abstracting here.  We're not going to add
> >>>> additional stop reasons in the future.
> >>>>
> >>>> Maybe just add an 'io-error': True to query-state.
> >>>
> >>> Sure, adding a new field to query-state response would work as well. And it
> >>> seems like a good idea to me since one already needs to call query-status to
> >>> check if CPUs are stopped or not so it makes sense to incorporate the
> >>> additional information there as well. And if you want to be safe for the
> >>> future, the new field doesn't have to be boolean 'io-error' but it can be the
> >>> string 'reason' which Luiz suggested above.
> >>
> >>
> >> String enumerations are a Bad Thing.  It's impossible to figure out what
> >> strings are valid and it lacks type safety.
> >>
> >> Adding more booleans provides better type safety, and when we move to
> >> QAPI with a queryable schema, provides a way to figure out exactly what
> >> combinations are supported by QEMU.
> >
> > To summarize:
> >
> >   1. Add a 'io-error' field to query-status (which is only present if
> >      field 'running' is false)
> 
> It may or may not be present.  Lack of presence does not tell you anything.
> 
> It is only true when running is false AND the guest was stopped because 
> of an io error.

Right.

> >
> >   2. Extend query-block to contain error information associated with the
> >      device. This is interesting, because this information will be available
> >      even if the error didn't cause the VM to stop
> 
> Well we need at least some way to indicate that a block device is in a 
> failed state.  For instance, if you have two block device, but you miss 
> the IO_ERROR event, you need to figure out which of the two devices is 
> giving errors.

Can't query-block be used for that? The 'io-error' key will only be present
for the failing device(s).

> 
> But I was thinking of something that had the semantics of, last_iop_failed.
> 
> Regards,
> 
> Anthony Liguori
> 
> > Seems good enough to me, comments?
> >
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:33         ` Anthony Liguori
@ 2011-06-02 19:13           ` Luiz Capitulino
  2011-06-02 20:03             ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-02 19:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Thu, 02 Jun 2011 13:33:52 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/02/2011 01:09 PM, Luiz Capitulino wrote:
> > On Thu, 02 Jun 2011 13:00:04 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
> >>> On Wed, 01 Jun 2011 16:35:03 -0500
> >>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
> >>>
> >>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
> >>>>> Hi there,
> >>>>>
> >>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
> >>>>> started with a small storage and when a no space error is triggered, more space
> >>>>> is allocated and the VM is put to run again.
> >>>>>
> >>>>> QMP has two limitations that prevent people from doing this today:
> >>>>>
> >>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
> >>>>>
> >>>>> 2. Considering we solve item 1, we still have to provide a way for clients
> >>>>>       to query why a VM stopped. This is needed because clients may miss the
> >>>>>       BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
> >>>>>
> >>>>> A proposal to solve both problems follow.
> >>>>>
> >>>>> A. BLOCK_IO_ERROR information
> >>>>> -----------------------------
> >>>>>
> >>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
> >>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
> >>>>> for example (see the "reason" key):
> >>>>>
> >>>>> { "event": "BLOCK_IO_ERROR",
> >>>>>       "data": { "device": "ide0-hd1",
> >>>>>                 "operation": "write",
> >>>>>                 "action": "stop",
> >>>>>                 "reason": "enospc", }
> >>>>
> >>>> you can call the reason whatever you want, but don't call it stringfied
> >>>> errno name :-)
> >>>>
> >>>> In fact, just make reason "no space".
> >>>
> >>> You mean, we should do:
> >>>
> >>>     "reason": "no space"
> >>>
> >>> Or that we should make it a boolean, like:
> >>>
> >>>    "no space": true
> >>
> >>
> >> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
> >
> > True, no.
> >
> >>> I'm ok with either way. But in case you meant the second one, I guess
> >>> we should make "reason" a dictionary so that we can group related
> >>> information when we extend the field, for example:
> >>>
> >>>    "reason": { "no space": false, "no permission": true }
> >>
> >> Why would we ever have "no permission"?
> 
> Why did it happen?  It's not clear to me when read/write would return 
> EPERM.  open() should fail.  In fact, EPERM is not mentioned in man 2 read.

Actually, the error was an EACCESS which might sound more bizarre :)

What happened was that the device file in question had its permission
changed during VM execution due to a bug somewhere else. I'm not sure if
the error was returned in a read() or write() (Kevin might have more details).

This is a bit extreme and I'd agree it's arguable whether or not we should
report EACCESS, but I had this in mind and ended up mentioning it...

Maybe libvirt guys could provide more input wrt the error reason usage.
If we don't have valid use cases for other errors, then I'll agree that
providing only "no space" is enough.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:57               ` Luiz Capitulino
@ 2011-06-02 19:35                 ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 19:35 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel,
	Markus Armbruster

On 06/02/2011 01:57 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 13:32:25 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/02/2011 01:01 PM, Luiz Capitulino wrote:
>>> On Thu, 02 Jun 2011 09:02:30 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>
>>>> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>>>>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>>>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>>>>> B. query-stop-reason
>>>>>>>>> --------------------
>>>>>>>>>
>>>>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>>>>
>>>>>>>>> ->      { "execute": "query-stop-reason" }
>>>>>>>>> <- { "return": { "reason": "user" } }
>>>>>>>>>
>>>>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>>>>> "migrate".
>>>>>>>>>
>>>>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>>>>> stop reason too, for completeness.
>>>>>>>>
>>>>>>>>
>>>>>>>> Can we just extend query-block?
>>>>>>>
>>>>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>>>>> be used to find out which particular block device(s) caused the IO
>>>>>>> error to occurr&     get the "reason" that was in the BLOCK_IO_ERROR
>>>>>>> event.
>>>>>>
>>>>>> My concern is that we're over abstracting here.  We're not going to add
>>>>>> additional stop reasons in the future.
>>>>>>
>>>>>> Maybe just add an 'io-error': True to query-state.
>>>>>
>>>>> Sure, adding a new field to query-state response would work as well. And it
>>>>> seems like a good idea to me since one already needs to call query-status to
>>>>> check if CPUs are stopped or not so it makes sense to incorporate the
>>>>> additional information there as well. And if you want to be safe for the
>>>>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>>>>> string 'reason' which Luiz suggested above.
>>>>
>>>>
>>>> String enumerations are a Bad Thing.  It's impossible to figure out what
>>>> strings are valid and it lacks type safety.
>>>>
>>>> Adding more booleans provides better type safety, and when we move to
>>>> QAPI with a queryable schema, provides a way to figure out exactly what
>>>> combinations are supported by QEMU.
>>>
>>> To summarize:
>>>
>>>    1. Add a 'io-error' field to query-status (which is only present if
>>>       field 'running' is false)
>>
>> It may or may not be present.  Lack of presence does not tell you anything.
>>
>> It is only true when running is false AND the guest was stopped because
>> of an io error.
>
> Right.
>
>>>
>>>    2. Extend query-block to contain error information associated with the
>>>       device. This is interesting, because this information will be available
>>>       even if the error didn't cause the VM to stop
>>
>> Well we need at least some way to indicate that a block device is in a
>> failed state.  For instance, if you have two block device, but you miss
>> the IO_ERROR event, you need to figure out which of the two devices is
>> giving errors.
>
> Can't query-block be used for that? The 'io-error' key will only be present
> for the failing device(s).

Yes, I think we're in violent agreement.

Regards,

Anthony Liguori

>
>>
>> But I was thinking of something that had the semantics of, last_iop_failed.
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> Seems good enough to me, comments?
>>>
>>
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 19:13           ` Luiz Capitulino
@ 2011-06-02 20:03             ` Anthony Liguori
  2011-06-02 20:13               ` [Qemu-devel] [libvirt] " Eric Blake
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 20:03 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, libvir-list, Stefan Hajnoczi, qemu-devel,
	Markus Armbruster, jdenemar

On 06/02/2011 02:13 PM, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 13:33:52 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/02/2011 01:09 PM, Luiz Capitulino wrote:
>>> On Thu, 02 Jun 2011 13:00:04 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>
>>>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>>>> Anthony Liguori<anthony@codemonkey.ws>    wrote:
>>>>>
>>>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>>>>>> started with a small storage and when a no space error is triggered, more space
>>>>>>> is allocated and the VM is put to run again.
>>>>>>>
>>>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>>>
>>>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>>>
>>>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>>>        to query why a VM stopped. This is needed because clients may miss the
>>>>>>>        BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>>>
>>>>>>> A proposal to solve both problems follow.
>>>>>>>
>>>>>>> A. BLOCK_IO_ERROR information
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>>>> for example (see the "reason" key):
>>>>>>>
>>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>>        "data": { "device": "ide0-hd1",
>>>>>>>                  "operation": "write",
>>>>>>>                  "action": "stop",
>>>>>>>                  "reason": "enospc", }
>>>>>>
>>>>>> you can call the reason whatever you want, but don't call it stringfied
>>>>>> errno name :-)
>>>>>>
>>>>>> In fact, just make reason "no space".
>>>>>
>>>>> You mean, we should do:
>>>>>
>>>>>      "reason": "no space"
>>>>>
>>>>> Or that we should make it a boolean, like:
>>>>>
>>>>>     "no space": true
>>>>
>>>>
>>>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
>>>
>>> True, no.
>>>
>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>> we should make "reason" a dictionary so that we can group related
>>>>> information when we extend the field, for example:
>>>>>
>>>>>     "reason": { "no space": false, "no permission": true }
>>>>
>>>> Why would we ever have "no permission"?
>>
>> Why did it happen?  It's not clear to me when read/write would return
>> EPERM.  open() should fail.  In fact, EPERM is not mentioned in man 2 read.
>
> Actually, the error was an EACCESS which might sound more bizarre :)
>
> What happened was that the device file in question had its permission
> changed during VM execution due to a bug somewhere else. I'm not sure if
> the error was returned in a read() or write() (Kevin might have more details).

Strange, EACCES should only happen on open().  Is it possible that 
somehow a reopen was happening?

> This is a bit extreme and I'd agree it's arguable whether or not we should
> report EACCESS, but I had this in mind and ended up mentioning it...

If we can't explain why an error would occur, we shouldn't make it part 
of the protocol :-)

> Maybe libvirt guys could provide more input wrt the error reason usage.
> If we don't have valid use cases for other errors, then I'll agree that
> providing only "no space" is enough.

Definitely!  Adding libvirt to the CC to help encourage their input.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [libvirt] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 20:03             ` Anthony Liguori
@ 2011-06-02 20:13               ` Eric Blake
  2011-06-02 20:55                 ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Blake @ 2011-06-02 20:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, libvir-list, jdenemar, qemu-devel, Luiz Capitulino

[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]

On 06/02/2011 02:03 PM, Anthony Liguori wrote:
>>>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>>>        "data": { "device": "ide0-hd1",
>>>>>>>>                  "operation": "write",
>>>>>>>>                  "action": "stop",
>>>>>>>>                  "reason": "enospc", }
>>>>>>>

>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>> information when we extend the field, for example:
>>>>>>
>>>>>>     "reason": { "no space": false, "no permission": true }

The idea for an event with details certainly has merit.

>>>>>
>>>>> Why would we ever have "no permission"?

SELinux denial, perhaps?

> 
>> Maybe libvirt guys could provide more input wrt the error reason usage.
>> If we don't have valid use cases for other errors, then I'll agree that
>> providing only "no space" is enough.
> 
> Definitely!  Adding libvirt to the CC to help encourage their input.

We could always start with just one reason "no space", and add more
later if and when we come up for use cases.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] [libvirt] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 20:13               ` [Qemu-devel] [libvirt] " Eric Blake
@ 2011-06-02 20:55                 ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-02 20:55 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, libvir-list, jdenemar, qemu-devel, Luiz Capitulino

On 06/02/2011 03:13 PM, Eric Blake wrote:
> On 06/02/2011 02:03 PM, Anthony Liguori wrote:
>>>>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>>>>         "data": { "device": "ide0-hd1",
>>>>>>>>>                   "operation": "write",
>>>>>>>>>                   "action": "stop",
>>>>>>>>>                   "reason": "enospc", }
>>>>>>>>
>
>>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>>> information when we extend the field, for example:
>>>>>>>
>>>>>>>      "reason": { "no space": false, "no permission": true }
>
> The idea for an event with details certainly has merit.

In an ideal would, would would just embed the BlockDeviceInfo structure 
in the event and call it a day.  In a less than ideal world, I think 
it's better to make a call to query-block after having received the 
BLOCK_IO_ERROR event.

>
>>>>>>
>>>>>> Why would we ever have "no permission"?
>
> SELinux denial, perhaps?

Maybe, but that would take a considerable amount of magic to make happen 
in practice.  Really only sounds plausible as an sVirt bug.  I think you 
could only make this happen if you you doing dynamic labelling.

Regards,

Anthony Liguori

>
>>
>>> Maybe libvirt guys could provide more input wrt the error reason usage.
>>> If we don't have valid use cases for other errors, then I'll agree that
>>> providing only "no space" is enough.
>>
>> Definitely!  Adding libvirt to the CC to help encourage their input.
>
> We could always start with just one reason "no space", and add more
> later if and when we come up for use cases.
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:01           ` Luiz Capitulino
  2011-06-02 18:32             ` Anthony Liguori
@ 2011-06-03  9:26             ` Daniel P. Berrange
  2011-06-03 12:43               ` Anthony Liguori
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-03  9:26 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark

On Thu, Jun 02, 2011 at 03:01:24PM -0300, Luiz Capitulino wrote:
> On Thu, 02 Jun 2011 09:02:30 -0500
> Anthony Liguori <anthony@codemonkey.ws> wrote:
> 
> > On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> > > On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> > >> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> > >>>>> B. query-stop-reason
> > >>>>> --------------------
> > >>>>>
> > >>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
> > >>>>> argument, so we could store it somewhere and return it as a string, like:
> > >>>>>
> > >>>>> ->    { "execute": "query-stop-reason" }
> > >>>>> <- { "return": { "reason": "user" } }
> > >>>>>
> > >>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> > >>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> > >>>>> "migrate".
> > >>>>>
> > >>>>> Also note that we have a STOP event. It should be extended with the
> > >>>>> stop reason too, for completeness.
> > >>>>
> > >>>>
> > >>>> Can we just extend query-block?
> > >>>
> > >>> Primarily we want 'query-stop-reason' to tell us what caused the VM
> > >>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> > >>> be used to find out which particular block device(s) caused the IO
> > >>> error to occurr&   get the "reason" that was in the BLOCK_IO_ERROR
> > >>> event.
> > >>
> > >> My concern is that we're over abstracting here.  We're not going to add
> > >> additional stop reasons in the future.
> > >>
> > >> Maybe just add an 'io-error': True to query-state.
> > >
> > > Sure, adding a new field to query-state response would work as well. And it
> > > seems like a good idea to me since one already needs to call query-status to
> > > check if CPUs are stopped or not so it makes sense to incorporate the
> > > additional information there as well. And if you want to be safe for the
> > > future, the new field doesn't have to be boolean 'io-error' but it can be the
> > > string 'reason' which Luiz suggested above.
> > 
> > 
> > String enumerations are a Bad Thing.  It's impossible to figure out what 
> > strings are valid and it lacks type safety.
> > 
> > Adding more booleans provides better type safety, and when we move to 
> > QAPI with a queryable schema, provides a way to figure out exactly what 
> > combinations are supported by QEMU.
> 
> To summarize:
> 
>  1. Add a 'io-error' field to query-status (which is only present if
>     field 'running' is false)

This isn't really enough. There are many reasons why a VM may have
transitioned to the paused state, of which IO Error is merely one.
The query-status needs to be able to report what the reason for
the transitioning to the paused state is.

>  2. Extend query-block to contain error information associated with the
>     device. This is interesting, because this information will be available
>     even if the error didn't cause the VM to stop

Apps will only look for this data, if the query-status stop reason
indicates that it was stopped due to an I/O error.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03  9:26             ` Daniel P. Berrange
@ 2011-06-03 12:43               ` Anthony Liguori
  2011-06-03 12:57                 ` Daniel P. Berrange
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-03 12:43 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark, Luiz Capitulino

On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
> On Thu, Jun 02, 2011 at 03:01:24PM -0300, Luiz Capitulino wrote:
>> On Thu, 02 Jun 2011 09:02:30 -0500
>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>
>>> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>>>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>>>> B. query-stop-reason
>>>>>>>> --------------------
>>>>>>>>
>>>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>>>
>>>>>>>> ->     { "execute": "query-stop-reason" }
>>>>>>>> <- { "return": { "reason": "user" } }
>>>>>>>>
>>>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>>>> "migrate".
>>>>>>>>
>>>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>>>> stop reason too, for completeness.
>>>>>>>
>>>>>>>
>>>>>>> Can we just extend query-block?
>>>>>>
>>>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>>>> be used to find out which particular block device(s) caused the IO
>>>>>> error to occurr&    get the "reason" that was in the BLOCK_IO_ERROR
>>>>>> event.
>>>>>
>>>>> My concern is that we're over abstracting here.  We're not going to add
>>>>> additional stop reasons in the future.
>>>>>
>>>>> Maybe just add an 'io-error': True to query-state.
>>>>
>>>> Sure, adding a new field to query-state response would work as well. And it
>>>> seems like a good idea to me since one already needs to call query-status to
>>>> check if CPUs are stopped or not so it makes sense to incorporate the
>>>> additional information there as well. And if you want to be safe for the
>>>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>>>> string 'reason' which Luiz suggested above.
>>>
>>>
>>> String enumerations are a Bad Thing.  It's impossible to figure out what
>>> strings are valid and it lacks type safety.
>>>
>>> Adding more booleans provides better type safety, and when we move to
>>> QAPI with a queryable schema, provides a way to figure out exactly what
>>> combinations are supported by QEMU.
>>
>> To summarize:
>>
>>   1. Add a 'io-error' field to query-status (which is only present if
>>      field 'running' is false)
>
> This isn't really enough. There are many reasons why a VM may have
> transitioned to the paused state, of which IO Error is merely one.
> The query-status needs to be able to report what the reason for
> the transitioning to the paused state is.

No, there's only two reasons:

1) IO Error (and user configured pause on I/O error)

2) The result of some user action (an explicit stop, live migration, etc.)

The fact that all of these things call vm_stop() internal is an 
implementation detail.  Adding a string parameter to vm_stop() of a 
reason may seem like an easy thing to do but you're taking something 
that is an internal concept in QEMU and making it part of an interface 
that needs to be supported forever.

That's why I'm suggesting modelling a user visible concept (I/O errors 
stop a guest) instead of trying to model an internal QEMU concept 
(vm_stop()).

If you have other user visible concepts that you want to know about, 
please share the use-cases and we can think about how to model it such 
that it's not exposing internal QEMU details.

Regards,

Anthony Liguori

>
>>   2. Extend query-block to contain error information associated with the
>>      device. This is interesting, because this information will be available
>>      even if the error didn't cause the VM to stop
>
> Apps will only look for this data, if the query-status stop reason
> indicates that it was stopped due to an I/O error.
>
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 12:43               ` Anthony Liguori
@ 2011-06-03 12:57                 ` Daniel P. Berrange
  2011-06-03 13:26                   ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-03 12:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark, Luiz Capitulino

On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
> On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
> >On Thu, Jun 02, 2011 at 03:01:24PM -0300, Luiz Capitulino wrote:
> >>On Thu, 02 Jun 2011 09:02:30 -0500
> >>Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >>
> >>>On 06/02/2011 08:24 AM, Jiri Denemark wrote:
> >>>>On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
> >>>>>On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
> >>>>>>>>B. query-stop-reason
> >>>>>>>>--------------------
> >>>>>>>>
> >>>>>>>>I also have a simple solution for item 2. The vm_stop() accepts a reason
> >>>>>>>>argument, so we could store it somewhere and return it as a string, like:
> >>>>>>>>
> >>>>>>>>->     { "execute": "query-stop-reason" }
> >>>>>>>><- { "return": { "reason": "user" } }
> >>>>>>>>
> >>>>>>>>Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
> >>>>>>>>this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
> >>>>>>>>"migrate".
> >>>>>>>>
> >>>>>>>>Also note that we have a STOP event. It should be extended with the
> >>>>>>>>stop reason too, for completeness.
> >>>>>>>
> >>>>>>>
> >>>>>>>Can we just extend query-block?
> >>>>>>
> >>>>>>Primarily we want 'query-stop-reason' to tell us what caused the VM
> >>>>>>CPUs to stop. If that reason was 'ioerror', then 'query-block' could
> >>>>>>be used to find out which particular block device(s) caused the IO
> >>>>>>error to occurr&    get the "reason" that was in the BLOCK_IO_ERROR
> >>>>>>event.
> >>>>>
> >>>>>My concern is that we're over abstracting here.  We're not going to add
> >>>>>additional stop reasons in the future.
> >>>>>
> >>>>>Maybe just add an 'io-error': True to query-state.
> >>>>
> >>>>Sure, adding a new field to query-state response would work as well. And it
> >>>>seems like a good idea to me since one already needs to call query-status to
> >>>>check if CPUs are stopped or not so it makes sense to incorporate the
> >>>>additional information there as well. And if you want to be safe for the
> >>>>future, the new field doesn't have to be boolean 'io-error' but it can be the
> >>>>string 'reason' which Luiz suggested above.
> >>>
> >>>
> >>>String enumerations are a Bad Thing.  It's impossible to figure out what
> >>>strings are valid and it lacks type safety.
> >>>
> >>>Adding more booleans provides better type safety, and when we move to
> >>>QAPI with a queryable schema, provides a way to figure out exactly what
> >>>combinations are supported by QEMU.
> >>
> >>To summarize:
> >>
> >>  1. Add a 'io-error' field to query-status (which is only present if
> >>     field 'running' is false)
> >
> >This isn't really enough. There are many reasons why a VM may have
> >transitioned to the paused state, of which IO Error is merely one.
> >The query-status needs to be able to report what the reason for
> >the transitioning to the paused state is.
> 
> No, there's only two reasons:
> 
> 1) IO Error (and user configured pause on I/O error)
> 
> 2) The result of some user action (an explicit stop, live migration, etc.)
> 
> The fact that all of these things call vm_stop() internal is an
> implementation detail.  Adding a string parameter to vm_stop() of a
> reason may seem like an easy thing to do but you're taking something
> that is an internal concept in QEMU and making it part of an
> interface that needs to be supported forever.
> 
> That's why I'm suggesting modelling a user visible concept (I/O
> errors stop a guest) instead of trying to model an internal QEMU
> concept (vm_stop()).
> 
> If you have other user visible concepts that you want to know about,
> please share the use-cases and we can think about how to model it
> such that it's not exposing internal QEMU details.

None of the requested info is exposing internal QEMU impl details
with one exception. The reasons are either administrative commands,
host OS failures, guest OS failures, or the exception, KVM internal
emulation failure.

The core problem is that an app connects to QEMU, finds it is paused,
and wants to decide what action to take. If the guest is paused due
to a previous admin 'stop' command, it will allow resuming. If it is
paused due to guest OS poweroff, it might decide to issue a 'system_reset'
command and then 'resume'. If it is paused due to watchdog, it might
decide it wants to pmemsave the guest OS, and then system_reset+resume.
If it is paused because KVM hit an emulation failure, it may wish to
attach to the debugger interface and capture VM/QEMU state.

The other problem is that a sysadmin finds a guest unexpectedly paused,
and the mgmt app can't tell it why and they want to troubleshoot the
problem. QEMU should be able to tell the sysadmin why it is in this
state, so they can proceed with trouble shooting in a suitable direction,
whether the host OS, KVM itself, or the guest OS, or the mgt tool.

Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 12:57                 ` Daniel P. Berrange
@ 2011-06-03 13:26                   ` Anthony Liguori
  2011-06-03 13:39                     ` Daniel P. Berrange
  2011-06-03 13:41                     ` Jan Kiszka
  0 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-03 13:26 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	Jiri Denemark, Markus Armbruster

On 06/03/2011 07:57 AM, Daniel P. Berrange wrote:
> On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
>> On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
>> errors stop a guest) instead of trying to model an internal QEMU
>> concept (vm_stop()).
>>
>> If you have other user visible concepts that you want to know about,
>> please share the use-cases and we can think about how to model it
>> such that it's not exposing internal QEMU details.
>
> None of the requested info is exposing internal QEMU impl details
> with one exception. The reasons are either administrative commands,
> host OS failures, guest OS failures, or the exception, KVM internal
> emulation failure.
>
> The core problem is that an app connects to QEMU, finds it is paused,
> and wants to decide what action to take. If the guest is paused due
> to a previous admin 'stop' command,

Let's be very clear here.  QEMU does not provide a way to figure out 
what the previous QMP user did.  That is not a use case we support today 
and it's not one we can support by just adding a reason to stop.  It's 
far more complicated than just that.

> it will allow resuming. If it is
> paused due to guest OS poweroff,

This is legitimate but only occurs if you use -no-shutdown.  So having a 
query-state have a "powered-off" flag would be a Good Thing.

> it might decide to issue a 'system_reset'
> command and then 'resume'. If it is paused due to watchdog,

I think what we're getting at is the need for an enumeration.  So let's 
introduce one.  Here's what I propose:

SQMP
query-status
------------

Return a json-object with the following information:

- "running": true if the VM is running, or false if it is paused (json-bool)
- "singlestep": true if the VM is in single step mode,
                 false otherwise (json-bool)
- "status": one of the following values (json-string) (optional)
       "prelaunch" - QEMU was started with -S and guest has not started
       "running" - guest is actively running
       "singlestep" - guest is running in single step mode
       "paused" - guest has been paused via the 'stop' command
       "postmigrate" - guest is paused following a successful 'migrate'
       "shutdown" - guest is shut down (and -no-shutdown is in use)
       "io-error" - the last IOP has failed and the device is configured 
to pause on I/O errors
       "watchdog-error" - the watchdog action is configured to pause and 
has been triggered

Example:

-> { "execute": "query-status" }
<- { "return": { "running": true, "singlestep": false, "status": 
"running" } }

EQMP

Regards,

Anthony Liguori

  it might
> decide it wants to pmemsave the guest OS, and then system_reset+resume.
> If it is paused because KVM hit an emulation failure, it may wish to
> attach to the debugger interface and capture VM/QEMU state.
>
> The other problem is that a sysadmin finds a guest unexpectedly paused,
> and the mgmt app can't tell it why and they want to troubleshoot the
> problem. QEMU should be able to tell the sysadmin why it is in this
> state, so they can proceed with trouble shooting in a suitable direction,
> whether the host OS, KVM itself, or the guest OS, or the mgt tool.
>
> Daniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:26                   ` Anthony Liguori
@ 2011-06-03 13:39                     ` Daniel P. Berrange
  2011-06-03 13:44                       ` Luiz Capitulino
  2011-06-03 13:41                     ` Jan Kiszka
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-03 13:39 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	Jiri Denemark, Markus Armbruster

On Fri, Jun 03, 2011 at 08:26:56AM -0500, Anthony Liguori wrote:
> On 06/03/2011 07:57 AM, Daniel P. Berrange wrote:
> >On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
> >>On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
> >>errors stop a guest) instead of trying to model an internal QEMU
> >>concept (vm_stop()).
> >>
> >>If you have other user visible concepts that you want to know about,
> >>please share the use-cases and we can think about how to model it
> >>such that it's not exposing internal QEMU details.
> >
> >None of the requested info is exposing internal QEMU impl details
> >with one exception. The reasons are either administrative commands,
> >host OS failures, guest OS failures, or the exception, KVM internal
> >emulation failure.
> >
> >The core problem is that an app connects to QEMU, finds it is paused,
> >and wants to decide what action to take. If the guest is paused due
> >to a previous admin 'stop' command,
> 
> Let's be very clear here.  QEMU does not provide a way to figure out
> what the previous QMP user did.  That is not a use case we support
> today and it's not one we can support by just adding a reason to
> stop.  It's far more complicated than just that.
> 
> >it will allow resuming. If it is
> >paused due to guest OS poweroff,
> 
> This is legitimate but only occurs if you use -no-shutdown.  So
> having a query-state have a "powered-off" flag would be a Good
> Thing.
> 
> >it might decide to issue a 'system_reset'
> >command and then 'resume'. If it is paused due to watchdog,
> 
> I think what we're getting at is the need for an enumeration.  So
> let's introduce one.  Here's what I propose:
> 
> SQMP
> query-status
> ------------
> 
> Return a json-object with the following information:
> 
> - "running": true if the VM is running, or false if it is paused (json-bool)
> - "singlestep": true if the VM is in single step mode,
>                 false otherwise (json-bool)
> - "status": one of the following values (json-string) (optional)
>       "prelaunch" - QEMU was started with -S and guest has not started
>       "running" - guest is actively running
>       "singlestep" - guest is running in single step mode
>       "paused" - guest has been paused via the 'stop' command
>       "postmigrate" - guest is paused following a successful 'migrate'
>       "shutdown" - guest is shut down (and -no-shutdown is in use)
>       "io-error" - the last IOP has failed and the device is
> configured to pause on I/O errors
>       "watchdog-error" - the watchdog action is configured to pause
> and has been triggered

Perhaps I didn't communicate well, but this pretty much matches
what I was trying to ask for in my previous message, so gets
my vote!


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:26                   ` Anthony Liguori
  2011-06-03 13:39                     ` Daniel P. Berrange
@ 2011-06-03 13:41                     ` Jan Kiszka
  2011-06-03 13:51                       ` Anthony Liguori
  1 sibling, 1 reply; 48+ messages in thread
From: Jan Kiszka @ 2011-06-03 13:41 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	Jiri Denemark, Markus Armbruster

On 2011-06-03 15:26, Anthony Liguori wrote:
> I think what we're getting at is the need for an enumeration.  So let's
> introduce one.  Here's what I propose:
> 
> SQMP
> query-status
> ------------
> 
> Return a json-object with the following information:
> 
> - "running": true if the VM is running, or false if it is paused
> (json-bool)
> - "singlestep": true if the VM is in single step mode,
>                 false otherwise (json-bool)
> - "status": one of the following values (json-string) (optional)
>       "prelaunch" - QEMU was started with -S and guest has not started
>       "running" - guest is actively running
>       "singlestep" - guest is running in single step mode

"singlestep" is just a subset of "debug" stops. Better use the latter.

>       "paused" - guest has been paused via the 'stop' command
>       "postmigrate" - guest is paused following a successful 'migrate'
>       "shutdown" - guest is shut down (and -no-shutdown is in use)
>       "io-error" - the last IOP has failed and the device is configured
> to pause on I/O errors
>       "watchdog-error" - the watchdog action is configured to pause and
> has been triggered

And "panic" or "internal-error".

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:39                     ` Daniel P. Berrange
@ 2011-06-03 13:44                       ` Luiz Capitulino
  2011-06-03 14:01                         ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-03 13:44 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark

On Fri, 3 Jun 2011 14:39:41 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Fri, Jun 03, 2011 at 08:26:56AM -0500, Anthony Liguori wrote:
> > On 06/03/2011 07:57 AM, Daniel P. Berrange wrote:
> > >On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
> > >>On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
> > >>errors stop a guest) instead of trying to model an internal QEMU
> > >>concept (vm_stop()).
> > >>
> > >>If you have other user visible concepts that you want to know about,
> > >>please share the use-cases and we can think about how to model it
> > >>such that it's not exposing internal QEMU details.
> > >
> > >None of the requested info is exposing internal QEMU impl details
> > >with one exception. The reasons are either administrative commands,
> > >host OS failures, guest OS failures, or the exception, KVM internal
> > >emulation failure.
> > >
> > >The core problem is that an app connects to QEMU, finds it is paused,
> > >and wants to decide what action to take. If the guest is paused due
> > >to a previous admin 'stop' command,
> > 
> > Let's be very clear here.  QEMU does not provide a way to figure out
> > what the previous QMP user did.  That is not a use case we support
> > today and it's not one we can support by just adding a reason to
> > stop.  It's far more complicated than just that.
> > 
> > >it will allow resuming. If it is
> > >paused due to guest OS poweroff,
> > 
> > This is legitimate but only occurs if you use -no-shutdown.  So
> > having a query-state have a "powered-off" flag would be a Good
> > Thing.
> > 
> > >it might decide to issue a 'system_reset'
> > >command and then 'resume'. If it is paused due to watchdog,
> > 
> > I think what we're getting at is the need for an enumeration.  So
> > let's introduce one.  Here's what I propose:
> > 
> > SQMP
> > query-status
> > ------------
> > 
> > Return a json-object with the following information:
> > 
> > - "running": true if the VM is running, or false if it is paused (json-bool)
> > - "singlestep": true if the VM is in single step mode,
> >                 false otherwise (json-bool)
> > - "status": one of the following values (json-string) (optional)
> >       "prelaunch" - QEMU was started with -S and guest has not started
> >       "running" - guest is actively running
> >       "singlestep" - guest is running in single step mode
> >       "paused" - guest has been paused via the 'stop' command
> >       "postmigrate" - guest is paused following a successful 'migrate'
> >       "shutdown" - guest is shut down (and -no-shutdown is in use)
> >       "io-error" - the last IOP has failed and the device is
> > configured to pause on I/O errors
> >       "watchdog-error" - the watchdog action is configured to pause
> > and has been triggered
> 
> Perhaps I didn't communicate well, but this pretty much matches
> what I was trying to ask for in my previous message, so gets
> my vote!

Mine too, I like it. Expect patches next week :)

My only comment is that, in case this an improved version of
query-status we could have a new command (like query-statys2 or
query-vm-status).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:41                     ` Jan Kiszka
@ 2011-06-03 13:51                       ` Anthony Liguori
  2011-06-03 13:59                         ` Jan Kiszka
  2011-06-03 14:03                         ` Daniel P. Berrange
  0 siblings, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-03 13:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark, Luiz Capitulino

On 06/03/2011 08:41 AM, Jan Kiszka wrote:
> On 2011-06-03 15:26, Anthony Liguori wrote:
>> I think what we're getting at is the need for an enumeration.  So let's
>> introduce one.  Here's what I propose:
>>
>> SQMP
>> query-status
>> ------------
>>
>> Return a json-object with the following information:
>>
>> - "running": true if the VM is running, or false if it is paused
>> (json-bool)
>> - "singlestep": true if the VM is in single step mode,
>>                  false otherwise (json-bool)
>> - "status": one of the following values (json-string) (optional)
>>        "prelaunch" - QEMU was started with -S and guest has not started
>>        "running" - guest is actively running
>>        "singlestep" - guest is running in single step mode
>
> "singlestep" is just a subset of "debug" stops. Better use the latter.
>
>>        "paused" - guest has been paused via the 'stop' command
>>        "postmigrate" - guest is paused following a successful 'migrate'
>>        "shutdown" - guest is shut down (and -no-shutdown is in use)
>>        "io-error" - the last IOP has failed and the device is configured
>> to pause on I/O errors
>>        "watchdog-error" - the watchdog action is configured to pause and
>> has been triggered
>
> And "panic" or "internal-error".

Can you add the request help spec text too?  Is "internal-error" a KVM 
emulation error?  If so, I'd rather make it "kvm-emulation-error".

Regards,

Anthony Liguori

>
> Jan
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:51                       ` Anthony Liguori
@ 2011-06-03 13:59                         ` Jan Kiszka
  2011-06-03 14:03                         ` Daniel P. Berrange
  1 sibling, 0 replies; 48+ messages in thread
From: Jan Kiszka @ 2011-06-03 13:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark, Luiz Capitulino

On 2011-06-03 15:51, Anthony Liguori wrote:
> On 06/03/2011 08:41 AM, Jan Kiszka wrote:
>> On 2011-06-03 15:26, Anthony Liguori wrote:
>>> I think what we're getting at is the need for an enumeration.  So let's
>>> introduce one.  Here's what I propose:
>>>
>>> SQMP
>>> query-status
>>> ------------
>>>
>>> Return a json-object with the following information:
>>>
>>> - "running": true if the VM is running, or false if it is paused
>>> (json-bool)
>>> - "singlestep": true if the VM is in single step mode,
>>>                  false otherwise (json-bool)
>>> - "status": one of the following values (json-string) (optional)
>>>        "prelaunch" - QEMU was started with -S and guest has not started
>>>        "running" - guest is actively running
>>>        "singlestep" - guest is running in single step mode
>>
>> "singlestep" is just a subset of "debug" stops. Better use the latter.
>>
>>>        "paused" - guest has been paused via the 'stop' command
>>>        "postmigrate" - guest is paused following a successful 'migrate'
>>>        "shutdown" - guest is shut down (and -no-shutdown is in use)
>>>        "io-error" - the last IOP has failed and the device is configured
>>> to pause on I/O errors
>>>        "watchdog-error" - the watchdog action is configured to pause and
>>> has been triggered
>>
>> And "panic" or "internal-error".
> 
> Can you add the request help spec text too?  Is "internal-error" a KVM 
> emulation error?  If so, I'd rather make it "kvm-emulation-error".

It would currently map on VMSTOP_PANIC, which is KVM_EXIT_UNKNOWN,
KVM_EXIT_INTERNAL_ERROR or something arch-specific. Moreover, people may
put TCG errors or whatever under this stop reason in the future. If we
wanted "kvm-emulation-error", we would have to introduce
VMSTOP_KVM_EMULATION_ERROR.

But I think internal error is sufficiently precise for this purpose:
"Fatal internal error that prevents further guest execution."

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:44                       ` Luiz Capitulino
@ 2011-06-03 14:01                         ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-03 14:01 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster,
	Jiri Denemark

On 06/03/2011 08:44 AM, Luiz Capitulino wrote:
> On Fri, 3 Jun 2011 14:39:41 +0100
> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
>
>> On Fri, Jun 03, 2011 at 08:26:56AM -0500, Anthony Liguori wrote:
>>> On 06/03/2011 07:57 AM, Daniel P. Berrange wrote:
>>>> On Fri, Jun 03, 2011 at 07:43:24AM -0500, Anthony Liguori wrote:
>>>>> On 06/03/2011 04:26 AM, Daniel P. Berrange wrote:
>>>>> errors stop a guest) instead of trying to model an internal QEMU
>>>>> concept (vm_stop()).
>>>>>
>>>>> If you have other user visible concepts that you want to know about,
>>>>> please share the use-cases and we can think about how to model it
>>>>> such that it's not exposing internal QEMU details.
>>>>
>>>> None of the requested info is exposing internal QEMU impl details
>>>> with one exception. The reasons are either administrative commands,
>>>> host OS failures, guest OS failures, or the exception, KVM internal
>>>> emulation failure.
>>>>
>>>> The core problem is that an app connects to QEMU, finds it is paused,
>>>> and wants to decide what action to take. If the guest is paused due
>>>> to a previous admin 'stop' command,
>>>
>>> Let's be very clear here.  QEMU does not provide a way to figure out
>>> what the previous QMP user did.  That is not a use case we support
>>> today and it's not one we can support by just adding a reason to
>>> stop.  It's far more complicated than just that.
>>>
>>>> it will allow resuming. If it is
>>>> paused due to guest OS poweroff,
>>>
>>> This is legitimate but only occurs if you use -no-shutdown.  So
>>> having a query-state have a "powered-off" flag would be a Good
>>> Thing.
>>>
>>>> it might decide to issue a 'system_reset'
>>>> command and then 'resume'. If it is paused due to watchdog,
>>>
>>> I think what we're getting at is the need for an enumeration.  So
>>> let's introduce one.  Here's what I propose:
>>>
>>> SQMP
>>> query-status
>>> ------------
>>>
>>> Return a json-object with the following information:
>>>
>>> - "running": true if the VM is running, or false if it is paused (json-bool)
>>> - "singlestep": true if the VM is in single step mode,
>>>                  false otherwise (json-bool)
>>> - "status": one of the following values (json-string) (optional)
>>>        "prelaunch" - QEMU was started with -S and guest has not started
>>>        "running" - guest is actively running
>>>        "singlestep" - guest is running in single step mode
>>>        "paused" - guest has been paused via the 'stop' command
>>>        "postmigrate" - guest is paused following a successful 'migrate'
>>>        "shutdown" - guest is shut down (and -no-shutdown is in use)
>>>        "io-error" - the last IOP has failed and the device is
>>> configured to pause on I/O errors
>>>        "watchdog-error" - the watchdog action is configured to pause
>>> and has been triggered
>>
>> Perhaps I didn't communicate well, but this pretty much matches
>> what I was trying to ask for in my previous message, so gets
>> my vote!
>
> Mine too, I like it. Expect patches next week :)
>
> My only comment is that, in case this an improved version of
> query-status we could have a new command (like query-statys2 or
> query-vm-status).

No, let's just keep query-status and add the new field.

For compatibility, you need to fall back to using running/singlestep anyway.

Regards,

Anthony Liguori

>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-03 13:51                       ` Anthony Liguori
  2011-06-03 13:59                         ` Jan Kiszka
@ 2011-06-03 14:03                         ` Daniel P. Berrange
  1 sibling, 0 replies; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-03 14:03 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Jan Kiszka, qemu-devel, Markus Armbruster,
	Stefan Hajnoczi, Jiri Denemark, Luiz Capitulino

On Fri, Jun 03, 2011 at 08:51:29AM -0500, Anthony Liguori wrote:
> On 06/03/2011 08:41 AM, Jan Kiszka wrote:
> >On 2011-06-03 15:26, Anthony Liguori wrote:
> >>I think what we're getting at is the need for an enumeration.  So let's
> >>introduce one.  Here's what I propose:
> >>
> >>SQMP
> >>query-status
> >>------------
> >>
> >>Return a json-object with the following information:
> >>
> >>- "running": true if the VM is running, or false if it is paused
> >>(json-bool)
> >>- "singlestep": true if the VM is in single step mode,
> >>                 false otherwise (json-bool)
> >>- "status": one of the following values (json-string) (optional)
> >>       "prelaunch" - QEMU was started with -S and guest has not started
> >>       "running" - guest is actively running
> >>       "singlestep" - guest is running in single step mode
> >
> >"singlestep" is just a subset of "debug" stops. Better use the latter.
> >
> >>       "paused" - guest has been paused via the 'stop' command
> >>       "postmigrate" - guest is paused following a successful 'migrate'
> >>       "shutdown" - guest is shut down (and -no-shutdown is in use)
> >>       "io-error" - the last IOP has failed and the device is configured
> >>to pause on I/O errors
> >>       "watchdog-error" - the watchdog action is configured to pause and
> >>has been triggered
> >
> >And "panic" or "internal-error".
> 
> Can you add the request help spec text too?  Is "internal-error" a
> KVM emulation error?  If so, I'd rather make it
> "kvm-emulation-error".

Yeah, its for the kvm-all.c code which calls vm_stop(VMSTOP_PANIC);
when it gets either an internal error, or an unhandled exit.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 18:09       ` Luiz Capitulino
  2011-06-02 18:33         ` Anthony Liguori
@ 2011-06-06  9:25         ` Kevin Wolf
  2011-06-06 11:27           ` Markus Armbruster
  2011-06-06 13:08           ` Anthony Liguori
  1 sibling, 2 replies; 48+ messages in thread
From: Kevin Wolf @ 2011-06-06  9:25 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> On Thu, 02 Jun 2011 13:00:04 -0500
> Anthony Liguori <anthony@codemonkey.ws> wrote:
> 
>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>>
>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>> Hi there,
>>>>>
>>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>>>> started with a small storage and when a no space error is triggered, more space
>>>>> is allocated and the VM is put to run again.
>>>>>
>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>
>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>
>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>      to query why a VM stopped. This is needed because clients may miss the
>>>>>      BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>
>>>>> A proposal to solve both problems follow.
>>>>>
>>>>> A. BLOCK_IO_ERROR information
>>>>> -----------------------------
>>>>>
>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>> for example (see the "reason" key):
>>>>>
>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>      "data": { "device": "ide0-hd1",
>>>>>                "operation": "write",
>>>>>                "action": "stop",
>>>>>                "reason": "enospc", }
>>>>
>>>> you can call the reason whatever you want, but don't call it stringfied
>>>> errno name :-)
>>>>
>>>> In fact, just make reason "no space".
>>>
>>> You mean, we should do:
>>>
>>>    "reason": "no space"
>>>
>>> Or that we should make it a boolean, like:
>>>
>>>   "no space": true
>>
>>
>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
> 
> True, no.
> 
>>> I'm ok with either way. But in case you meant the second one, I guess
>>> we should make "reason" a dictionary so that we can group related
>>> information when we extend the field, for example:
>>>
>>>   "reason": { "no space": false, "no permission": true }

Splitting up enums into a number of booleans looks like a bad idea to
me. It makes things more verbose than they should be, and even worse, it
implies that more than one field could be true.

If this new schema thing doesn't support proper enums, that's something
that should be changed.

>>
>> Why would we ever have "no permission"?
> 
> It's an I/O error. I have a report from a developer who was getting
> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
> it turned out to be no permission.

And I want to add that it's a PITA to handle bug report when the only
message you get from qemu is "something went wrong". Sorry, that's not
useful at all. I want to see the real error reason (and at least for
debugging this means, I want to see the errno value/string).

Finding out that it was -EACCES in fact cost me (and QA who ran into the
problem) much more time than it should have. It's simply too much that
you need to attach gdb to find out what really happened.

Kevin

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-01 21:35 ` Anthony Liguori
  2011-06-02  9:06   ` Daniel P. Berrange
  2011-06-02 17:57   ` Luiz Capitulino
@ 2011-06-06 11:13   ` Markus Armbruster
  2011-06-06 11:28     ` Markus Armbruster
  2 siblings, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2011-06-06 11:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Luiz Capitulino

Anthony Liguori <anthony@codemonkey.ws> writes:

> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>> Hi there,
>>
>> There are people who want to use QMP for thin provisioning. That's, the VM is
>> started with a small storage and when a no space error is triggered, more space
>> is allocated and the VM is put to run again.
>>
>> QMP has two limitations that prevent people from doing this today:
>>
>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>
>> 2. Considering we solve item 1, we still have to provide a way for clients
>>     to query why a VM stopped. This is needed because clients may miss the
>>     BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>
>> A proposal to solve both problems follow.
>>
>> A. BLOCK_IO_ERROR information
>> -----------------------------
>>
>> We already have discussed this a lot, but didn't reach a consensus. My solution
>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>> for example (see the "reason" key):
>>
>> { "event": "BLOCK_IO_ERROR",
>>     "data": { "device": "ide0-hd1",
>>               "operation": "write",
>>               "action": "stop",
>>               "reason": "enospc", }
>
> you can call the reason whatever you want, but don't call it
> stringfied errno name :-)

Care to explain why?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-02 14:02         ` Anthony Liguori
  2011-06-02 18:01           ` Luiz Capitulino
@ 2011-06-06 11:21           ` Markus Armbruster
  1 sibling, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2011-06-06 11:21 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, Jiri Denemark, qemu-devel, Luiz Capitulino

Anthony Liguori <anthony@codemonkey.ws> writes:

> On 06/02/2011 08:24 AM, Jiri Denemark wrote:
>> On Thu, Jun 02, 2011 at 08:08:35 -0500, Anthony Liguori wrote:
>>> On 06/02/2011 04:06 AM, Daniel P. Berrange wrote:
>>>>>> B. query-stop-reason
>>>>>> --------------------
>>>>>>
>>>>>> I also have a simple solution for item 2. The vm_stop() accepts a reason
>>>>>> argument, so we could store it somewhere and return it as a string, like:
>>>>>>
>>>>>> ->    { "execute": "query-stop-reason" }
>>>>>> <- { "return": { "reason": "user" } }
>>>>>>
>>>>>> Valid reasons could be: "user", "debug", "shutdown", "diskfull" (hey,
>>>>>> this should be "ioerror", no?), "watchdog", "panic", "savevm", "loadvm",
>>>>>> "migrate".
>>>>>>
>>>>>> Also note that we have a STOP event. It should be extended with the
>>>>>> stop reason too, for completeness.
>>>>>
>>>>>
>>>>> Can we just extend query-block?
>>>>
>>>> Primarily we want 'query-stop-reason' to tell us what caused the VM
>>>> CPUs to stop. If that reason was 'ioerror', then 'query-block' could
>>>> be used to find out which particular block device(s) caused the IO
>>>> error to occurr&   get the "reason" that was in the BLOCK_IO_ERROR
>>>> event.
>>>
>>> My concern is that we're over abstracting here.  We're not going to add
>>> additional stop reasons in the future.
>>>
>>> Maybe just add an 'io-error': True to query-state.
>>
>> Sure, adding a new field to query-state response would work as well. And it
>> seems like a good idea to me since one already needs to call query-status to
>> check if CPUs are stopped or not so it makes sense to incorporate the
>> additional information there as well. And if you want to be safe for the
>> future, the new field doesn't have to be boolean 'io-error' but it can be the
>> string 'reason' which Luiz suggested above.
>
>
> String enumerations are a Bad Thing.  It's impossible to figure out
> what strings are valid and it lacks type safety.
>
> Adding more booleans provides better type safety, and when we move to
> QAPI with a queryable schema, provides a way to figure out exactly
> what combinations are supported by QEMU.

Faking enumerations with strings has its drawbacks.  Doesn't mean we
should fake them with a bunch of booleans.  What about having the real
thing instead?

I'm not claiming the proper solution to the problem at hand is an
enumeration, just challenging the idea that we should use booleans
instead of enumerations (string or otherwise).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06  9:25         ` [Qemu-devel] " Kevin Wolf
@ 2011-06-06 11:27           ` Markus Armbruster
  2011-06-06 13:09             ` Anthony Liguori
  2011-06-06 13:08           ` Anthony Liguori
  1 sibling, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2011-06-06 11:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, jdenemar, qemu-devel, Luiz Capitulino

Kevin Wolf <kwolf@redhat.com> writes:

> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>> On Thu, 02 Jun 2011 13:00:04 -0500
>> Anthony Liguori <anthony@codemonkey.ws> wrote:
>> 
>>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>>>
>>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>>> Hi there,
>>>>>>
>>>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>>>>> started with a small storage and when a no space error is triggered, more space
>>>>>> is allocated and the VM is put to run again.
>>>>>>
>>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>>
>>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>>
>>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>>      to query why a VM stopped. This is needed because clients may miss the
>>>>>>      BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>>
>>>>>> A proposal to solve both problems follow.
>>>>>>
>>>>>> A. BLOCK_IO_ERROR information
>>>>>> -----------------------------
>>>>>>
>>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>>> for example (see the "reason" key):
>>>>>>
>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>      "data": { "device": "ide0-hd1",
>>>>>>                "operation": "write",
>>>>>>                "action": "stop",
>>>>>>                "reason": "enospc", }
>>>>>
>>>>> you can call the reason whatever you want, but don't call it stringfied
>>>>> errno name :-)
>>>>>
>>>>> In fact, just make reason "no space".
>>>>
>>>> You mean, we should do:
>>>>
>>>>    "reason": "no space"
>>>>
>>>> Or that we should make it a boolean, like:
>>>>
>>>>   "no space": true
>>>
>>>
>>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
>> 
>> True, no.
>> 
>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>> we should make "reason" a dictionary so that we can group related
>>>> information when we extend the field, for example:
>>>>
>>>>   "reason": { "no space": false, "no permission": true }
>
> Splitting up enums into a number of booleans looks like a bad idea to
> me. It makes things more verbose than they should be, and even worse, it
> implies that more than one field could be true.
>
> If this new schema thing doesn't support proper enums, that's something
> that should be changed.
>
>>>
>>> Why would we ever have "no permission"?
>> 
>> It's an I/O error. I have a report from a developer who was getting
>> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
>> it turned out to be no permission.
>
> And I want to add that it's a PITA to handle bug report when the only
> message you get from qemu is "something went wrong". Sorry, that's not
> useful at all. I want to see the real error reason (and at least for
> debugging this means, I want to see the errno value/string).

And I want it straight, not wrapped in a pile of pseudo-abstractions
that make me go to the source code to figure out how to unwrap them.

[...]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 11:13   ` Markus Armbruster
@ 2011-06-06 11:28     ` Markus Armbruster
  0 siblings, 0 replies; 48+ messages in thread
From: Markus Armbruster @ 2011-06-06 11:28 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Luiz Capitulino

Markus Armbruster <armbru@redhat.com> writes:

> Anthony Liguori <anthony@codemonkey.ws> writes:
[...]
>> you can call the reason whatever you want, but don't call it
>> stringfied errno name :-)
>
> Care to explain why?

Found a bit of explanation downthread.  Sorry for the noise.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06  9:25         ` [Qemu-devel] " Kevin Wolf
  2011-06-06 11:27           ` Markus Armbruster
@ 2011-06-06 13:08           ` Anthony Liguori
  2011-06-06 15:27             ` Daniel P. Berrange
  2011-06-07 14:46             ` Luiz Capitulino
  1 sibling, 2 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-06 13:08 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster,
	Luiz Capitulino

On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>> we should make "reason" a dictionary so that we can group related
>>>> information when we extend the field, for example:
>>>>
>>>>    "reason": { "no space": false, "no permission": true }
>
> Splitting up enums into a number of booleans looks like a bad idea to
> me. It makes things more verbose than they should be, and even worse, it
> implies that more than one field could be true.

I agree.  What I had suggested was to not have a reason at all.

>
> If this new schema thing doesn't support proper enums, that's something
> that should be changed.

It does BTW.

>>>
>>> Why would we ever have "no permission"?
>>
>> It's an I/O error. I have a report from a developer who was getting
>> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
>> it turned out to be no permission.
>
> And I want to add that it's a PITA to handle bug report when the only
> message you get from qemu is "something went wrong". Sorry, that's not
> useful at all. I want to see the real error reason (and at least for
> debugging this means, I want to see the errno value/string).
>
> Finding out that it was -EACCES in fact cost me (and QA who ran into the
> problem) much more time than it should have. It's simply too much that
> you need to attach gdb to find out what really happened.

You want a log file and you want libvirt to actually let QEMU write to a 
log file.

Since we already have trace points, could we have a white list of trace 
points that are written to a log file by default?

QMP is a stable interface that we have to maintain forever.  We can 
write whatever we want to a log file and change what gets written at will.

Regards,

Anthony Liguori

> Kevin
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 11:27           ` Markus Armbruster
@ 2011-06-06 13:09             ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-06 13:09 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Luiz Capitulino

On 06/06/2011 06:27 AM, Markus Armbruster wrote:
> Kevin Wolf<kwolf@redhat.com>  writes:
>
>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>> On Thu, 02 Jun 2011 13:00:04 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>>
>>>> On 06/02/2011 12:57 PM, Luiz Capitulino wrote:
>>>>> On Wed, 01 Jun 2011 16:35:03 -0500
>>>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>>>
>>>>>> On 06/01/2011 04:12 PM, Luiz Capitulino wrote:
>>>>>>> Hi there,
>>>>>>>
>>>>>>> There are people who want to use QMP for thin provisioning. That's, the VM is
>>>>>>> started with a small storage and when a no space error is triggered, more space
>>>>>>> is allocated and the VM is put to run again.
>>>>>>>
>>>>>>> QMP has two limitations that prevent people from doing this today:
>>>>>>>
>>>>>>> 1. The BLOCK_IO_ERROR doesn't contain error information
>>>>>>>
>>>>>>> 2. Considering we solve item 1, we still have to provide a way for clients
>>>>>>>       to query why a VM stopped. This is needed because clients may miss the
>>>>>>>       BLOCK_IO_ERROR event or may connect to the VM while it's already stopped
>>>>>>>
>>>>>>> A proposal to solve both problems follow.
>>>>>>>
>>>>>>> A. BLOCK_IO_ERROR information
>>>>>>> -----------------------------
>>>>>>>
>>>>>>> We already have discussed this a lot, but didn't reach a consensus. My solution
>>>>>>> is quite simple: to add a stringfied errno name to the BLOCK_IO_ERROR event,
>>>>>>> for example (see the "reason" key):
>>>>>>>
>>>>>>> { "event": "BLOCK_IO_ERROR",
>>>>>>>       "data": { "device": "ide0-hd1",
>>>>>>>                 "operation": "write",
>>>>>>>                 "action": "stop",
>>>>>>>                 "reason": "enospc", }
>>>>>>
>>>>>> you can call the reason whatever you want, but don't call it stringfied
>>>>>> errno name :-)
>>>>>>
>>>>>> In fact, just make reason "no space".
>>>>>
>>>>> You mean, we should do:
>>>>>
>>>>>     "reason": "no space"
>>>>>
>>>>> Or that we should make it a boolean, like:
>>>>>
>>>>>    "no space": true
>>>>
>>>>
>>>> Do we need reason in BLOCK_IO_ERROR if query-block returns this information?
>>>
>>> True, no.
>>>
>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>> we should make "reason" a dictionary so that we can group related
>>>>> information when we extend the field, for example:
>>>>>
>>>>>    "reason": { "no space": false, "no permission": true }
>>
>> Splitting up enums into a number of booleans looks like a bad idea to
>> me. It makes things more verbose than they should be, and even worse, it
>> implies that more than one field could be true.
>>
>> If this new schema thing doesn't support proper enums, that's something
>> that should be changed.
>>
>>>>
>>>> Why would we ever have "no permission"?
>>>
>>> It's an I/O error. I have a report from a developer who was getting
>>> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
>>> it turned out to be no permission.
>>
>> And I want to add that it's a PITA to handle bug report when the only
>> message you get from qemu is "something went wrong". Sorry, that's not
>> useful at all. I want to see the real error reason (and at least for
>> debugging this means, I want to see the errno value/string).
>
> And I want it straight, not wrapped in a pile of pseudo-abstractions
> that make me go to the source code to figure out how to unwrap them.

A set of default logged trace points would be perfect, no?

Regards,

Anthony Liguori

>
> [...]
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 13:08           ` Anthony Liguori
@ 2011-06-06 15:27             ` Daniel P. Berrange
  2011-06-06 15:30               ` Luiz Capitulino
  2011-06-07 14:46             ` Luiz Capitulino
  1 sibling, 1 reply; 48+ messages in thread
From: Daniel P. Berrange @ 2011-06-06 15:27 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Luiz Capitulino,
	jdenemar, Markus Armbruster

On Mon, Jun 06, 2011 at 08:08:51AM -0500, Anthony Liguori wrote:
> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> >Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> >>>>I'm ok with either way. But in case you meant the second one, I guess
> >>>>we should make "reason" a dictionary so that we can group related
> >>>>information when we extend the field, for example:
> >>>>
> >>>>   "reason": { "no space": false, "no permission": true }
> >
> >Splitting up enums into a number of booleans looks like a bad idea to
> >me. It makes things more verbose than they should be, and even worse, it
> >implies that more than one field could be true.
> 
> I agree.  What I had suggested was to not have a reason at all.
> 
> >
> >If this new schema thing doesn't support proper enums, that's something
> >that should be changed.
> 
> It does BTW.
> 
> >>>
> >>>Why would we ever have "no permission"?
> >>
> >>It's an I/O error. I have a report from a developer who was getting
> >>the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
> >>it turned out to be no permission.
> >
> >And I want to add that it's a PITA to handle bug report when the only
> >message you get from qemu is "something went wrong". Sorry, that's not
> >useful at all. I want to see the real error reason (and at least for
> >debugging this means, I want to see the errno value/string).
> >
> >Finding out that it was -EACCES in fact cost me (and QA who ran into the
> >problem) much more time than it should have. It's simply too much that
> >you need to attach gdb to find out what really happened.
> 
> You want a log file and you want libvirt to actually let QEMU write
> to a log file.

Anything QEMU writes to stderr or stdout gets recorded in a
per-VM logfile already at /var/log/libvirt/qemu/$GUESTNAME.log

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 15:27             ` Daniel P. Berrange
@ 2011-06-06 15:30               ` Luiz Capitulino
  2011-06-08 12:59                 ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-06 15:30 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster, jdenemar

On Mon, 6 Jun 2011 16:27:55 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

> On Mon, Jun 06, 2011 at 08:08:51AM -0500, Anthony Liguori wrote:
> > On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> > >Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> > >>>>I'm ok with either way. But in case you meant the second one, I guess
> > >>>>we should make "reason" a dictionary so that we can group related
> > >>>>information when we extend the field, for example:
> > >>>>
> > >>>>   "reason": { "no space": false, "no permission": true }
> > >
> > >Splitting up enums into a number of booleans looks like a bad idea to
> > >me. It makes things more verbose than they should be, and even worse, it
> > >implies that more than one field could be true.
> > 
> > I agree.  What I had suggested was to not have a reason at all.
> > 
> > >
> > >If this new schema thing doesn't support proper enums, that's something
> > >that should be changed.
> > 
> > It does BTW.
> > 
> > >>>
> > >>>Why would we ever have "no permission"?
> > >>
> > >>It's an I/O error. I have a report from a developer who was getting
> > >>the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
> > >>it turned out to be no permission.
> > >
> > >And I want to add that it's a PITA to handle bug report when the only
> > >message you get from qemu is "something went wrong". Sorry, that's not
> > >useful at all. I want to see the real error reason (and at least for
> > >debugging this means, I want to see the errno value/string).
> > >
> > >Finding out that it was -EACCES in fact cost me (and QA who ran into the
> > >problem) much more time than it should have. It's simply too much that
> > >you need to attach gdb to find out what really happened.
> > 
> > You want a log file and you want libvirt to actually let QEMU write
> > to a log file.
> 
> Anything QEMU writes to stderr or stdout gets recorded in a
> per-VM logfile already at /var/log/libvirt/qemu/$GUESTNAME.log

I can submit a patch to write this info to stderr (in case we don't do that
yet).

> 
> Regards,
> Daniel

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 13:08           ` Anthony Liguori
  2011-06-06 15:27             ` Daniel P. Berrange
@ 2011-06-07 14:46             ` Luiz Capitulino
  2011-06-07 15:39               ` Anthony Liguori
  2011-06-07 15:41               ` Markus Armbruster
  1 sibling, 2 replies; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-07 14:46 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Mon, 06 Jun 2011 08:08:51 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> > Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> >>>> I'm ok with either way. But in case you meant the second one, I guess
> >>>> we should make "reason" a dictionary so that we can group related
> >>>> information when we extend the field, for example:
> >>>>
> >>>>    "reason": { "no space": false, "no permission": true }
> >
> > Splitting up enums into a number of booleans looks like a bad idea to
> > me. It makes things more verbose than they should be, and even worse, it
> > implies that more than one field could be true.
> 
> I agree.  What I had suggested was to not have a reason at all.

Is it better if we add a new enum to query-block? Like the "io-error" key we
have talked about earlier? Like:

 "io-error": "no space"

We could have "no space", "low level" (that's how the man page defines EIO) and
"unknown".

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 14:46             ` Luiz Capitulino
@ 2011-06-07 15:39               ` Anthony Liguori
  2011-06-07 15:54                 ` Luiz Capitulino
  2011-06-07 15:41               ` Markus Armbruster
  1 sibling, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-07 15:39 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/07/2011 09:46 AM, Luiz Capitulino wrote:
> On Mon, 06 Jun 2011 08:08:51 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
>>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>> information when we extend the field, for example:
>>>>>>
>>>>>>     "reason": { "no space": false, "no permission": true }
>>>
>>> Splitting up enums into a number of booleans looks like a bad idea to
>>> me. It makes things more verbose than they should be, and even worse, it
>>> implies that more than one field could be true.
>>
>> I agree.  What I had suggested was to not have a reason at all.
>
> Is it better if we add a new enum to query-block? Like the "io-error" key we
> have talked about earlier? Like:
>
>   "io-error": "no space"

1) enums have to follow some rules.  One obvious rule would be there 
can't be spaces in the enum value.

2) Do we have an defined enum values besides no-space?  If not, let's 
not artificially add an enum.

>
> We could have "no space", "low level" (that's how the man page defines EIO) and
> "unknown".

3) what's the difference between "no space" and "low level"?

Regards,

Anthony Liguori

>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 14:46             ` Luiz Capitulino
  2011-06-07 15:39               ` Anthony Liguori
@ 2011-06-07 15:41               ` Markus Armbruster
  2011-06-07 16:31                 ` Anthony Liguori
  1 sibling, 1 reply; 48+ messages in thread
From: Markus Armbruster @ 2011-06-07 15:41 UTC (permalink / raw)
  To: Luiz Capitulino; +Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel

Luiz Capitulino <lcapitulino@redhat.com> writes:

> On Mon, 06 Jun 2011 08:08:51 -0500
> Anthony Liguori <anthony@codemonkey.ws> wrote:
>
>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
>> > Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>> >>>> I'm ok with either way. But in case you meant the second one, I guess
>> >>>> we should make "reason" a dictionary so that we can group related
>> >>>> information when we extend the field, for example:
>> >>>>
>> >>>>    "reason": { "no space": false, "no permission": true }
>> >
>> > Splitting up enums into a number of booleans looks like a bad idea to
>> > me. It makes things more verbose than they should be, and even worse, it
>> > implies that more than one field could be true.
>> 
>> I agree.  What I had suggested was to not have a reason at all.
>
> Is it better if we add a new enum to query-block? Like the "io-error" key we
> have talked about earlier? Like:
>
>  "io-error": "no space"
>
> We could have "no space", "low level" (that's how the man page defines EIO) and
> "unknown".

As mentioned before, I prefer my errnos straight, not wrapped in a pile
of pseudo-abstractions that make me go to the source code to figure out
how to unwrap them :)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 15:39               ` Anthony Liguori
@ 2011-06-07 15:54                 ` Luiz Capitulino
  2011-06-07 16:32                   ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-07 15:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Tue, 07 Jun 2011 10:39:45 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/07/2011 09:46 AM, Luiz Capitulino wrote:
> > On Mon, 06 Jun 2011 08:08:51 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> >>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> >>>>>> I'm ok with either way. But in case you meant the second one, I guess
> >>>>>> we should make "reason" a dictionary so that we can group related
> >>>>>> information when we extend the field, for example:
> >>>>>>
> >>>>>>     "reason": { "no space": false, "no permission": true }
> >>>
> >>> Splitting up enums into a number of booleans looks like a bad idea to
> >>> me. It makes things more verbose than they should be, and even worse, it
> >>> implies that more than one field could be true.
> >>
> >> I agree.  What I had suggested was to not have a reason at all.
> >
> > Is it better if we add a new enum to query-block? Like the "io-error" key we
> > have talked about earlier? Like:
> >
> >   "io-error": "no space"
> 
> 1) enums have to follow some rules.  One obvious rule would be there 
> can't be spaces in the enum value.
> 
> 2) Do we have an defined enum values besides no-space?  If not, let's 
> not artificially add an enum.
> 
> >
> > We could have "no space", "low level" (that's how the man page defines EIO) and
> > "unknown".
> 
> 3) what's the difference between "no space" and "low level"?

The latter means the device doesn't have enough space to write more data,
the former is I/O I guess it's more device specific.

What's your suggestion?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 15:41               ` Markus Armbruster
@ 2011-06-07 16:31                 ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-07 16:31 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Luiz Capitulino

On 06/07/2011 10:41 AM, Markus Armbruster wrote:
> Luiz Capitulino<lcapitulino@redhat.com>  writes:
>
>> On Mon, 06 Jun 2011 08:08:51 -0500
>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>
>>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
>>>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>>> information when we extend the field, for example:
>>>>>>>
>>>>>>>     "reason": { "no space": false, "no permission": true }
>>>>
>>>> Splitting up enums into a number of booleans looks like a bad idea to
>>>> me. It makes things more verbose than they should be, and even worse, it
>>>> implies that more than one field could be true.
>>>
>>> I agree.  What I had suggested was to not have a reason at all.
>>
>> Is it better if we add a new enum to query-block? Like the "io-error" key we
>> have talked about earlier? Like:
>>
>>   "io-error": "no space"
>>
>> We could have "no space", "low level" (that's how the man page defines EIO) and
>> "unknown".
>
> As mentioned before, I prefer my errnos straight, not wrapped in a pile
> of pseudo-abstractions that make me go to the source code to figure out
> how to unwrap them :)

You want a log file.  You want to spit out open strings that contain 
actual errno numbers.

I'm 100% supportive of that.

But QMP is an interface that we have to maintain forever and passing a 
valid that can be different on different platforms, that depends on the 
semantics of whatever function calls happen to currently be in the path, 
is not a good long term interface.

We need to separate what we want to have as developers trying to debug 
and what we need to provide as a long term supported management interface.

If we separately the two, we'll be much happier in both places.

Regards,

Anthony Liguori

>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 15:54                 ` Luiz Capitulino
@ 2011-06-07 16:32                   ` Anthony Liguori
  2011-06-07 17:43                     ` Luiz Capitulino
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-07 16:32 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/07/2011 10:54 AM, Luiz Capitulino wrote:
> On Tue, 07 Jun 2011 10:39:45 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> On 06/07/2011 09:46 AM, Luiz Capitulino wrote:
>>> On Mon, 06 Jun 2011 08:08:51 -0500
>>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
>>>
>>>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
>>>>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>>>> information when we extend the field, for example:
>>>>>>>>
>>>>>>>>      "reason": { "no space": false, "no permission": true }
>>>>>
>>>>> Splitting up enums into a number of booleans looks like a bad idea to
>>>>> me. It makes things more verbose than they should be, and even worse, it
>>>>> implies that more than one field could be true.
>>>>
>>>> I agree.  What I had suggested was to not have a reason at all.
>>>
>>> Is it better if we add a new enum to query-block? Like the "io-error" key we
>>> have talked about earlier? Like:
>>>
>>>    "io-error": "no space"
>>
>> 1) enums have to follow some rules.  One obvious rule would be there
>> can't be spaces in the enum value.
>>
>> 2) Do we have an defined enum values besides no-space?  If not, let's
>> not artificially add an enum.
>>
>>>
>>> We could have "no space", "low level" (that's how the man page defines EIO) and
>>> "unknown".
>>
>> 3) what's the difference between "no space" and "low level"?
>
> The latter means the device doesn't have enough space to write more data,
> the former is I/O I guess it's more device specific.

Sorry, I meant to ask, what's the difference between "low level" and 
"unknown".

Regards,

Anthony Liguori

>
> What's your suggestion?
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 16:32                   ` Anthony Liguori
@ 2011-06-07 17:43                     ` Luiz Capitulino
  2011-06-07 18:43                       ` Anthony Liguori
  0 siblings, 1 reply; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-07 17:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Tue, 07 Jun 2011 11:32:14 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/07/2011 10:54 AM, Luiz Capitulino wrote:
> > On Tue, 07 Jun 2011 10:39:45 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> On 06/07/2011 09:46 AM, Luiz Capitulino wrote:
> >>> On Mon, 06 Jun 2011 08:08:51 -0500
> >>> Anthony Liguori<anthony@codemonkey.ws>   wrote:
> >>>
> >>>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
> >>>>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
> >>>>>>>> I'm ok with either way. But in case you meant the second one, I guess
> >>>>>>>> we should make "reason" a dictionary so that we can group related
> >>>>>>>> information when we extend the field, for example:
> >>>>>>>>
> >>>>>>>>      "reason": { "no space": false, "no permission": true }
> >>>>>
> >>>>> Splitting up enums into a number of booleans looks like a bad idea to
> >>>>> me. It makes things more verbose than they should be, and even worse, it
> >>>>> implies that more than one field could be true.
> >>>>
> >>>> I agree.  What I had suggested was to not have a reason at all.
> >>>
> >>> Is it better if we add a new enum to query-block? Like the "io-error" key we
> >>> have talked about earlier? Like:
> >>>
> >>>    "io-error": "no space"
> >>
> >> 1) enums have to follow some rules.  One obvious rule would be there
> >> can't be spaces in the enum value.
> >>
> >> 2) Do we have an defined enum values besides no-space?  If not, let's
> >> not artificially add an enum.
> >>
> >>>
> >>> We could have "no space", "low level" (that's how the man page defines EIO) and
> >>> "unknown".
> >>
> >> 3) what's the difference between "no space" and "low level"?
> >
> > The latter means the device doesn't have enough space to write more data,
> > the former is I/O I guess it's more device specific.
> 
> Sorry, I meant to ask, what's the difference between "low level" and 
> "unknown".

"low level" is EIO, "unknown" is everything else (EINVAL, EPIPE, ...).

> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > What's your suggestion?
> >
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 17:43                     ` Luiz Capitulino
@ 2011-06-07 18:43                       ` Anthony Liguori
  2011-06-07 18:48                         ` Luiz Capitulino
  0 siblings, 1 reply; 48+ messages in thread
From: Anthony Liguori @ 2011-06-07 18:43 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On 06/07/2011 12:43 PM, Luiz Capitulino wrote:
> On Tue, 07 Jun 2011 11:32:14 -0500
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
>> Sorry, I meant to ask, what's the difference between "low level" and
>> "unknown".
>
> "low level" is EIO, "unknown" is everything else (EINVAL, EPIPE, ...).

How can you document this to the user?  What is a management tool 
supposed to do differently if it receives "low level" vs. "unknown".

AFAICT, the tool is going to treat the both the same way and can't make 
a rational decision based on the different.  IMHO, that means it's 
effectively the same thing.

Regards,

Anthony Liguori

>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>
>>> What's your suggestion?
>>>
>>
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-07 18:43                       ` Anthony Liguori
@ 2011-06-07 18:48                         ` Luiz Capitulino
  0 siblings, 0 replies; 48+ messages in thread
From: Luiz Capitulino @ 2011-06-07 18:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Stefan Hajnoczi, jdenemar, qemu-devel, Markus Armbruster

On Tue, 07 Jun 2011 13:43:27 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 06/07/2011 12:43 PM, Luiz Capitulino wrote:
> > On Tue, 07 Jun 2011 11:32:14 -0500
> > Anthony Liguori<anthony@codemonkey.ws>  wrote:
> >
> >> Sorry, I meant to ask, what's the difference between "low level" and
> >> "unknown".
> >
> > "low level" is EIO, "unknown" is everything else (EINVAL, EPIPE, ...).
> 
> How can you document this to the user?  What is a management tool 
> supposed to do differently if it receives "low level" vs. "unknown".
> 
> AFAICT, the tool is going to treat the both the same way and can't make 
> a rational decision based on the different.  IMHO, that means it's 
> effectively the same thing.

It's fine with me for having "low level" be both then.

> 
> Regards,
> 
> Anthony Liguori
> 
> >
> >>
> >> Regards,
> >>
> >> Anthony Liguori
> >>
> >>>
> >>> What's your suggestion?
> >>>
> >>
> >
> >
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason
  2011-06-06 15:30               ` Luiz Capitulino
@ 2011-06-08 12:59                 ` Anthony Liguori
  0 siblings, 0 replies; 48+ messages in thread
From: Anthony Liguori @ 2011-06-08 12:59 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Markus Armbruster, jdenemar

On 06/06/2011 10:30 AM, Luiz Capitulino wrote:
> On Mon, 6 Jun 2011 16:27:55 +0100
> "Daniel P. Berrange"<berrange@redhat.com>  wrote:
>
>> On Mon, Jun 06, 2011 at 08:08:51AM -0500, Anthony Liguori wrote:
>>> On 06/06/2011 04:25 AM, Kevin Wolf wrote:
>>>> Am 02.06.2011 20:09, schrieb Luiz Capitulino:
>>>>>>> I'm ok with either way. But in case you meant the second one, I guess
>>>>>>> we should make "reason" a dictionary so that we can group related
>>>>>>> information when we extend the field, for example:
>>>>>>>
>>>>>>>    "reason": { "no space": false, "no permission": true }
>>>>
>>>> Splitting up enums into a number of booleans looks like a bad idea to
>>>> me. It makes things more verbose than they should be, and even worse, it
>>>> implies that more than one field could be true.
>>>
>>> I agree.  What I had suggested was to not have a reason at all.
>>>
>>>>
>>>> If this new schema thing doesn't support proper enums, that's something
>>>> that should be changed.
>>>
>>> It does BTW.
>>>
>>>>>>
>>>>>> Why would we ever have "no permission"?
>>>>>
>>>>> It's an I/O error. I have a report from a developer who was getting
>>>>> the BLOCK_IO_ERROR event and had to debug qemu to know the error cause,
>>>>> it turned out to be no permission.
>>>>
>>>> And I want to add that it's a PITA to handle bug report when the only
>>>> message you get from qemu is "something went wrong". Sorry, that's not
>>>> useful at all. I want to see the real error reason (and at least for
>>>> debugging this means, I want to see the errno value/string).
>>>>
>>>> Finding out that it was -EACCES in fact cost me (and QA who ran into the
>>>> problem) much more time than it should have. It's simply too much that
>>>> you need to attach gdb to find out what really happened.
>>>
>>> You want a log file and you want libvirt to actually let QEMU write
>>> to a log file.
>>
>> Anything QEMU writes to stderr or stdout gets recorded in a
>> per-VM logfile already at /var/log/libvirt/qemu/$GUESTNAME.log
>
> I can submit a patch to write this info to stderr (in case we don't do that
> yet).

That's not terribly friendly to users though.

We need something akin to a -log stderr so that normal users can 
redirect to a file.  I think libvirt would only need to add this to 
their invocation and everything would Just Work.

Regards,

Anthony Liguori

>
>>
>> Regards,
>> Daniel
>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2011-06-08 13:01 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-01 21:12 [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason Luiz Capitulino
2011-06-01 21:35 ` Anthony Liguori
2011-06-02  9:06   ` Daniel P. Berrange
2011-06-02 13:08     ` Anthony Liguori
2011-06-02 13:24       ` Jiri Denemark
2011-06-02 14:02         ` Anthony Liguori
2011-06-02 18:01           ` Luiz Capitulino
2011-06-02 18:32             ` Anthony Liguori
2011-06-02 18:57               ` Luiz Capitulino
2011-06-02 19:35                 ` Anthony Liguori
2011-06-03  9:26             ` Daniel P. Berrange
2011-06-03 12:43               ` Anthony Liguori
2011-06-03 12:57                 ` Daniel P. Berrange
2011-06-03 13:26                   ` Anthony Liguori
2011-06-03 13:39                     ` Daniel P. Berrange
2011-06-03 13:44                       ` Luiz Capitulino
2011-06-03 14:01                         ` Anthony Liguori
2011-06-03 13:41                     ` Jan Kiszka
2011-06-03 13:51                       ` Anthony Liguori
2011-06-03 13:59                         ` Jan Kiszka
2011-06-03 14:03                         ` Daniel P. Berrange
2011-06-06 11:21           ` Markus Armbruster
2011-06-02 17:57   ` Luiz Capitulino
2011-06-02 18:00     ` Anthony Liguori
2011-06-02 18:09       ` Luiz Capitulino
2011-06-02 18:33         ` Anthony Liguori
2011-06-02 19:13           ` Luiz Capitulino
2011-06-02 20:03             ` Anthony Liguori
2011-06-02 20:13               ` [Qemu-devel] [libvirt] " Eric Blake
2011-06-02 20:55                 ` Anthony Liguori
2011-06-06  9:25         ` [Qemu-devel] " Kevin Wolf
2011-06-06 11:27           ` Markus Armbruster
2011-06-06 13:09             ` Anthony Liguori
2011-06-06 13:08           ` Anthony Liguori
2011-06-06 15:27             ` Daniel P. Berrange
2011-06-06 15:30               ` Luiz Capitulino
2011-06-08 12:59                 ` Anthony Liguori
2011-06-07 14:46             ` Luiz Capitulino
2011-06-07 15:39               ` Anthony Liguori
2011-06-07 15:54                 ` Luiz Capitulino
2011-06-07 16:32                   ` Anthony Liguori
2011-06-07 17:43                     ` Luiz Capitulino
2011-06-07 18:43                       ` Anthony Liguori
2011-06-07 18:48                         ` Luiz Capitulino
2011-06-07 15:41               ` Markus Armbruster
2011-06-07 16:31                 ` Anthony Liguori
2011-06-06 11:13   ` Markus Armbruster
2011-06-06 11:28     ` Markus Armbruster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.