qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, groug@kaod.org, qemu-ppc@nongnu.org,
	armbru@redhat.com, David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events
Date: Wed, 31 Mar 2021 16:40:16 -0300	[thread overview]
Message-ID: <85172cec-3634-14b6-9e7b-a3527c482abd@gmail.com> (raw)
In-Reply-To: <20210330012831.2ce0514c@redhat.com>



On 3/29/21 8:28 PM, Igor Mammedov wrote:
> On Wed, 24 Mar 2021 16:09:59 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> On 3/23/21 10:40 PM, David Gibson wrote:
>>> On Tue, Mar 23, 2021 at 02:10:22PM -0300, Daniel Henrique Barboza wrote:
>>>>
>>>>
>>>> On 3/22/21 10:12 PM, David Gibson wrote:
>>>>> On Fri, Mar 12, 2021 at 05:07:36PM -0300, Daniel Henrique Barboza wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This series adds 2 new QAPI events, DEVICE_NOT_DELETED and
>>>>>> DEVICE_UNPLUG_ERROR. They were (and are still being) discussed in [1].
>>>>>>
>>>>>> Patches 1 and 3 are independent of the ppc patches and can be applied
>>>>>> separately. Patches 2 and 4 are based on David's ppc-for-6.0 branch and
>>>>>> are dependent on the QAPI patches.
>>>>>
>>>>> Implementation looks fine, but I think there's a bit more to discuss
>>>>> before we can apply.
>>>>>
>>>>> I think it would make sense to re-order this and put UNPLUG_ERROR
>>>>> first.  Its semantics are clearer, and I think there's a stronger case
>>>>> for it.
>>>>
>>>> Alright
>>>>   
>>>>>
>>>>> I'm a bit less sold on DEVICE_NOT_DELETED, after consideration.  Does
>>>>> it really tell the user/management anything useful beyond what
>>>>> receiving neither a DEVICE_DELETED nor a DEVICE_UNPLUG_ERROR does?
>>>>
>>>>
>>>> It informs that the hotunplug operation exceed the timeout that QEMU
>>>> internals considers adequate, but QEMU can't assert that it was caused
>>>> by an error or an unexpected delay. The end result is that the device
>>>> is not going to be deleted from QMP, so DEVICE_NOT_DELETED.
>>>
>>> Is it, though?  I mean, it is with this implementation for papr:
>>> because we clear the unplug_requested flag, even if the guest later
>>> tries to complete the unplug, it will fail.
>>>
>>> But if I understand what Markus was saying correctly, that might not
>>> be possible for all hotplug systems.  I believe Markus was suggesting
>>> that DEVICE_NOT_DELETED could just mean that we haven't deleted the
>>> device yet, but it could still happen later.
>>>
>>> And in that case, I'm not yet sold on the value of a message that
>>> essentially just means "Ayup, still dunno what's happening, sorry".
>>>    
>>>> Perhaps we should just be straightforward and create a DEVICE_UNPLUG_TIMEOUT
>>>> event.
>>>
>>> Hm... what if we added a "reason" field to UNPLUG_ERROR.  That could
>>> be "guest rejected hotplug", or something more specific, in the rare
>>> case that the guest has a way of signalling something more specific,
>>> or "timeout" - but the later *only* to be sent in cases where on the
>>> timeout we're able to block any later completion of the unplug (as we
>>> can on papr).
> 
> Is canceling unplug on timeout documented somewhere (like some spec)?

Nope.

> 
> If not it might (theoretically) confuse guest when it tries to unplug
> after timeout and leave guest in some unexpected state.

I consider this a remote possibility due to the generous timeout we're
using in the machine, but it is a possibility nevertheless.

> 
>>
>> I believe that's already covered by the existing API:
>>
>>
>> +# @DEVICE_UNPLUG_ERROR:
>> +#
>> +# Emitted when a device hot unplug error occurs.
>> +#
>> +# @device: device name
>> +#
>> +# @msg: Informative message
>>
>> The 'informative message' would be the reason the event occurred. In patch
>> 4/4, for the memory hotunplug refused by the guest, it is being set as:
>>
>>        qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>>                                     "for device %s", dev->id);
>>        qapi_event_send_device_unplug_error(dev->id, qapi_error);
>>
>>
>>
>> We could use the same DEVICE_UNPLUG_ERROR event in the CPU hotunplug timeout
>> case (currently on patch 2/4) by just changing 'msg', e.g.:
>>
>>
>>        qapi_error = g_strdup_printf("CPU hotunplug timeout for device %s", dev->id);
>>        qapi_event_send_device_unplug_error(dev->id, qapi_error);
>>
> 
> lets make everything support ACPI (just kidding).

I would love to make PAPR more ACPI and less .... PAPR.

> 
> maybe we can reuse already existing ACPI_DEVICE_OST instead of DEVICE_UNPLUG_ERROR
> which sort of does the same thing (and more) but instead of strings uses status codes
> defined by spec.
> 
> Idea similar to DEVICE_UNPLUG_ERROR was considered back then, but instead of QEMU being
> a poor translator of status codes to non machine-readable strings we went with
> exposing well documented status codes to user. This way user can implement
> specific reactions to particular errors just looking at JSON + spec.


This is not a bad idea. Problem is that we don't have all ACPI_DEVICE_OST
fields to fill in because, well, both the timeout and the cancell mechanism
aren't specified there.


Thanks,


DHB

> 
>> Thanks,
>>
>> DHB
>>
>>
>>>
>>> Thoughs, Markus?
>>>    
>>
> 


  parent reply	other threads:[~2021-03-31 19:41 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-12 20:07 [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events Daniel Henrique Barboza
2021-03-12 20:07 ` [PATCH 1/4] qapi/qdev.json: add DEVICE_NOT_DELETED event Daniel Henrique Barboza
2021-03-23 18:00   ` Eric Blake
2021-03-23 18:12     ` Daniel Henrique Barboza
2021-03-12 20:07 ` [PATCH 2/4] spapr_drc.c: send DEVICE_NOT_DELETED event on unplug timeout Daniel Henrique Barboza
2021-03-12 20:07 ` [PATCH 3/4] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event Daniel Henrique Barboza
2021-03-12 20:07 ` [PATCH 4/4] spapr.c: use DEVICE_UNPLUG_ERROR event in spapr_memory_unplug_rollback() Daniel Henrique Barboza
2021-03-23  1:12 ` [PATCH 0/4] DEVICE_NOT_DELETED/DEVICE_UNPLUG_ERROR QAPI events David Gibson
2021-03-23 17:10   ` Daniel Henrique Barboza
2021-03-24  1:40     ` David Gibson
2021-03-24 19:09       ` Daniel Henrique Barboza
2021-03-25  1:32         ` David Gibson
2021-03-29 23:28         ` Igor Mammedov
2021-03-30 23:46           ` David Gibson
2021-03-31  9:49             ` Igor Mammedov
2021-04-01  1:31               ` David Gibson
2021-03-31 19:47             ` Daniel Henrique Barboza
2021-04-01  1:36               ` David Gibson
2021-03-31 19:40           ` Daniel Henrique Barboza [this message]
2021-04-20 17:11 ` Daniel Henrique Barboza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85172cec-3634-14b6-9e7b-a3527c482abd@gmail.com \
    --to=danielhb413@gmail.com \
    --cc=armbru@redhat.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=imammedo@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).