All of lore.kernel.org
 help / color / mirror / Atom feed
From: Markus Armbruster <armbru@redhat.com>
To: Lukas Straub <lukasstraub2@web.de>
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	qemu-block <qemu-block@nongnu.org>,
	"Juan Quintela" <quintela@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Max Reitz" <mreitz@redhat.com>
Subject: Re: [PATCH v9 1/8] Introduce yank feature
Date: Fri, 30 Oct 2020 15:02:09 +0100	[thread overview]
Message-ID: <87lffnstge.fsf@dusky.pond.sub.org> (raw)
In-Reply-To: <20201030113212.37ddf1fb@luklap> (Lukas Straub's message of "Fri,  30 Oct 2020 11:32:12 +0100")

Lukas Straub <lukasstraub2@web.de> writes:

> On Thu, 29 Oct 2020 17:36:14 +0100
> Markus Armbruster <armbru@redhat.com> wrote:
>
>> Nothing major, looks almost ready to me.
>> 
>> Lukas Straub <lukasstraub2@web.de> writes:
>> 
>> > The yank feature allows to recover from hanging qemu by "yanking"
>> > at various parts. Other qemu systems can register themselves and
>> > multiple yank functions. Then all yank functions for selected
>> > instances can be called by the 'yank' out-of-band qmp command.
>> > Available instances can be queried by a 'query-yank' oob command.
>> >
>> > Signed-off-by: Lukas Straub <lukasstraub2@web.de>
>> > Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > ---
>> >  include/qemu/yank.h |  95 ++++++++++++++++++++
>> >  qapi/misc.json      | 106 ++++++++++++++++++++++
>> >  util/meson.build    |   1 +
>> >  util/yank.c         | 213 ++++++++++++++++++++++++++++++++++++++++++++  
>> 
>> checkpatch.pl warns:
>> 
>>     WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
>> 
>> Can we find a maintainer for the two new files?
>
> Yes, I'm maintaining this for now, see patch 7.

Thanks!  Would it make sense to add the yank stuff to a new QAPI module
yank.json instead of misc.jaon, so the new MAINTAINERS stanza can cover
it?

[...]
>> > diff --git a/qapi/misc.json b/qapi/misc.json
>> > index 40df513856..3b7de02a4d 100644
>> > --- a/qapi/misc.json
>> > +++ b/qapi/misc.json
[...]
>> > +##
>> > +# @YankInstance:
>> > +#
>> > +# A yank instance can be yanked with the "yank" qmp command to recover from a
>> > +# hanging qemu.  
>> 
>> QEMU
>> 
>> > +#
>> > +# Currently implemented yank instances:
>> > +#  -nbd block device:
>> > +#   Yanking it will shutdown the connection to the nbd server without
>> > +#   attempting to reconnect.
>> > +#  -socket chardev:
>> > +#   Yanking it will shutdown the connected socket.
>> > +#  -migration:
>> > +#   Yanking it will shutdown all migration connections.  
>> 
>> To my surprise, this is recognized as bullet list markup.  But please
>> put a space between the bullet and the text anyway.
>> 
>> Also: "shutdown" is a noun, the verb is spelled "shut down".
>
> Both changed for the next version.
>
>> In my review of v8, I asked how yanking migration is related to command
>> migrate_cancel.  Daniel explained:
>> 
>>     migrate_cancel will do a shutdown() on the primary migration socket only.
>>     In addition it will toggle the migration state.
>> 
>>     Yanking will do a shutdown on all migration sockets (important for
>>     multifd), but won't touch migration state or any other aspect of QEMU
>>     code.
>> 
>>     Overall yanking has less potential for things to go wrong than the
>>     migrate_cancel method, as it doesn't try to do any kind of cleanup
>>     or migration.
>> 
>> Would it make sense to work this into the documentation?
>
> How about this?
>
>   - migration:
>     Yanking it will shut down all migration connections. Unlike
>     @migrate_cancel, it will not notify the migration process,
>     so migration will go into @failed state, instead of @cancelled
>     state.

Works for me.  Advice on when to use it rather than migrate_cancel would
be nice, though.

>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'union': 'YankInstance',
>> > +  'base': { 'type': 'YankInstanceType' },
>> > +  'discriminator': 'type',
>> > +  'data': {
>> > +      'blockdev': 'YankInstanceBlockdev',
>> > +      'chardev': 'YankInstanceChardev' } }
>> > +
>> > +##
>> > +# @yank:
>> > +#
>> > +# Recover from hanging qemu by yanking the specified instances. See  
>> 
>> QEMU
>> 
>> "Try to recover" would be more precise, I think.
>
> Changed for the next version.
>
>> > +# "YankInstance" for more information.
>> > +#
>> > +# Takes a list of @YankInstance as argument.
>> > +#
>> > +# Returns: nothing.
>> > +#
>> > +# Example:
>> > +#
>> > +# -> { "execute": "yank",
>> > +#      "arguments": {
>> > +#          "instances": [
>> > +#               { "type": "block-node",
>> > +#                 "node-name": "nbd0" }
>> > +#          ] } }
>> > +# <- { "return": {} }
>> > +#
>> > +# Since: 5.2
>> > +##
>> > +{ 'command': 'yank',
>> > +  'data': { 'instances': ['YankInstance'] },
>> > +  'allow-oob': true }
>> > +
[...]
>> > diff --git a/util/yank.c b/util/yank.c
>> > new file mode 100644
>> > index 0000000000..0b3a816706
>> > --- /dev/null
>> > +++ b/util/yank.c
[...]
>> > +void qmp_yank(YankInstanceList *instances,
>> > +              Error **errp)
>> > +{
>> > +    YankInstanceList *tail;
>> > +    YankInstanceEntry *entry;
>> > +    YankFuncAndParam *func_entry;
>> > +
>> > +    qemu_mutex_lock(&yank_lock);
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        if (!entry) {
>> > +            error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND, "Instance not found");  
>> 
>> Quote error.h:
>> 
>>  * Note: use of error classes other than ERROR_CLASS_GENERIC_ERROR is
>>  * strongly discouraged.
>> 
>> Any particular reason for ERROR_CLASS_DEVICE_NOT_FOUND?  If not, then
>> error_setg(), please.
>
> There may be a time-to-check to time-to-use race condition here. Suppose
> the management application (via QMP) calls 'blockev-del nbd0', then QEMU
> hangs, so after a timeout it calls 'yank nbd0' while shortly before that
> QEMU unhangs for some reason and removes the blockdev. Then yank returns
> this error.
>
> QMP errors should not be parsed except for the error class, so the the
> management application can only differentiate between this (ignorable)
> error and other (possibly fatal) ones by parsing the error class.

I see.

DeviceNotFound doesn't really fit, but none of the other error classes
is any better.

I think the line

      # Returns: nothing.

in the QAPI schema (quoted above) should be expanded to something like


      # Returns: - Nothing on success
                 - If the YankInstance doesn't exist, DeviceNotFound

>> > +            qemu_mutex_unlock(&yank_lock);
>> > +            return;
>> > +        }
>> > +    }
>> > +    for (tail = instances; tail; tail = tail->next) {
>> > +        entry = yank_find_entry(tail->value);
>> > +        assert(entry);
>> > +        QLIST_FOREACH(func_entry, &entry->yankfns, next) {
>> > +            func_entry->func(func_entry->opaque);
>> > +        }
>> > +    }
>> > +    qemu_mutex_unlock(&yank_lock);
>> > +}
[...]



  reply	other threads:[~2020-10-30 14:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-28 18:45 [PATCH v9 0/8] Introduce 'yank' oob qmp command to recover from hanging qemu Lukas Straub
2020-10-28 18:45 ` [PATCH v9 1/8] Introduce yank feature Lukas Straub
2020-10-29 16:36   ` Markus Armbruster
2020-10-30 10:32     ` Lukas Straub
2020-10-30 14:02       ` Markus Armbruster [this message]
2020-10-30 15:19         ` Lukas Straub
2020-10-28 18:45 ` [PATCH v9 2/8] block/nbd.c: Add " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 3/8] chardev/char-socket.c: " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 4/8] migration: " Lukas Straub
2020-10-28 18:45 ` [PATCH v9 5/8] io/channel-tls.c: make qio_channel_tls_shutdown thread-safe Lukas Straub
2020-10-28 18:45 ` [PATCH v9 6/8] io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown Lukas Straub
2020-10-28 18:45 ` [PATCH v9 7/8] MAINTAINERS: Add myself as maintainer for yank feature Lukas Straub
2020-10-28 18:45 ` [PATCH v9 8/8] tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test Lukas Straub

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lffnstge.fsf@dusky.pond.sub.org \
    --to=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lukasstraub2@web.de \
    --cc=marcandre.lureau@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.