From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
Denis Lunev <den@virtuozzo.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"stefanha@redhat.com" <stefanha@redhat.com>,
"mreitz@redhat.com" <mreitz@redhat.com>
Subject: Re: [PATCH v9 2/3] block/nbd: nbd reconnect
Date: Tue, 24 Sep 2019 08:05:43 +0000 [thread overview]
Message-ID: <e65d1cd0-9ed4-698e-16e5-a6e6487d098d@virtuozzo.com> (raw)
In-Reply-To: <b4532b07-71a9-1890-949d-a017ede348ee@redhat.com>
23.09.2019 22:23, Eric Blake wrote:
> On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. add new modes:
>> connecting-wait: means, that reconnecting is in progress, and there
>> were small number of reconnect attempts, so all requests are
>> waiting for the connection.
>> connecting-nowait: reconnecting is in progress, there were a lot of
>> attempts of reconnect, all requests will return errors.
>>
>> two old modes are used too:
>> connected: normal state
>> quit: exiting after fatal error or on close
>>
>> Possible transitions are:
>>
>> * -> quit
>> connecting-* -> connected
>> connecting-wait -> connecting-nowait (transition is done after
>> reconnect-delay seconds in connecting-wait mode)
>> connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>> connection_co, tries to reconnect unlimited times.
>>
>> 3. Retry nbd queries on channel error, if we are in connecting-wait
>> state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>> block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 269 insertions(+), 63 deletions(-)
>>
>
>> +
>> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>> +{
>> + Error *local_err = NULL;
>> +
>> + if (!nbd_client_connecting(s)) {
>> + return;
>> + }
>> +
>> + /* Wait for completion of all in-flight requests */
>> +
>> + qemu_co_mutex_lock(&s->send_mutex);
>> +
>> + while (s->in_flight > 0) {
>> + qemu_co_mutex_unlock(&s->send_mutex);
>> + nbd_recv_coroutines_wake_all(s);
>> + s->wait_in_flight = true;
>> + qemu_coroutine_yield();
>> + s->wait_in_flight = false;
>> + qemu_co_mutex_lock(&s->send_mutex);
>> + }
>> +
>> + qemu_co_mutex_unlock(&s->send_mutex);
>> +
>> + if (!nbd_client_connecting(s)) {
>> + return;
>> + }
>> +
>> + /*
>> + * Now we are sure that nobody is accessing the channel, and no one will
>> + * try until we set the state to CONNECTED.
>> + */
>> +
>> + /* Finalize previous connection if any */
>> + if (s->ioc) {
>> + nbd_client_detach_aio_context(s->bs);
>> + object_unref(OBJECT(s->sioc));
>> + s->sioc = NULL;
>> + object_unref(OBJECT(s->ioc));
>> + s->ioc = NULL;
>> + }
>> +
>> + s->connect_status = nbd_client_connect(s->bs, &local_err);
>> + error_free(s->connect_err);
>> + s->connect_err = NULL;
>> + error_propagate(&s->connect_err, local_err);
>> + local_err = NULL;
>> +
>
> Looks like a dead assignment to local_err.
Hmm, agree, it's dead.
> But I see elsewhere you add
> it, because you convert straight-line code into loops where it matters.
>
>> + if (s->connect_status < 0) {
>> + /* failed attempt */
>> + return;
>> + }
>> +
>> + /* successfully connected */
>> + s->state = NBD_CLIENT_CONNECTED;
>> + qemu_co_queue_restart_all(&s->free_sema);
>> +}
>> +
>> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)
>
> Coroutine functions should generally have '_co_' in their name. I'd
> prefer nbd_co_reconnect_loop.
OK, agreed.
>
>> +{
>> + uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> + uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
>> + uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
>> + uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
>> +
>> + nbd_reconnect_attempt(s);
>> +
>> + while (nbd_client_connecting(s)) {
>> + if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
>> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
>> + {
>> + s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>> + qemu_co_queue_restart_all(&s->free_sema);
>> + }
>> +
>> + qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
>> + &s->connection_co_sleep_ns_state);
>> + if (s->drained) {
>> + bdrv_dec_in_flight(s->bs);
>> + s->wait_drained_end = true;
>> + while (s->drained) {
>> + /*
>> + * We may be entered once from nbd_client_attach_aio_context_bh
>> + * and then from nbd_client_co_drain_end. So here is a loop.
>> + */
>> + qemu_coroutine_yield();
>> + }
>> + bdrv_inc_in_flight(s->bs);
>> + }
>> + if (timeout < max_timeout) {
>> + timeout *= 2;
>> + }
>
> Exponential backup, ok. If I read the loop correctly, you've hardcoded
> the max_timeout at 16s, which means the overall timeout is about 30s
> when adding in the time of the earlier iterations. Does that need to be
> tunable? But for now I can live with it.
I think, we can add an option later if needed.
>
>> +
>> + nbd_reconnect_attempt(s);
>> + }
>> }
>>
>> static coroutine_fn void nbd_connection_entry(void *opaque)
>> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>> * Therefore we keep an additional in_flight reference all the time and
>> * only drop it temporarily here.
>> */
>> +
>> + if (nbd_client_connecting(s)) {
>> + nbd_reconnect_loop(s);
>> + }
>> +
>> + if (s->state != NBD_CLIENT_CONNECTED) {
>> + continue;
>> + }
>
> Is 'continue' right, even if s->state == QUIT?
No matter, as we jump to "while (s->state != NBD_CLIENT_QUIT) {".
>
>> +
>> assert(s->reply.handle == 0);
>> ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>>
>> if (local_err) {
>> trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>> error_free(local_err);
>> + local_err = NULL;
>
> Could be fun in concert with your proposal to get rid of local_err ;)
> But here, we aren't using error_propagate().
And we don't have Error **errp parameter here too.
>
>> }
>> if (ret <= 0) {
>> nbd_channel_error(s, ret ? ret : -EIO);
>> - break;
>> + continue;
>> }
>>
>> /*
>
> We're getting really close. If you can answer my questions above, and
> the only thing left is adding _co_ in the function name, I could tweak
> that locally to spare you a v10. At any rate, I'm tentatively queuing
> this on my NBD tree; I'll probably do a pull request today without it,
> and save it for next week's PR after I've had a week to hammer on it in
> local tests.
>
Thank you! That's great!
--
Best regards,
Vladimir
next prev parent reply other threads:[~2019-09-24 8:08 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-09-17 17:13 [Qemu-devel] [PATCH v9 0/3] NBD reconnect Vladimir Sementsov-Ogievskiy
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 1/3] qemu-coroutine-sleep: introduce qemu_co_sleep_wake Vladimir Sementsov-Ogievskiy
2019-09-23 14:53 ` Eric Blake
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 2/3] block/nbd: nbd reconnect Vladimir Sementsov-Ogievskiy
2019-09-23 19:23 ` Eric Blake
2019-09-24 8:05 ` Vladimir Sementsov-Ogievskiy [this message]
2019-10-04 17:29 ` Vladimir Sementsov-Ogievskiy
2019-09-17 17:13 ` [Qemu-devel] [PATCH v9 3/3] iotests: test " Vladimir Sementsov-Ogievskiy
2019-09-23 19:51 ` Eric Blake
2019-09-24 8:31 ` Vladimir Sementsov-Ogievskiy
2019-10-04 18:05 ` Eric Blake
2019-10-07 10:48 ` Vladimir Sementsov-Ogievskiy
2019-10-07 20:03 ` Eric Blake
2019-09-24 9:27 ` [PATCH v9 4/3] " Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e65d1cd0-9ed4-698e-16e5-a6e6487d098d@virtuozzo.com \
--to=vsementsov@virtuozzo.com \
--cc=den@virtuozzo.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).