Re: [PATCH v9 2/3] block/nbd: nbd reconnect - Vladimir Sementsov-Ogievskiy

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [PATCH v9 2/3] block/nbd: nbd reconnect
Date: Fri, 4 Oct 2019 17:29:39 +0000	[thread overview]
Message-ID: <311a03c3-4e16-425a-9b9d-291aa258df44@email.android.com> (raw)
In-Reply-To: <b4532b07-71a9-1890-949d-a017ede348ee@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6595 bytes --]

23 Sep 2019 22:23  Eric Blake <eblake@redhat.com> wrote:

On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
> Implement reconnect. To achieve this:
>
> 1. add new modes:
>    connecting-wait: means, that reconnecting is in progress, and there
>      were small number of reconnect attempts, so all requests are
>      waiting for the connection.
>    connecting-nowait: reconnecting is in progress, there were a lot of
>      attempts of reconnect, all requests will return errors.
>
>    two old modes are used too:
>    connected: normal state
>    quit: exiting after fatal error or on close
>
> Possible transitions are:
>
>    * -> quit
>    connecting-* -> connected
>    connecting-wait -> connecting-nowait (transition is done after
>                       reconnect-delay seconds in connecting-wait mode)
>    connected -> connecting-wait
>
> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>     connection_co, tries to reconnect unlimited times.
>
> 3. Retry nbd queries on channel error, if we are in connecting-wait
>     state.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 269 insertions(+), 63 deletions(-)
>

> +
> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> +{
> +    Error *local_err = NULL;
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /* Wait for completion of all in-flight requests */
> +
> +    qemu_co_mutex_lock(&s->send_mutex);
> +
> +    while (s->in_flight > 0) {
> +        qemu_co_mutex_unlock(&s->send_mutex);
> +        nbd_recv_coroutines_wake_all(s);
> +        s->wait_in_flight = true;
> +        qemu_coroutine_yield();
> +        s->wait_in_flight = false;
> +        qemu_co_mutex_lock(&s->send_mutex);
> +    }
> +
> +    qemu_co_mutex_unlock(&s->send_mutex);
> +
> +    if (!nbd_client_connecting(s)) {
> +        return;
> +    }
> +
> +    /*
> +     * Now we are sure that nobody is accessing the channel, and no one will
> +     * try until we set the state to CONNECTED.
> +     */
> +
> +    /* Finalize previous connection if any */
> +    if (s->ioc) {
> +        nbd_client_detach_aio_context(s->bs);
> +        object_unref(OBJECT(s->sioc));
> +        s->sioc = NULL;
> +        object_unref(OBJECT(s->ioc));
> +        s->ioc = NULL;
> +    }
> +
> +    s->connect_status = nbd_client_connect(s->bs, &local_err);
> +    error_free(s->connect_err);
> +    s->connect_err = NULL;
> +    error_propagate(&s->connect_err, local_err);
> +    local_err = NULL;
> +

Looks like a dead assignment to local_err.  But I see elsewhere you add
it, because you convert straight-line code into loops where it matters.

> +    if (s->connect_status < 0) {
> +        /* failed attempt */
> +        return;
> +    }
> +
> +    /* successfully connected */
> +    s->state = NBD_CLIENT_CONNECTED;
> +    qemu_co_queue_restart_all(&s->free_sema);
> +}
> +
> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)

Coroutine functions should generally have '_co_' in their name.  I'd
prefer nbd_co_reconnect_loop.

> +{
> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> +    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
> +    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
> +    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
> +
> +    nbd_reconnect_attempt(s);
> +
> +    while (nbd_client_connecting(s)) {
> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> +        {
> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +            qemu_co_queue_restart_all(&s->free_sema);
> +        }
> +
> +        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
> +                                  &s->connection_co_sleep_ns_state);
> +        if (s->drained) {
> +            bdrv_dec_in_flight(s->bs);
> +            s->wait_drained_end = true;
> +            while (s->drained) {
> +                /*
> +                 * We may be entered once from nbd_client_attach_aio_context_bh
> +                 * and then from nbd_client_co_drain_end. So here is a loop.
> +                 */
> +                qemu_coroutine_yield();
> +            }
> +            bdrv_inc_in_flight(s->bs);
> +        }
> +        if (timeout < max_timeout) {
> +            timeout *= 2;
> +        }

Exponential backup, ok.  If I read the loop correctly, you've hardcoded
the max_timeout at 16s, which means the overall timeout is about 30s
when adding in the time of the earlier iterations.  Does that need to be
tunable?  But for now I can live with it.

> +
> +        nbd_reconnect_attempt(s);
> +    }
>  }
>
>  static coroutine_fn void nbd_connection_entry(void *opaque)
> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>           * Therefore we keep an additional in_flight reference all the time and
>           * only drop it temporarily here.
>           */
> +
> +        if (nbd_client_connecting(s)) {
> +            nbd_reconnect_loop(s);
> +        }
> +
> +        if (s->state != NBD_CLIENT_CONNECTED) {
> +            continue;
> +        }

Is 'continue' right, even if s->state == QUIT?

> +
>          assert(s->reply.handle == 0);
>          ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>
>          if (local_err) {
>              trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>              error_free(local_err);
> +            local_err = NULL;

Could be fun in concert with your proposal to get rid of local_err ;)
But here, we aren't using error_propagate().

>          }
>          if (ret <= 0) {
>              nbd_channel_error(s, ret ? ret : -EIO);
> -            break;
> +            continue;
>          }
>
>          /*

We're getting really close.  If you can answer my questions above, and
the only thing left is adding _co_ in the function name, I could tweak
that locally to spare you a v10.  At any rate, I'm tentatively queuing
this on my NBD tree; I'll probably do a pull request today without it,
and save it for next week's PR after I've had a week to hammer on it in
local tests.

Kindly remind. Not a problem of you remember but didn't have time for it.

Best regards,
Vladimir

[-- Attachment #2: Type: text/html, Size: 13339 bytes --]