From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "kwolf@redhat.com" <kwolf@redhat.com>,
	Denis Lunev <den@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>
Subject: Re: [PATCH v9 2/3] block/nbd: nbd reconnect
Date: Tue, 24 Sep 2019 08:05:43 +0000
Message-ID: <e65d1cd0-9ed4-698e-16e5-a6e6487d098d@virtuozzo.com>
In-Reply-To: <b4532b07-71a9-1890-949d-a017ede348ee@redhat.com>

23.09.2019 22:23, Eric Blake wrote:
> On 9/17/19 12:13 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. Add new modes:
>>     connecting-wait: reconnecting is in progress, and there have been
>>       only a few reconnect attempts so far, so all requests wait for
>>       the connection.
>>     connecting-nowait: reconnecting is in progress, there have already
>>       been many reconnect attempts, so all requests return errors.
>>
>>     The two old modes are used as well:
>>     connected: normal state
>>     quit: exiting after a fatal error or on close
>>
>> Possible transitions are:
>>
>>     * -> quit
>>     connecting-* -> connected
>>     connecting-wait -> connecting-nowait (transition is done after
>>                        reconnect-delay seconds in connecting-wait mode)
>>     connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co: in a connecting-* mode,
>>      connection_co tries to reconnect an unlimited number of times.
>>
>> 3. Retry NBD queries on channel error if we are in the connecting-wait
>>      state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd.c | 332 ++++++++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 269 insertions(+), 63 deletions(-)
>>
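For context on the modes above: in the patch they are backed by an
NBDClientState enum, and nbd_client_connecting(), used throughout the
hunks below, is true in either connecting-* state. Roughly (a sketch,
not a verbatim quote of block/nbd.c):

    typedef enum NBDClientState {
        NBD_CLIENT_CONNECTING_WAIT,
        NBD_CLIENT_CONNECTING_NOWAIT,
        NBD_CLIENT_CONNECTED,
        NBD_CLIENT_QUIT
    } NBDClientState;

    static bool nbd_client_connecting(BDRVNBDState *s)
    {
        return s->state == NBD_CLIENT_CONNECTING_WAIT ||
               s->state == NBD_CLIENT_CONNECTING_NOWAIT;
    }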
> 
>> +
>> +static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>> +{
>> +    Error *local_err = NULL;
>> +
>> +    if (!nbd_client_connecting(s)) {
>> +        return;
>> +    }
>> +
>> +    /* Wait for completion of all in-flight requests */
>> +
>> +    qemu_co_mutex_lock(&s->send_mutex);
>> +
>> +    while (s->in_flight > 0) {
>> +        qemu_co_mutex_unlock(&s->send_mutex);
>> +        nbd_recv_coroutines_wake_all(s);
>> +        s->wait_in_flight = true;
>> +        qemu_coroutine_yield();
>> +        s->wait_in_flight = false;
>> +        qemu_co_mutex_lock(&s->send_mutex);
>> +    }
>> +
>> +    qemu_co_mutex_unlock(&s->send_mutex);
>> +
>> +    if (!nbd_client_connecting(s)) {
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * Now we are sure that nobody is accessing the channel, and no one will
>> +     * try until we set the state to CONNECTED.
>> +     */
>> +
>> +    /* Finalize previous connection if any */
>> +    if (s->ioc) {
>> +        nbd_client_detach_aio_context(s->bs);
>> +        object_unref(OBJECT(s->sioc));
>> +        s->sioc = NULL;
>> +        object_unref(OBJECT(s->ioc));
>> +        s->ioc = NULL;
>> +    }
>> +
>> +    s->connect_status = nbd_client_connect(s->bs, &local_err);
>> +    error_free(s->connect_err);
>> +    s->connect_err = NULL;
>> +    error_propagate(&s->connect_err, local_err);
>> +    local_err = NULL;
>> +
> 
> Looks like a dead assignment to local_err. 

Hmm, agreed, it's dead.
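Dropping it, the tail of nbd_reconnect_attempt() would simply be
(untested sketch, just with the trailing reset removed):

    s->connect_status = nbd_client_connect(s->bs, &local_err);
    error_free(s->connect_err);
    s->connect_err = NULL;
    error_propagate(&s->connect_err, local_err);

    if (s->connect_status < 0) {
        /* failed attempt */
        return;
    }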

> But I see elsewhere you add
> it, because you convert straight-line code into loops where it matters.
> 
>> +    if (s->connect_status < 0) {
>> +        /* failed attempt */
>> +        return;
>> +    }
>> +
>> +    /* successfully connected */
>> +    s->state = NBD_CLIENT_CONNECTED;
>> +    qemu_co_queue_restart_all(&s->free_sema);
>> +}
>> +
>> +static coroutine_fn void nbd_reconnect_loop(BDRVNBDState *s)
> 
> Coroutine functions should generally have '_co_' in their name.  I'd
> prefer nbd_co_reconnect_loop.

OK, agreed.

> 
>> +{
>> +    uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
>> +    uint64_t delay_ns = s->reconnect_delay * NANOSECONDS_PER_SECOND;
>> +    uint64_t timeout = 1 * NANOSECONDS_PER_SECOND;
>> +    uint64_t max_timeout = 16 * NANOSECONDS_PER_SECOND;
>> +
>> +    nbd_reconnect_attempt(s);
>> +
>> +    while (nbd_client_connecting(s)) {
>> +        if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
>> +            qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
>> +        {
>> +            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
>> +            qemu_co_queue_restart_all(&s->free_sema);
>> +        }
>> +
>> +        qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
>> +                                  &s->connection_co_sleep_ns_state);
>> +        if (s->drained) {
>> +            bdrv_dec_in_flight(s->bs);
>> +            s->wait_drained_end = true;
>> +            while (s->drained) {
>> +                /*
>> +                 * We may be entered once from nbd_client_attach_aio_context_bh
>> +                 * and then from nbd_client_co_drain_end. So here is a loop.
>> +                 */
>> +                qemu_coroutine_yield();
>> +            }
>> +            bdrv_inc_in_flight(s->bs);
>> +        }
>> +        if (timeout < max_timeout) {
>> +            timeout *= 2;
>> +        }
> 
> Exponential backoff, ok.  If I read the loop correctly, you've hardcoded
> the max_timeout at 16s, which means the overall timeout is about 30s
> when adding in the time of the earlier iterations.  Does that need to be
> tunable?  But for now I can live with it.

I think we can add an option later if needed.
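(For reference on the ~30s figure: with the sleep starting at 1s and
doubling up to the 16s cap, the back-off reaches its steady state after
about 1 + 2 + 4 + 8 + 16 = 31 seconds of sleeping; from then on there is
one reconnect attempt every 16 seconds, without limit.)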

> 
>> +
>> +        nbd_reconnect_attempt(s);
>> +    }
>>   }
>>   
>>   static coroutine_fn void nbd_connection_entry(void *opaque)
>> @@ -177,16 +330,26 @@ static coroutine_fn void nbd_connection_entry(void *opaque)
>>            * Therefore we keep an additional in_flight reference all the time and
>>            * only drop it temporarily here.
>>            */
>> +
>> +        if (nbd_client_connecting(s)) {
>> +            nbd_reconnect_loop(s);
>> +        }
>> +
>> +        if (s->state != NBD_CLIENT_CONNECTED) {
>> +            continue;
>> +        }
> 
> Is 'continue' right, even if s->state == QUIT?

It doesn't matter: the 'continue' just jumps back to the loop condition
"while (s->state != NBD_CLIENT_QUIT) {", so if the state is QUIT we leave
the loop anyway.
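I.e. the surrounding loop in nbd_connection_entry() is roughly (a sketch,
details omitted):

    while (s->state != NBD_CLIENT_QUIT) {
        /* ... */
        if (nbd_client_connecting(s)) {
            nbd_reconnect_loop(s);
        }

        if (s->state != NBD_CLIENT_CONNECTED) {
            continue; /* on QUIT the while condition ends the loop */
        }

        /* ... receive and dispatch one reply ... */
    }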

> 
>> +
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, &local_err);
>>   
>>           if (local_err) {
>>               trace_nbd_read_reply_entry_fail(ret, error_get_pretty(local_err));
>>               error_free(local_err);
>> +            local_err = NULL;
> 
> Could be fun in concert with your proposal to get rid of local_err ;)
> But here, we aren't using error_propagate().

And we don't have an Error **errp parameter here either.

> 
>>           }
>>           if (ret <= 0) {
>>               nbd_channel_error(s, ret ? ret : -EIO);
>> -            break;
>> +            continue;
>>           }
>>   
>>           /*
> 
> We're getting really close.  If you can answer my questions above, and
> the only thing left is adding _co_ in the function name, I could tweak
> that locally to spare you a v10.  At any rate, I'm tentatively queuing
> this on my NBD tree; I'll probably do a pull request today without it,
> and save it for next week's PR after I've had a week to hammer on it in
> local tests.
> 

Thank you! That's great!

-- 
Best regards,
Vladimir
