All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>
Cc: "armbru@redhat.com" <armbru@redhat.com>,
	"mreitz@redhat.com" <mreitz@redhat.com>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	Denis Lunev <den@virtuozzo.com>
Subject: Re: [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect
Date: Tue, 5 Feb 2019 17:07:17 +0000	[thread overview]
Message-ID: <e78b4a65-e103-7df7-fa77-7d67699b9c91@virtuozzo.com> (raw)
In-Reply-To: <8fff6a6e-1030-64d7-4c96-4704970b898c@redhat.com>

16.01.2019 20:04, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Implement reconnect. To achieve this:
>>
>> 1. add new modes:
>>     connecting-wait: means, that reconnecting is in progress, and there
>>       were small number of reconnect attempts, so all requests are
>>       waiting for the connection.
>>     connecting-nowait: reconnecting is in progress, there were a lot of
>>       attempts of reconnect, all requests will return errors.
>>
>>     two old modes are used too:
>>     connected: normal state
>>     quit: exiting after fatal error or on close
> 
> What makes an error fatal?  Without reconnect, life is simple - if the
> server sends something we can't parse, we permanently turn the device
> into an error condition - because we have no way to get back in sync
> with the server for further commands.  Your patch allows reconnect
> attempts where the connection is down (we failed to send to the server
> or failed to receive the server's reply), but why can we not ALSO
> attempt to reconnect after a parse error?  A reconnect would let us get
> back in sync for attempting further commands.  You're right that the
> current command should probably fail in that case (if the server sent us
> garbage for a specific request, it will probably do so again on a repeat
> of that request; which is different than when we don't even know what
> the server would have sent because of a disconnect).

My thought was that protocol error is fatal, as it shows that server is
buggy, and it seems unsafe to communicate to buggy server..

> 
> Or, put another way, we KNOW we have (corner) cases where a mis-aligned
> image can currently cause the server to return BLOCK_STATUS replies that
> aren't aligned to the advertised minimumm block size.  Attempting to
> read the last sector of an image then causes the client to see the
> misaligned reply and complain, which we are treating as fatal.

Do we have fixes for it?

> But why
> not instead just fail that particular read, but still attempt a
> reconnect, in order to attempt further reads elsewhere in the image that
> do not trip up the server's misaligned reply?
> 

Hmm, for these cases, if we consider this errors not fatal, we don't need
even a reconnect..

If we want to consider protocol errors to be recoverable, we need reconnect only
on wrong magic and may be some kind of inconsistent variable data lenghts..

And it may need addition option, like --strict=false

>>
>> Possible transitions are:
>>
>>     * -> quit
>>     connecting-* -> connected
>>     connecting-wait -> connecting-nowait (transition is done after
>>                        reconnect-delay seconds in connecting-wait mode)
>>     connected -> connecting-wait
>>
>> 2. Implement reconnect in connection_co. So, in connecting-* mode,
>>      connection_co, tries to reconnect unlimited times.
>>
>> 3. Retry nbd queries on channel error, if we are in connecting-wait
>>      state.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/nbd-client.h |   4 +
>>   block/nbd-client.c | 304 +++++++++++++++++++++++++++++++++++++++++++----------
>>   2 files changed, 255 insertions(+), 53 deletions(-)
>>
> 
>> @@ -781,16 +936,21 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
>>       } else {
>>           assert(request->type != NBD_CMD_WRITE);
>>       }
>> -    ret = nbd_co_send_request(bs, request, write_qiov);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>>   
>> -    ret = nbd_co_receive_return_code(client, request->handle,
>> -                                     &request_ret, &local_err);
>> -    if (local_err) {
>> -        error_report_err(local_err);
>> -    }
>> +    do {
>> +        ret = nbd_co_send_request(bs, request, write_qiov);
>> +        if (ret < 0) {
>> +            continue;
>> +        }
>> +
>> +        ret = nbd_co_receive_return_code(client, request->handle,
>> +                                         &request_ret, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            local_err = NULL;
> 
> Conflicts with the conversion to use trace points.
> 


-- 
Best regards,
Vladimir

  reply	other threads:[~2019-02-05 17:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-31 17:30 [Qemu-devel] [PATCH v4 00/10] NBD reconnect Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 01/10] block/nbd-client: split channel errors from export errors Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 02/10] block/nbd: move connection code from block/nbd to block/nbd-client Vladimir Sementsov-Ogievskiy
2019-01-16 15:56   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 03/10] block/nbd-client: split connection from initialization Vladimir Sementsov-Ogievskiy
2019-01-16 15:52   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 04/10] block/nbd-client: fix nbd_reply_chunk_iter_receive Vladimir Sementsov-Ogievskiy
2019-01-16 16:01   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 05/10] block/nbd-client: don't check ioc Vladimir Sementsov-Ogievskiy
2019-01-16 16:05   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 06/10] block/nbd-client: move from quit to state Vladimir Sementsov-Ogievskiy
2019-01-16 16:25   ` Eric Blake
2019-01-16 16:58     ` Daniel P. Berrangé
2019-02-05 16:35       ` Vladimir Sementsov-Ogievskiy
2019-02-06  8:51         ` Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 07/10] block/nbd-client: rename read_reply_co to connection_co Vladimir Sementsov-Ogievskiy
2019-01-16 16:35   ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay Vladimir Sementsov-Ogievskiy
2019-01-04 22:25   ` Eric Blake
2019-02-05 16:48     ` Vladimir Sementsov-Ogievskiy
2019-04-11 15:47     ` Vladimir Sementsov-Ogievskiy
2019-04-11 15:47       ` Vladimir Sementsov-Ogievskiy
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 09/10] block/nbd-client: nbd reconnect Vladimir Sementsov-Ogievskiy
2018-11-02 12:39   ` Vladimir Sementsov-Ogievskiy
2019-01-16 17:04   ` Eric Blake
2019-02-05 17:07     ` Vladimir Sementsov-Ogievskiy [this message]
2019-02-05 17:15       ` Eric Blake
2018-07-31 17:30 ` [Qemu-devel] [PATCH v4 10/10] iotests: test " Vladimir Sementsov-Ogievskiy
2019-01-16 17:11   ` Eric Blake
2019-04-11 16:02     ` Vladimir Sementsov-Ogievskiy
2019-04-11 16:02       ` Vladimir Sementsov-Ogievskiy
     [not found] ` <fc24ba9e-e325-6478-cb22-bc0a256c6e87@virtuozzo.com>
2018-10-09 19:33   ` [Qemu-devel] [Qemu-block] [PATCH v4 00/10] NBD reconnect John Snow
2018-10-09 21:59     ` Vladimir Sementsov-Ogievskiy
2018-12-12 10:33 ` [Qemu-devel] ping " Vladimir Sementsov-Ogievskiy
2018-12-29 12:23 ` [Qemu-devel] ping3 " Vladimir Sementsov-Ogievskiy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e78b4a65-e103-7df7-fa77-7d67699b9c91@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=armbru@redhat.com \
    --cc=den@virtuozzo.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.