From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, hreitz@redhat.com,
	kwolf@redhat.com, rkagan@virtuozzo.com
Subject: Re: [PATCH v6 4/5] block/nbd: drop connection_co
Date: Mon, 27 Sep 2021 14:57:36 +0300	[thread overview]
Message-ID: <6bdbd25a-666e-be2e-bbe4-da9317ef9f62@virtuozzo.com> (raw)
In-Reply-To: <20210924174439.jphzecreh7usmecj@redhat.com>

24.09.2021 20:44, Eric Blake wrote:
> On Thu, Sep 02, 2021 at 01:38:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> OK, that's a big rewrite of the logic.
> 
> And a time-consuming review on my part!
> 
>>
>> Pre-patch we have an always-running coroutine - connection_co. It does
>> reply receiving and reconnecting, and this leads to a lot of difficult
>> and unobvious code around drained sections and context switching. We
>> also abuse the bs->in_flight counter, which is increased for
>> connection_co and temporarily decreased at points where we want to
>> allow a drained section to begin. One of these places is in another
>> file: in nbd_read_eof() in nbd/client.c.
>>
>> We also cancel reconnect and requests waiting for reconnect on drain
>> begin, which is not correct. This patch fixes that.
>>
>> Let's finally drop this always running coroutine and go another way:
>> do both reconnect and receiving in request coroutines.
>>
>> The detailed list of changes below (in the sequence of diff hunks).
> 
> Well, depending on the git order file in use ;)
> 
>>
>> 1. Receiving coroutines are woken directly from nbd_channel_error()
>>     when we change s->state
>>
>> 2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
>>     and in nbd_teardown_connection() all requests should already be
>>     finished (and reconnect is done from request). So
>>     nbd_co_establish_connection_cancel() is called from
>>     nbd_cancel_in_flight() (to cancel the request that is doing
>>     nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
>>     (previously we didn't need it, as reconnect delay only should cancel
>>     active requests not the reconnection itself. But now reconnection
> 
> Missing )
> 
>>     itself is done in the separate thread (we now call
>>     nbd_client_connection_enable_retry() in nbd_open()), and we need to
>>     cancel the requests that waits in nbd_co_establish_connection()
> 
> Singular/plural disagreement. I think the intended meaning is
> 'requests that wait' and not 'request that waits'.
> 
>>     now).
>>
>> 2. We do receive headers in request coroutine. But we also should
> 
> Second point 2; I'll call it 2A below, because it looks related to
> point 8.

Ohh. OK.

> 
>>     dispatch replies for another pending requests. So,
> 
> s/another/other/
> 
>>     nbd_connection_entry() is turned into nbd_receive_replies(), which
>>     does reply dispatching until it receive another request headers, and
> 
> s/until/while/, s/another/other/
> 
>>     returns when it receive the requested header.
> 
> receives
> 
>>
>> 3. All the old stuff around drained sections and context switching is
>>     dropped. In detail:
>>     - we don't need to move connection_co to a new aio context, as we
>>       don't have connection_co anymore
>>     - we don't have a fake "request" of connection_co (the extra
>>       increase of in_flight), so we don't need to care about it in
>>       drain_begin/end
>>     - we don't stop reconnection during a drained section anymore. This
>>       means that drain_begin may wait for a long time (up to
>>       reconnect_delay). But that's an improvement and more correct
>>       behavior, see below [*]
>>
>> 4. In nbd_teardown_connection() we don't have to wait for
>>     connection_co, as it is dropped. And the cleanup of s->ioc and
>>     nbd_yank is moved here from the removed connection_co.
>>
>> 5. In nbd_co_do_establish_connection() we now should handle
>>     NBD_CLIENT_CONNECTING_NOWAIT: if a new request comes while we are
>>     in NBD_CLIENT_CONNECTING_NOWAIT, it still should call
>>     nbd_co_establish_connection() (who knows, maybe connection already
>>     established by thread in background). But we shouldn't wait: if
> 
> maybe the connection was already established by another thread in the background

"another thread" sounds vague. I think better: by connection thread.

> 
>>     nbd_co_establish_connection() can't return a new channel
>>     immediately, the request should fail (we are in
>>     NBD_CLIENT_CONNECTING_NOWAIT
>>     state).
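
A side note for future readers: here is a minimal self-contained sketch
of that WAIT/NOWAIT decision. All names below are illustrative, not the
real QEMU identifiers; in the real code the channel comes from the
connection thread in nbd/client-connection.c.

/* CONNECTING_NOWAIT: still ask for a channel (the background thread may
 * have already connected), but never wait for one. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    CLIENT_CONNECTED,
    CLIENT_CONNECTING_WAIT,   /* within reconnect-delay: requests may wait */
    CLIENT_CONNECTING_NOWAIT, /* delay expired: requests must not wait */
} ClientState;

/* Stand-in for the connection thread: NULL if no channel is ready yet. */
static void *poll_background_connection(void) { return NULL; }

/* Stand-in for a connect that blocks until the thread finishes. */
static void *blocking_connection(void) { return (void *)1; }

static int do_establish_connection(ClientState state, void **ioc)
{
    bool blocking = (state == CLIENT_CONNECTING_WAIT);

    *ioc = blocking ? blocking_connection() : poll_background_connection();
    return *ioc ? 0 : -ECONNREFUSED;
}

int main(void)
{
    void *ioc;

    /* NOWAIT fails immediately unless a channel is already established. */
    printf("nowait: %d\n",
           do_establish_connection(CLIENT_CONNECTING_NOWAIT, &ioc));
    /* WAIT blocks (trivially succeeds here) until connected. */
    printf("wait: %d\n",
           do_establish_connection(CLIENT_CONNECTING_WAIT, &ioc));
    return 0;
}
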
>>
>> 6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
>>     other requests in the caller, so here we just assert that fact.
>>     Also, the reconnect delay timer is now initialized here: we can
>>     easily detect the first attempt and start the timer.
>>
>> 7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
>>     retries are fully handled by the connection thread
>>     (nbd/client-connection.c), the delay timer is initialized in
>>     nbd_reconnect_attempt(), and we don't have to bother with
>>     s->drained and friends. nbd_reconnect_attempt() is now called from
>>     nbd_co_send_request().
> 
> A lot going on there, but it's making sense so far.
> 
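
Right. One more illustration may help: the reconnect delay in the new
scheme is just a timer armed on the first reconnect attempt, and its
callback is the analogue of reconnect_delay_timer_cb(): stop the waiting
(WAIT -> NOWAIT) and cancel the in-progress connection attempt. A
self-contained thread-based sketch (illustrative names only; the real
code uses a QEMU timer in the bs aio context):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

enum { CONNECTING_WAIT, CONNECTING_NOWAIT };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int state = CONNECTING_WAIT;
static unsigned reconnect_delay = 1;   /* seconds, example value */

/* Analogy of reconnect_delay_timer_cb(): stop waiting for reconnect. */
static void *delay_timer(void *arg)
{
    (void)arg;
    sleep(reconnect_delay);
    pthread_mutex_lock(&lock);
    if (state == CONNECTING_WAIT) {
        state = CONNECTING_NOWAIT;
        /* ...here the real code also cancels the in-progress
         * nbd_co_establish_connection() and wakes the waiters... */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    /* Armed once, on the first reconnect attempt. */
    pthread_create(&t, NULL, delay_timer, NULL);
    pthread_join(t, NULL);
    printf("state: %s\n",
           state == CONNECTING_NOWAIT ? "NOWAIT" : "WAIT");
    return 0;
}
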
>>
>> 8. nbd_connection_entry is dropped: reconnect is now handled by
>>     nbd_co_send_request(), and reply receiving is now handled by
>>     nbd_receive_replies(): everything is handled from request
>>     coroutines.
>>
>> 9. So, welcome the new nbd_receive_replies(), called from request
>>     coroutines, which receives reply headers instead of
>>     nbd_connection_entry(). Like with sending requests, only one
>>     coroutine may receive at a time. So we introduce receive_mutex,
>>     which is locked around nbd_receive_reply(). It also protects some
>>     related fields. Still, a full audit of thread-safety in the nbd
>>     driver is a separate task.
> 
> It sounds like you're more worried about auditing for which locks are
> held longer than necessary, rather than expecting to find cases where
> we aren't thread-safe because we are lacking locks?  At any rate, as
> this patch is already easier to reason about, I'm okay deferring that
> audit to later patches.

I just try not to introduce new non-thread-safe things. But I'm not sure we have good thread-safety in the nbd driver code now.

> 
>>     The new function waits for a reply with the specified handle to be
>>     received, and works rather simply:
>>
>>     Under the mutex:
>>       - if the current handle is 0, do the receive by hand. If another
>>         handle is received, switch to that request's coroutine: release
>>         the mutex and yield. Otherwise return success
>>       - if the current handle == the requested handle, we are done
>>       - otherwise, release the mutex and yield
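
To make this dispatching scheme concrete, here is a rough self-contained
pthread analogy of it (the real code uses QEMU coroutines, receive_mutex
and explicit wake; every name below is illustrative, and the fake server
deliberately answers in reverse order to force dispatching):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REQS 4

static pthread_mutex_t receive_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  receive_cond  = PTHREAD_COND_INITIALIZER;
static uint64_t current_handle;        /* 0: no reply header pending */

/* Stand-in for reading one reply header from the socket. */
static uint64_t recv_reply_header(void)
{
    static const uint64_t order[NUM_REQS] = {4, 3, 2, 1};
    static int i;
    return order[i++];
}

/* Wait until the reply with our handle has been read off the wire. */
static void receive_replies(uint64_t handle)
{
    pthread_mutex_lock(&receive_mutex);
    for (;;) {
        if (current_handle == 0) {
            /* Nobody holds a pending header: we become the receiver. */
            current_handle = recv_reply_header();
        }
        if (current_handle == handle) {
            break;                     /* our reply has arrived */
        }
        /* The header belongs to another request: wake its owner and
         * sleep; waiting releases the mutex, like yield does. */
        pthread_cond_broadcast(&receive_cond);
        pthread_cond_wait(&receive_cond, &receive_mutex);
    }
    printf("request %d: got its reply\n", (int)handle);
    current_handle = 0;                    /* reply consumed, and (like  */
    pthread_cond_broadcast(&receive_cond); /* point 12 below) wake the   */
    pthread_mutex_unlock(&receive_mutex);  /* next receiver, if any      */
}

static void *request(void *arg)
{
    receive_replies((uint64_t)(uintptr_t)arg);
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_REQS];

    for (int i = 0; i < NUM_REQS; i++) {
        pthread_create(&t[i], NULL, request, (void *)(uintptr_t)(i + 1));
    }
    for (int i = 0; i < NUM_REQS; i++) {
        pthread_join(t[i], NULL);
    }
    return 0;
}
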
>>
>> 10. In nbd_co_send_request() we now do nbd_reconnect_attempt() if
>>      needed. Also, while waiting in the free_sema queue we now wait for
>>      one of two conditions:
>>      - connectED, in_flight < MAX_NBD_REQUESTS (so we can start a new
>>        one)
>>      - connectING, in_flight == 0, so we can call
>>        nbd_reconnect_attempt()
>>      And this logic is protected by s->send_mutex.
>>
>>      Also, on failure we no longer have to care about the removed
>>      s->connection_co.
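
The two wait conditions, as a condition-variable analogy (again only a
sketch with illustrative names; the real code queues coroutines in
free_sema under s->send_mutex):

#include <pthread.h>

#define MAX_NBD_REQUESTS 16

enum { CONNECTED, CONNECTING, QUIT };

static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  free_sema  = PTHREAD_COND_INITIALIZER;
static int state = CONNECTING;
static int in_flight;

/* Stub: a successful reconnect attempt moves us to CONNECTED. */
static void reconnect_attempt(void) { state = CONNECTED; }

static int send_request(void)
{
    pthread_mutex_lock(&send_mutex);
    for (;;) {
        if (state == CONNECTING && in_flight == 0) {
            /* All requests have drained: safe to (re)connect from here. */
            reconnect_attempt();
        }
        if (state == CONNECTED && in_flight < MAX_NBD_REQUESTS) {
            break;              /* a free slot on a live connection */
        }
        if (state == QUIT) {
            pthread_mutex_unlock(&send_mutex);
            return -1;
        }
        pthread_cond_wait(&free_sema, &send_mutex);
    }
    in_flight++;
    pthread_mutex_unlock(&send_mutex);
    /* ... actually write the request to the socket ... */
    return 0;
}

int main(void)
{
    return send_request();
}
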
>>
>> 11. nbd_co_do_receive_one_chunk(): now, instead of yielding and
>>      waiting for s->connection_co, we just call the new
>>      nbd_receive_replies().
>>
>> 12. nbd_co_receive_one_chunk(): the place where s->reply.handle
>>      becomes 0, which means that handling of the whole reply is
>>      finished. Here we need to wake one of the coroutines sleeping in
>>      nbd_receive_replies().
>>      If now one sleeps - do nothing. That's another behavior change: we
> 
> s/now one sleeps/none are sleeping/
> 
>>      don't have an endless recv() during idle time. It may be
>>      considered a drawback; if so, it may be fixed later.
> 
> The case where no coroutine is waiting to receive a reply should only
> happen due to a malicious server (a compliant server will only send
> replies matching an existing pending request).

With the new code, when we don't have any in-flight requests there are no waiting coroutines either.
And nobody will notice if the connection silently gets lost.

With the old code, we always had a recv() call pending in connection_co, so we would catch disconnects (at least some kinds of disconnects, or when keep-alive is set up properly) sooner and start the reconnection procedure.
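
For reference, by "keep-alive set up properly" I mean something like this
on the socket (only a sketch: the timeout values are arbitrary examples,
and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void enable_keepalive(int fd)
{
    int on = 1, idle = 30, interval = 10, count = 3;

    /* Probe an idle connection, so a dead peer is noticed by recv(). */
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}
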

> 
>>
>> 13. nbd_reply_chunk_iter_receive(): don't care about the removed
>>      connection_co, just ping the in_flight waiters.
>>
>> 14. Don't create connection_co, enable retry in the connection thread
>>      (we don't have our own reconnect loop anymore).
>>
>> 15. We need now nbd_co_establish_connection_cancel() call in
>>      nbd_cancel_in_flight(), to cancel the request that doing connection
>>      attempt.
> 
> s/need now/now need to add a/
> s/request that doing/request that is doing a/
> 
>>
>> [*] OK, now we don't cancel reconnect on drain begin. That's correct:
>>      the reconnect feature leads to the possibility of long-running
>>      requests (up to the reconnect delay). Still, drain begin is not a
>>      reason to kill long-running requests; we should wait for them.
>>
>>      This also means that we can again reproduce a dead-lock, described
>>      in 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace.
>>      Why we are OK with it:
>>      1. Now this is not an absolutely-dead dead-lock: the vm is
>>         unfrozen after the reconnect delay. Actually 8c517de24a8a1dc
>>         fixed a bug in the NBD logic that was not described in
>>         8c517de24a8a1dc and led to a
>>         forever dead-lock. The problem was that nobody woken free_sema
> 
> s/woken/woke the/
> 
>>         queue, but drain_begin can't finish while there is a request in
>>         the free_sema queue. Now we have a reconnect delay timer that
>>         works well.
>>      2. It's not a problem of NBD driver, it's a problem of ide code,
>>         that does drain_begin under global mutex
> 
> I agree that this point means that it is appropriate to address that
> in a separate patch series.  Hopefully someone tackles it before 6.2,
> but even if not, I agree that the worst that happens is that we have a
> command stalled waiting on a long timeout, rather than actual
> deadlock.
> 
>>      3. That doesn't reproduce if chose scsi instead of ide.
> 
> Seems like this sentence fragment fits better as the tail of 2, rather
> than as a separate bullet 3?  I'll reword as:
> 
> ...it's a problem of the ide code, because it does drain_begin under
> the global mutex; the problem doesn't reproduce when using scsi
> instead of ide.

Sounds good

> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---

[..]

> 
> Verdict: I found some typos/grammar fixes, but I think I can touch
> those up.  I'm now hammering on the patches in my NBD tree, and
> barring any surprises, this will be in my next pull request Monday at
> the latest.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Thanks a lot for reviewing, and for the time spent understanding and fixing my English )

-- 
Best regards,
Vladimir

