From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, hreitz@redhat.com,
	kwolf@redhat.com, rkagan@virtuozzo.com
Subject: Re: [PATCH v6 4/5] block/nbd: drop connection_co
Date: Mon, 27 Sep 2021 14:57:36 +0300	[thread overview]
Message-ID: <6bdbd25a-666e-be2e-bbe4-da9317ef9f62@virtuozzo.com> (raw)
In-Reply-To: <20210924174439.jphzecreh7usmecj@redhat.com>

24.09.2021 20:44, Eric Blake wrote:
> On Thu, Sep 02, 2021 at 01:38:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> OK, that's a big rewrite of the logic.
> 
> And a time-consuming review on my part!
> 
>>
>> Pre-patch we have an always-running coroutine - connection_co. It does
>> reply receiving and reconnecting, and this leads to a lot of difficult
>> and unobvious code around drained sections and context switching. We
>> also abuse the bs->in_flight counter, which is increased for
>> connection_co and temporarily decreased at points where we want to
>> allow a drained section to begin. One of these places is in another
>> file: in nbd_read_eof() in nbd/client.c.
>>
>> We also cancel reconnect and requests waiting for reconnect on drain
>> begin, which is not correct. This patch fixes that.
>>
>> Let's finally drop this always running coroutine and go another way:
>> do both reconnect and receiving in request coroutines.
>>
>> The detailed list of changes below (in the sequence of diff hunks).
> 
> Well, depending on the git order file in use ;)
> 
>>
>> 1. Receiving coroutines are woken directly from nbd_channel_error()
>>     when we change s->state
>>
>> 2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
>>     and in nbd_teardown_connection() all requests should already be
>>     finished (and reconnect is done from request). So
>>     nbd_co_establish_connection_cancel() is called from
>>     nbd_cancel_in_flight() (to cancel the request that is doing
>>     nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
>>     (previously we didn't need it, as reconnect delay only should cancel
>>     active requests not the reconnection itself. But now reconnection
> 
> Missing )
> 
>>     itself is done in the separate thread (we now call
>>     nbd_client_connection_enable_retry() in nbd_open()), and we need to
>>     cancel the requests that waits in nbd_co_establish_connection()
> 
> Singular/plural disagreement. I think the intended meaning is
> 'requests that wait' and not 'request that waits'.
> 
>>     now).
>>
>> 2. We do receive headers in request coroutine. But we also should
> 
> Second point 2; I'll call it 2A below, because it looks related to
> point 8.

Ohh. OK.

> 
>>     dispatch replies for another pending requests. So,
> 
> s/another/other/
> 
>>     nbd_connection_entry() is turned into nbd_receive_replies(), which
>>     does reply dispatching until it receive another request headers, and
> 
> s/until/while/, s/another/other/
> 
>>     returns when it receive the requested header.
> 
> receives
> 
>>
>> 3. All the old stuff around drained sections and context switching is
>>     dropped. In detail:
>>     - we don't need to move connection_co to a new aio context, as we
>>       don't have connection_co anymore
>>     - we don't have a fake "request" of connection_co (the extra
>>       increase of in_flight), so we don't need to care about it in
>>       drain_begin/end
>>     - we don't stop reconnection during a drained section anymore. This
>>       means that drain_begin may wait for a long time (up to
>>       reconnect_delay). But that's an improvement and more correct
>>       behavior, see below [*]
>>
>> 4. In nbd_teardown_connection() we don't have to wait for
>>     connection_co, as it is dropped. And the cleanup of s->ioc and
>>     nbd_yank is moved here from the removed connection_co.
>>
>> 5. In nbd_co_do_establish_connection() we now should handle
>>     NBD_CLIENT_CONNECTING_NOWAIT: if a new request comes while we are
>>     in NBD_CLIENT_CONNECTING_NOWAIT, it still should call
>>     nbd_co_establish_connection() (who knows, maybe connection already
>>     established by thread in background). But we shouldn't wait: if
> 
> maybe the connection was already established by another thread in the background

"another thread" sounds vague. I think better: by connection thread.

> 
>>     nbd_co_establish_connection() can't return a new channel
>>     immediately, the request should fail (we are in
>>     NBD_CLIENT_CONNECTING_NOWAIT
>>     state).
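
A side note for future readers: here is a minimal self-contained sketch
of that WAIT/NOWAIT decision. All names below are illustrative, not the
real QEMU identifiers; in the real code the channel comes from the
connection thread in nbd/client-connection.c.

/* CONNECTING_NOWAIT: still ask for a channel (the background thread may
 * have already connected), but never wait for one. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    CLIENT_CONNECTED,
    CLIENT_CONNECTING_WAIT,   /* within reconnect-delay: requests may wait */
    CLIENT_CONNECTING_NOWAIT, /* delay expired: requests must not wait */
} ClientState;

/* Stand-in for the connection thread: NULL if no channel is ready yet. */
static void *poll_background_connection(void) { return NULL; }

/* Stand-in for a connect that blocks until the thread finishes. */
static void *blocking_connection(void) { return (void *)1; }

static int do_establish_connection(ClientState state, void **ioc)
{
    bool blocking = (state == CLIENT_CONNECTING_WAIT);

    *ioc = blocking ? blocking_connection() : poll_background_connection();
    return *ioc ? 0 : -ECONNREFUSED;
}

int main(void)
{
    void *ioc;

    /* NOWAIT fails immediately unless a channel is already established. */
    printf("nowait: %d\n",
           do_establish_connection(CLIENT_CONNECTING_NOWAIT, &ioc));
    /* WAIT blocks (trivially succeeds here) until connected. */
    printf("wait: %d\n",
           do_establish_connection(CLIENT_CONNECTING_WAIT, &ioc));
    return 0;
}
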
>>
>> 6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
>>     other requests in the caller, so here we just assert that fact.
>>     Also, the reconnect delay timer is now initialized here: we can
>>     easily detect the first attempt and start the timer.
>>
>> 7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
>>     retries are fully handled by the connection thread
>>     (nbd/client-connection.c), the delay timer is initialized in
>>     nbd_reconnect_attempt(), and we don't have to bother with
>>     s->drained and friends. nbd_reconnect_attempt() is now called from
>>     nbd_co_send_request().
> 
> A lot going on there, but it's making sense so far.
> 
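
Right. One more illustration may help: the reconnect delay in the new
scheme is just a timer armed on the first reconnect attempt, and its
callback is the analogue of reconnect_delay_timer_cb(): stop the waiting
(WAIT -> NOWAIT) and cancel the in-progress connection attempt. A
self-contained thread-based sketch (illustrative names only; the real
code uses a QEMU timer in the bs aio context):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

enum { CONNECTING_WAIT, CONNECTING_NOWAIT };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int state = CONNECTING_WAIT;
static unsigned reconnect_delay = 1;   /* seconds, example value */

/* Analogy of reconnect_delay_timer_cb(): stop waiting for reconnect. */
static void *delay_timer(void *arg)
{
    (void)arg;
    sleep(reconnect_delay);
    pthread_mutex_lock(&lock);
    if (state == CONNECTING_WAIT) {
        state = CONNECTING_NOWAIT;
        /* ...here the real code also cancels the in-progress
         * nbd_co_establish_connection() and wakes the waiters... */
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    /* Armed once, on the first reconnect attempt. */
    pthread_create(&t, NULL, delay_timer, NULL);
    pthread_join(t, NULL);
    printf("state: %s\n",
           state == CONNECTING_NOWAIT ? "NOWAIT" : "WAIT");
    return 0;
}
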
>>
>> 8. nbd_connection_entry is dropped: reconnect is now handled by
>>     nbd_co_send_request(), and reply receiving is now handled by
>>     nbd_receive_replies(): everything is handled from request
>>     coroutines.
>>
>> 9. So, welcome the new nbd_receive_replies(), called from request
>>     coroutines, which receives reply headers instead of
>>     nbd_connection_entry(). Like with sending requests, only one
>>     coroutine may receive at a time. So we introduce receive_mutex,
>>     which is locked around nbd_receive_reply(). It also protects some
>>     related fields. Still, a full audit of thread-safety in the nbd
>>     driver is a separate task.
> 
> It sounds like you're more worried about auditing for which locks are
> held longer than necessary, rather than expecting to find cases where
> we aren't thread-safe because we are lacking locks?  At any rate, as
> this patch is already easier to reason about, I'm okay deferring that
> audit to later patches.

I just try not to introduce new non-thread-safe things. But I'm not sure we have good thread-safety in the nbd driver code now.

> 
>>     The new function waits for a reply with the specified handle to be
>>     received, and works rather simply:
>>
>>     Under the mutex:
>>       - if the current handle is 0, do the receive by hand. If another
>>         handle is received, switch to that request's coroutine: release
>>         the mutex and yield. Otherwise return success
>>       - if the current handle == the requested handle, we are done
>>       - otherwise, release the mutex and yield
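
To make this dispatching scheme concrete, here is a rough self-contained
pthread analogy of it (the real code uses QEMU coroutines, receive_mutex
and explicit wake; every name below is illustrative, and the fake server
deliberately answers in reverse order to force dispatching):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_REQS 4

static pthread_mutex_t receive_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  receive_cond  = PTHREAD_COND_INITIALIZER;
static uint64_t current_handle;        /* 0: no reply header pending */

/* Stand-in for reading one reply header from the socket. */
static uint64_t recv_reply_header(void)
{
    static const uint64_t order[NUM_REQS] = {4, 3, 2, 1};
    static int i;
    return order[i++];
}

/* Wait until the reply with our handle has been read off the wire. */
static void receive_replies(uint64_t handle)
{
    pthread_mutex_lock(&receive_mutex);
    for (;;) {
        if (current_handle == 0) {
            /* Nobody holds a pending header: we become the receiver. */
            current_handle = recv_reply_header();
        }
        if (current_handle == handle) {
            break;                     /* our reply has arrived */
        }
        /* The header belongs to another request: wake its owner and
         * sleep; waiting releases the mutex, like yield does. */
        pthread_cond_broadcast(&receive_cond);
        pthread_cond_wait(&receive_cond, &receive_mutex);
    }
    printf("request %d: got its reply\n", (int)handle);
    current_handle = 0;                    /* reply consumed, and (like  */
    pthread_cond_broadcast(&receive_cond); /* point 12 below) wake the   */
    pthread_mutex_unlock(&receive_mutex);  /* next receiver, if any      */
}

static void *request(void *arg)
{
    receive_replies((uint64_t)(uintptr_t)arg);
    return NULL;
}

int main(void)
{
    pthread_t t[NUM_REQS];

    for (int i = 0; i < NUM_REQS; i++) {
        pthread_create(&t[i], NULL, request, (void *)(uintptr_t)(i + 1));
    }
    for (int i = 0; i < NUM_REQS; i++) {
        pthread_join(t[i], NULL);
    }
    return 0;
}
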
>>
>> 10. In nbd_co_send_request() we now do nbd_reconnect_attempt() if
>>      needed. Also, while waiting in the free_sema queue we now wait for
>>      one of two conditions:
>>      - connectED, in_flight < MAX_NBD_REQUESTS (so we can start a new
>>        one)
>>      - connectING, in_flight == 0, so we can call
>>        nbd_reconnect_attempt()
>>      And this logic is protected by s->send_mutex.
>>
>>      Also, on failure we no longer have to care about the removed
>>      s->connection_co.
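
The two wait conditions, as a condition-variable analogy (again only a
sketch with illustrative names; the real code queues coroutines in
free_sema under s->send_mutex):

#include <pthread.h>

#define MAX_NBD_REQUESTS 16

enum { CONNECTED, CONNECTING, QUIT };

static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  free_sema  = PTHREAD_COND_INITIALIZER;
static int state = CONNECTING;
static int in_flight;

/* Stub: a successful reconnect attempt moves us to CONNECTED. */
static void reconnect_attempt(void) { state = CONNECTED; }

static int send_request(void)
{
    pthread_mutex_lock(&send_mutex);
    for (;;) {
        if (state == CONNECTING && in_flight == 0) {
            /* All requests have drained: safe to (re)connect from here. */
            reconnect_attempt();
        }
        if (state == CONNECTED && in_flight < MAX_NBD_REQUESTS) {
            break;              /* a free slot on a live connection */
        }
        if (state == QUIT) {
            pthread_mutex_unlock(&send_mutex);
            return -1;
        }
        pthread_cond_wait(&free_sema, &send_mutex);
    }
    in_flight++;
    pthread_mutex_unlock(&send_mutex);
    /* ... actually write the request to the socket ... */
    return 0;
}

int main(void)
{
    return send_request();
}
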
>>
>> 11. nbd_co_do_receive_one_chunk(): now, instead of yielding and
>>      waiting for s->connection_co, we just call the new
>>      nbd_receive_replies().
>>
>> 12. nbd_co_receive_one_chunk(): the place where s->reply.handle
>>      becomes 0, which means that handling of the whole reply is
>>      finished. Here we need to wake one of the coroutines sleeping in
>>      nbd_receive_replies().
>>      If now one sleeps - do nothing. That's another behavior change: we
> 
> s/now one sleeps/none are sleeping/
> 
>>      don't have an endless recv() during idle time. It may be
>>      considered a drawback; if so, it may be fixed later.
> 
> The case where no coroutine is waiting to receive a reply should only
> happen due to a malicious server (a compliant server will only send
> replies matching an existing pending request).

With the new code, when we don't have any in-flight requests there are no waiting coroutines either.
And nobody will notice if the connection silently gets lost.

With the old code, we always had a recv() call pending in connection_co, so we would catch disconnects (at least some kinds of disconnects, or when keep-alive is set up properly) sooner and start the reconnection procedure.
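
For reference, by "keep-alive set up properly" I mean something like this
on the socket (only a sketch: the timeout values are arbitrary examples,
and TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void enable_keepalive(int fd)
{
    int on = 1, idle = 30, interval = 10, count = 3;

    /* Probe an idle connection, so a dead peer is noticed by recv(). */
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}
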

> 
>>
>> 13. nbd_reply_chunk_iter_receive(): don't care about the removed
>>      connection_co, just ping the in_flight waiters.
>>
>> 14. Don't create connection_co, enable retry in the connection thread
>>      (we don't have our own reconnect loop anymore).
>>
>> 15. We need now nbd_co_establish_connection_cancel() call in
>>      nbd_cancel_in_flight(), to cancel the request that doing connection
>>      attempt.
> 
> s/need now/now need to add a/
> s/request that doing/request that is doing a/
> 
>>
>> [*] OK, now we don't cancel reconnect on drain begin. That's correct:
>>      the reconnect feature leads to the possibility of long-running
>>      requests (up to the reconnect delay). Still, drain begin is not a
>>      reason to kill long-running requests; we should wait for them.
>>
>>      This also means that we can again reproduce a dead-lock, described
>>      in 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace.
>>      Why we are OK with it:
>>      1. Now this is not an absolutely-dead dead-lock: the vm is
>>         unfrozen after the reconnect delay. Actually 8c517de24a8a1dc
>>         fixed a bug in the NBD logic that was not described in
>>         8c517de24a8a1dc and led to a
>>         forever dead-lock. The problem was that nobody woken free_sema
> 
> s/woken/woke the/
> 
>>         queue, but drain_begin can't finish while there is a request in
>>         the free_sema queue. Now we have a reconnect delay timer that
>>         works well.
>>      2. It's not a problem of NBD driver, it's a problem of ide code,
>>         that does drain_begin under global mutex
> 
> I agree that this point means that it is appropriate to address that
> in a separate patch series.  Hopefully someone tackles it before 6.2,
> but even if not, I agree that the worst that happens is that we have a
> command stalled waiting on a long timeout, rather than actual
> deadlock.
> 
>>      3. That doesn't reproduce if chose scsi instead of ide.
> 
> Seems like this sentence fragment fits better as the tail of 2, rather
> than as a separate bullet 3?  I'll reword as:
> 
> ...it's a problem of the ide code, because it does drain_begin under
> the global mutex; the problem doesn't reproduce when using scsi
> instead of ide.

Sounds good

> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---

[..]

> 
> Verdict: I found some typos/grammar fixes, but I think I can touch
> those up.  I'm now hammering on the patches in my NBD tree, and
> barring any surprises, this will be in my next pull request Monday at
> the latest.
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

Thanks a lot for reviewing, and for the time spent understanding and fixing my English )

-- 
Best regards,
Vladimir

