* [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe
@ 2022-04-12 19:41 Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard Paolo Bonzini
` (7 more replies)
0 siblings, 8 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:41 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
The main point of this series is patch 6, which removes the dubious and
probably wrong use of atomics in block/nbd.c. This in turn is enabled
mostly by the cleanups in patches 3-5. Together, they introduce a
QemuMutex that synchronizes the NBD client coroutines, the reconnect_delay
timer and nbd_cancel_in_flight() as well.
The fixes happen to remove an incorrect use of qemu_co_queue_restart_all
and qemu_co_enter_next on the s->free_sema CoQueue, which was not guarded
by s->send_mutex.
The rest is bugfixes, simplifying the code a bit, and extra documentation.
Paolo Bonzini (8):
nbd: actually implement reply_possible safeguard
nbd: mark more coroutine_fns
nbd: remove peppering of nbd_client_connected
nbd: keep send_mutex/free_sema handling outside
nbd_co_do_establish_connection
nbd: use a QemuMutex to synchronize reconnection with coroutines
nbd: move s->state under requests_lock
nbd: take receive_mutex when reading requests[].receiving
nbd: document what is protected by the CoMutexes
block/coroutines.h | 4 +-
block/nbd.c | 295 +++++++++++++++++++++++----------------------
2 files changed, 154 insertions(+), 145 deletions(-)
--
2.35.1
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
@ 2022-04-12 19:41 ` Paolo Bonzini
2022-04-12 22:05 ` Eric Blake
2022-04-12 19:41 ` [PATCH for-7.1 2/8] nbd: mark more coroutine_fns Paolo Bonzini
` (6 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:41 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
The .reply_possible field of s->requests is never set to false. This is
not a big problem as it is only a safeguard to detect protocol errors,
but fix it anyway.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 567872ac53..6a5e410e5f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -454,15 +454,16 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
nbd_channel_error(s, -EINVAL);
return -EINVAL;
}
- if (s->reply.handle == handle) {
- /* We are done */
- return 0;
- }
ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].reply_possible) {
nbd_channel_error(s, -EINVAL);
return -EINVAL;
}
+ s->requests[ind2].reply_possible = nbd_reply_is_structured(&s->reply);
+ if (s->reply.handle == handle) {
+ /* We are done */
+ return 0;
+ }
nbd_recv_coroutine_wake_one(&s->requests[ind2]);
}
}
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 2/8] nbd: mark more coroutine_fns
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard Paolo Bonzini
@ 2022-04-12 19:41 ` Paolo Bonzini
2022-04-13 12:25 ` Eric Blake
2022-04-12 19:41 ` [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected Paolo Bonzini
` (5 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:41 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
Several coroutine functions in block/nbd.c are not marked as such. This
patch adds a few more markers; it is not exhaustive, but it focuses
especially on:
- places that wake other coroutines, because aio_co_wake() has very
different semantics inside a coroutine (queuing after yield vs. entering
immediately);
- functions with _co_ in their names, to avoid confusion
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 64 ++++++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 32 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 6a5e410e5f..81b319318e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -133,7 +133,7 @@ static bool nbd_client_connected(BDRVNBDState *s)
return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
}
-static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
+static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
{
if (req->receiving) {
req->receiving = false;
@@ -144,7 +144,7 @@ static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
return false;
}
-static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
+static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
{
int i;
@@ -155,7 +155,7 @@ static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
}
}
-static void nbd_channel_error(BDRVNBDState *s, int ret)
+static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
{
if (nbd_client_connected(s)) {
qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
@@ -468,9 +468,9 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
}
}
-static int nbd_co_send_request(BlockDriverState *bs,
- NBDRequest *request,
- QEMUIOVector *qiov)
+static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
+ NBDRequest *request,
+ QEMUIOVector *qiov)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
int rc, i = -1;
@@ -724,9 +724,9 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,
return 0;
}
-static int nbd_co_receive_offset_data_payload(BDRVNBDState *s,
- uint64_t orig_offset,
- QEMUIOVector *qiov, Error **errp)
+static int coroutine_fn nbd_co_receive_offset_data_payload(BDRVNBDState *s,
+ uint64_t orig_offset,
+ QEMUIOVector *qiov, Error **errp)
{
QEMUIOVector sub_qiov;
uint64_t offset;
@@ -1042,8 +1042,8 @@ break_loop:
return false;
}
-static int nbd_co_receive_return_code(BDRVNBDState *s, uint64_t handle,
- int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_return_code(BDRVNBDState *s, uint64_t handle,
+ int *request_ret, Error **errp)
{
NBDReplyChunkIter iter;
@@ -1056,9 +1056,9 @@ static int nbd_co_receive_return_code(BDRVNBDState *s, uint64_t handle,
return iter.ret;
}
-static int nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,
- uint64_t offset, QEMUIOVector *qiov,
- int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,
+ uint64_t offset, QEMUIOVector *qiov,
+ int *request_ret, Error **errp)
{
NBDReplyChunkIter iter;
NBDReply reply;
@@ -1108,10 +1108,10 @@ static int nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,
return iter.ret;
}
-static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
- uint64_t handle, uint64_t length,
- NBDExtent *extent,
- int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
+ uint64_t handle, uint64_t length,
+ NBDExtent *extent,
+ int *request_ret, Error **errp)
{
NBDReplyChunkIter iter;
NBDReply reply;
@@ -1168,8 +1168,8 @@ static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
return iter.ret;
}
-static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
- QEMUIOVector *write_qiov)
+static int coroutine_fn nbd_co_request(BlockDriverState *bs, NBDRequest *request,
+ QEMUIOVector *write_qiov)
{
int ret, request_ret;
Error *local_err = NULL;
@@ -1205,9 +1205,9 @@ static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
return ret ? ret : request_ret;
}
-static int nbd_client_co_preadv(BlockDriverState *bs, int64_t offset,
- int64_t bytes, QEMUIOVector *qiov,
- BdrvRequestFlags flags)
+static int coroutine_fn nbd_client_co_preadv(BlockDriverState *bs, int64_t offset,
+ int64_t bytes, QEMUIOVector *qiov,
+ BdrvRequestFlags flags)
{
int ret, request_ret;
Error *local_err = NULL;
@@ -1264,9 +1264,9 @@ static int nbd_client_co_preadv(BlockDriverState *bs, int64_t offset,
return ret ? ret : request_ret;
}
-static int nbd_client_co_pwritev(BlockDriverState *bs, int64_t offset,
- int64_t bytes, QEMUIOVector *qiov,
- BdrvRequestFlags flags)
+static int coroutine_fn nbd_client_co_pwritev(BlockDriverState *bs, int64_t offset,
+ int64_t bytes, QEMUIOVector *qiov,
+ BdrvRequestFlags flags)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
NBDRequest request = {
@@ -1289,8 +1289,8 @@ static int nbd_client_co_pwritev(BlockDriverState *bs, int64_t offset,
return nbd_co_request(bs, &request, qiov);
}
-static int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
- int64_t bytes, BdrvRequestFlags flags)
+static int coroutine_fn nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+ int64_t bytes, BdrvRequestFlags flags)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
NBDRequest request = {
@@ -1324,7 +1324,7 @@ static int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
return nbd_co_request(bs, &request, NULL);
}
-static int nbd_client_co_flush(BlockDriverState *bs)
+static int coroutine_fn nbd_client_co_flush(BlockDriverState *bs)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
NBDRequest request = { .type = NBD_CMD_FLUSH };
@@ -1339,8 +1339,8 @@ static int nbd_client_co_flush(BlockDriverState *bs)
return nbd_co_request(bs, &request, NULL);
}
-static int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
- int64_t bytes)
+static int coroutine_fn nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
+ int64_t bytes)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
NBDRequest request = {
@@ -1916,7 +1916,7 @@ fail:
return ret;
}
-static int nbd_co_flush(BlockDriverState *bs)
+static int coroutine_fn nbd_co_flush(BlockDriverState *bs)
{
return nbd_client_co_flush(bs);
}
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 2/8] nbd: mark more coroutine_fns Paolo Bonzini
@ 2022-04-12 19:41 ` Paolo Bonzini
2022-04-13 12:48 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection Paolo Bonzini
` (4 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:41 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
It is unnecessary to check nbd_client_connected() because every time
s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down
and all coroutines are resumed.
The only case where it was actually needed is when the NBD
server disconnects and there is no reconnect-delay. In that
case, nbd_receive_replies() does not set s->reply.handle and
nbd_co_do_receive_one_chunk() cannot continue. For that one case,
check the return value of nbd_receive_replies().
As to the others:
* nbd_receive_replies() can put the current coroutine to sleep if another
reply is ongoing; then it will be woken by nbd_channel_error() called
by the ongoing reply. Or it can try itself to read a reply header and
fail, thus calling nbd_channel_error() itself.
* nbd_co_send_request() will write the body of the request and fail
* nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk()
and then nbd_co_do_receive_one_chunk(), which will handle the failure as
above; or it will just detect a previous call to nbd_iter_channel_error()
via iter->ret < 0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 81b319318e..02db52a230 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -410,10 +410,6 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
return 0;
}
- if (!nbd_client_connected(s)) {
- return -EIO;
- }
-
if (s->reply.handle != 0) {
/*
* Some other request is being handled now. It should already be
@@ -515,7 +511,7 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
if (qiov) {
qio_channel_set_cork(s->ioc, true);
rc = nbd_send_request(s->ioc, request);
- if (nbd_client_connected(s) && rc >= 0) {
+ if (rc >= 0) {
if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
NULL) < 0) {
rc = -EIO;
@@ -832,8 +828,8 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
}
*request_ret = 0;
- nbd_receive_replies(s, handle);
- if (!nbd_client_connected(s)) {
+ ret = nbd_receive_replies(s, handle);
+ if (ret < 0) {
error_setg(errp, "Connection closed");
return -EIO;
}
@@ -985,11 +981,6 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
NBDReply local_reply;
NBDStructuredReplyChunk *chunk;
Error *local_err = NULL;
- if (!nbd_client_connected(s)) {
- error_setg(&local_err, "Connection closed");
- nbd_iter_channel_error(iter, -EIO, &local_err);
- goto break_loop;
- }
if (iter->done) {
/* Previous iteration was last. */
@@ -1010,7 +1001,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
}
/* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
- if (nbd_reply_is_simple(reply) || !nbd_client_connected(s)) {
+ if (nbd_reply_is_simple(reply) || iter->ret < 0) {
goto break_loop;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
` (2 preceding siblings ...)
2022-04-12 19:41 ` [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected Paolo Bonzini
@ 2022-04-12 19:42 ` Paolo Bonzini
2022-04-13 15:42 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines Paolo Bonzini
` (3 subsequent siblings)
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:42 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
Elevate s->in_flight early so that other incoming requests will wait
on the CoQueue in nbd_co_send_request; restart them after getting back
from nbd_reconnect_attempt. This could be after the reconnect timer or
nbd_cancel_in_flight have cancelled the attempt, so there is no
need anymore to cancel the requests there.
nbd_co_send_request now handles both stopping and restarting pending
requests after a successful connection, and there is no need to
hold send_mutex in nbd_co_do_establish_connection. The current setup
is confusing because nbd_co_do_establish_connection is called both with
send_mutex taken and without it. Before the patch it uses free_sema which
(at least in theory...) is protected by send_mutex, after the patch it
does not anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/coroutines.h | 4 +--
block/nbd.c | 66 ++++++++++++++++++++++------------------------
2 files changed, 33 insertions(+), 37 deletions(-)
diff --git a/block/coroutines.h b/block/coroutines.h
index b293e943c8..8f6e438ef3 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -59,7 +59,7 @@ int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
QEMUIOVector *qiov, int64_t pos);
int coroutine_fn
-nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp);
+nbd_co_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp);
int coroutine_fn
@@ -109,7 +109,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState **file,
int *depth);
int generated_co_wrapper
-nbd_do_establish_connection(BlockDriverState *bs, Error **errp);
+nbd_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp);
int generated_co_wrapper
blk_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
diff --git a/block/nbd.c b/block/nbd.c
index 02db52a230..0ff41cb914 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -188,9 +188,6 @@ static void reconnect_delay_timer_cb(void *opaque)
if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
s->state = NBD_CLIENT_CONNECTING_NOWAIT;
nbd_co_establish_connection_cancel(s->conn);
- while (qemu_co_enter_next(&s->free_sema, NULL)) {
- /* Resume all queued requests */
- }
}
reconnect_delay_timer_del(s);
@@ -311,11 +308,10 @@ static int nbd_handle_updated_info(BlockDriverState *bs, Error **errp)
}
int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
- Error **errp)
+ bool blocking, Error **errp)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
int ret;
- bool blocking = nbd_client_connecting_wait(s);
IO_CODE();
assert(!s->ioc);
@@ -351,7 +347,6 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
/* successfully connected */
s->state = NBD_CLIENT_CONNECTED;
- qemu_co_queue_restart_all(&s->free_sema);
return 0;
}
@@ -359,25 +354,25 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
/* called under s->send_mutex */
static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
{
- assert(nbd_client_connecting(s));
- assert(s->in_flight == 0);
-
- if (nbd_client_connecting_wait(s) && s->reconnect_delay &&
- !s->reconnect_delay_timer)
- {
- /*
- * It's first reconnect attempt after switching to
- * NBD_CLIENT_CONNECTING_WAIT
- */
- reconnect_delay_timer_init(s,
- qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
- s->reconnect_delay * NANOSECONDS_PER_SECOND);
- }
+ bool blocking = nbd_client_connecting_wait(s);
/*
* Now we are sure that nobody is accessing the channel, and no one will
* try until we set the state to CONNECTED.
*/
+ assert(nbd_client_connecting(s));
+ assert(s->in_flight == 1);
+
+ if (blocking && !s->reconnect_delay_timer) {
+ /*
+ * It's first reconnect attempt after switching to
+ * NBD_CLIENT_CONNECTING_WAIT
+ */
+ g_assert(s->reconnect_delay);
+ reconnect_delay_timer_init(s,
+ qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+ s->reconnect_delay * NANOSECONDS_PER_SECOND);
+ }
/* Finalize previous connection if any */
if (s->ioc) {
@@ -388,7 +383,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
s->ioc = NULL;
}
- nbd_co_do_establish_connection(s->bs, NULL);
+ qemu_co_mutex_unlock(&s->send_mutex);
+ nbd_co_do_establish_connection(s->bs, blocking, NULL);
+ qemu_co_mutex_lock(&s->send_mutex);
/*
* The reconnect attempt is done (maybe successfully, maybe not), so
@@ -474,21 +471,21 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
qemu_co_mutex_lock(&s->send_mutex);
while (s->in_flight == MAX_NBD_REQUESTS ||
- (!nbd_client_connected(s) && s->in_flight > 0))
- {
+ (!nbd_client_connected(s) && s->in_flight > 0)) {
qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
}
- if (nbd_client_connecting(s)) {
- nbd_reconnect_attempt(s);
- }
-
- if (!nbd_client_connected(s)) {
- rc = -EIO;
- goto err;
- }
-
s->in_flight++;
+ if (!nbd_client_connected(s)) {
+ if (nbd_client_connecting(s)) {
+ nbd_reconnect_attempt(s);
+ qemu_co_queue_restart_all(&s->free_sema);
+ }
+ if (!nbd_client_connected(s)) {
+ rc = -EIO;
+ goto err;
+ }
+ }
for (i = 0; i < MAX_NBD_REQUESTS; i++) {
if (s->requests[i].coroutine == NULL) {
@@ -529,8 +526,8 @@ err:
nbd_channel_error(s, rc);
if (i != -1) {
s->requests[i].coroutine = NULL;
- s->in_flight--;
}
+ s->in_flight--;
qemu_co_queue_next(&s->free_sema);
}
qemu_co_mutex_unlock(&s->send_mutex);
@@ -1885,7 +1882,7 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
}
s->state = NBD_CLIENT_CONNECTING_WAIT;
- ret = nbd_do_establish_connection(bs, errp);
+ ret = nbd_do_establish_connection(bs, true, errp);
if (ret < 0) {
goto fail;
}
@@ -2051,7 +2048,6 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
s->state = NBD_CLIENT_CONNECTING_NOWAIT;
- qemu_co_queue_restart_all(&s->free_sema);
}
nbd_co_establish_connection_cancel(s->conn);
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
` (3 preceding siblings ...)
2022-04-12 19:42 ` [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection Paolo Bonzini
@ 2022-04-12 19:42 ` Paolo Bonzini
2022-04-13 7:35 ` Paolo Bonzini
2022-04-13 15:55 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 6/8] nbd: move s->state under requests_lock Paolo Bonzini
` (2 subsequent siblings)
7 siblings, 2 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:42 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
The condition for waiting on the s->free_sema queue depends on
both s->in_flight and s->state. The latter is currently using
atomics, but this is quite dubious and probably wrong.
Because s->state is written in the main thread too, for example by
the reconnect timer callback, it cannot be protected by a CoMutex.
Introduce a separate lock that can be used by nbd_co_send_request();
later on this lock will also be used for s->state. There will not
be any contention on the lock unless there is a reconnect, so this
is not performance sensitive.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 46 +++++++++++++++++++++++++++-------------------
1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 0ff41cb914..c908ea6ae3 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -72,17 +72,22 @@ typedef struct BDRVNBDState {
QIOChannel *ioc; /* The current I/O channel */
NBDExportInfo info;
- CoMutex send_mutex;
+ /*
+ * Protects free_sema, in_flight, requests[].coroutine,
+ * reconnect_delay_timer.
+ */
+ QemuMutex requests_lock;
CoQueue free_sema;
-
- CoMutex receive_mutex;
int in_flight;
+ NBDClientRequest requests[MAX_NBD_REQUESTS];
+ QEMUTimer *reconnect_delay_timer;
+
+ CoMutex send_mutex;
+ CoMutex receive_mutex;
NBDClientState state;
- QEMUTimer *reconnect_delay_timer;
QEMUTimer *open_timer;
- NBDClientRequest requests[MAX_NBD_REQUESTS];
NBDReply reply;
BlockDriverState *bs;
@@ -351,7 +356,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
return 0;
}
-/* called under s->send_mutex */
+/* Called with s->requests_lock taken. */
static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
{
bool blocking = nbd_client_connecting_wait(s);
@@ -383,9 +388,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
s->ioc = NULL;
}
- qemu_co_mutex_unlock(&s->send_mutex);
+ qemu_mutex_unlock(&s->requests_lock);
nbd_co_do_establish_connection(s->bs, blocking, NULL);
- qemu_co_mutex_lock(&s->send_mutex);
+ qemu_mutex_lock(&s->requests_lock);
/*
* The reconnect attempt is done (maybe successfully, maybe not), so
@@ -468,11 +473,10 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
int rc, i = -1;
- qemu_co_mutex_lock(&s->send_mutex);
-
+ qemu_mutex_lock(&s->requests_lock);
while (s->in_flight == MAX_NBD_REQUESTS ||
(!nbd_client_connected(s) && s->in_flight > 0)) {
- qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
+ qemu_co_queue_wait(&s->free_sema, &s->requests_lock);
}
s->in_flight++;
@@ -493,14 +497,14 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
}
}
- g_assert(qemu_in_coroutine());
assert(i < MAX_NBD_REQUESTS);
-
s->requests[i].coroutine = qemu_coroutine_self();
s->requests[i].offset = request->from;
s->requests[i].receiving = false;
s->requests[i].reply_possible = true;
+ qemu_mutex_unlock(&s->requests_lock);
+ qemu_co_mutex_lock(&s->send_mutex);
request->handle = INDEX_TO_HANDLE(s, i);
assert(s->ioc);
@@ -520,17 +524,19 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
} else {
rc = nbd_send_request(s->ioc, request);
}
+ qemu_co_mutex_unlock(&s->send_mutex);
-err:
if (rc < 0) {
+ qemu_mutex_lock(&s->requests_lock);
+err:
nbd_channel_error(s, rc);
if (i != -1) {
s->requests[i].coroutine = NULL;
}
s->in_flight--;
qemu_co_queue_next(&s->free_sema);
+ qemu_mutex_unlock(&s->requests_lock);
}
- qemu_co_mutex_unlock(&s->send_mutex);
return rc;
}
@@ -1020,12 +1026,11 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
return true;
break_loop:
+ qemu_mutex_lock(&s->requests_lock);
s->requests[HANDLE_TO_INDEX(s, handle)].coroutine = NULL;
-
- qemu_co_mutex_lock(&s->send_mutex);
s->in_flight--;
qemu_co_queue_next(&s->free_sema);
- qemu_co_mutex_unlock(&s->send_mutex);
+ qemu_mutex_unlock(&s->requests_lock);
return false;
}
@@ -1858,8 +1863,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, int flags,
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
s->bs = bs;
- qemu_co_mutex_init(&s->send_mutex);
+ qemu_mutex_init(&s->requests_lock);
qemu_co_queue_init(&s->free_sema);
+ qemu_co_mutex_init(&s->send_mutex);
qemu_co_mutex_init(&s->receive_mutex);
if (!yank_register_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name), errp)) {
@@ -2046,9 +2052,11 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
reconnect_delay_timer_del(s);
+ qemu_mutex_lock(&s->requests_lock);
if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
s->state = NBD_CLIENT_CONNECTING_NOWAIT;
}
+ qemu_mutex_unlock(&s->requests_lock);
nbd_co_establish_connection_cancel(s->conn);
}
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 6/8] nbd: move s->state under requests_lock
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
` (4 preceding siblings ...)
2022-04-12 19:42 ` [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines Paolo Bonzini
@ 2022-04-12 19:42 ` Paolo Bonzini
2022-04-13 16:23 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving Paolo Bonzini
2022-04-12 19:42 ` [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes Paolo Bonzini
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:42 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
Remove the confusing, and most likely wrong, atomics. The only function
that used to be somewhat in a hot path was nbd_client_connected(),
but it is not anymore after the previous patches.
The function nbd_client_connecting_wait() was used mostly to check if
a request had to be reissued (outside requests_lock), but also
under requests_lock in nbd_client_connecting_wait(). The two uses have to
be separated; for the former we rename it to nbd_client_will_reconnect()
and make it take s->requests_lock; for the latter the access can simply
be inlined. The new name is clearer, and ensures that a missing
conversion is caught by the compiler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 88 +++++++++++++++++++++++++++++------------------------
1 file changed, 48 insertions(+), 40 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index c908ea6ae3..6d80bd59e2 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,7 +35,6 @@
#include "qemu/option.h"
#include "qemu/cutils.h"
#include "qemu/main-loop.h"
-#include "qemu/atomic.h"
#include "qapi/qapi-visit-sockets.h"
#include "qapi/qmp/qstring.h"
@@ -73,10 +72,11 @@ typedef struct BDRVNBDState {
NBDExportInfo info;
/*
- * Protects free_sema, in_flight, requests[].coroutine,
+ * Protects state, free_sema, in_flight, requests[].coroutine,
* reconnect_delay_timer.
*/
QemuMutex requests_lock;
+ NBDClientState state;
CoQueue free_sema;
int in_flight;
NBDClientRequest requests[MAX_NBD_REQUESTS];
@@ -84,7 +84,6 @@ typedef struct BDRVNBDState {
CoMutex send_mutex;
CoMutex receive_mutex;
- NBDClientState state;
QEMUTimer *open_timer;
@@ -133,11 +132,6 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
s->x_dirty_bitmap = NULL;
}
-static bool nbd_client_connected(BDRVNBDState *s)
-{
- return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
-}
-
static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
{
if (req->receiving) {
@@ -160,14 +154,15 @@ static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
}
}
-static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
+/* Called with s->requests_lock held. */
+static void coroutine_fn nbd_channel_error_locked(BDRVNBDState *s, int ret)
{
- if (nbd_client_connected(s)) {
+ if (s->state == NBD_CLIENT_CONNECTED) {
qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
}
if (ret == -EIO) {
- if (nbd_client_connected(s)) {
+ if (s->state == NBD_CLIENT_CONNECTED) {
s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
NBD_CLIENT_CONNECTING_NOWAIT;
}
@@ -178,6 +173,12 @@ static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
nbd_recv_coroutines_wake(s, true);
}
+static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
+{
+ QEMU_LOCK_GUARD(&s->requests_lock);
+ nbd_channel_error_locked(s, ret);
+}
+
static void reconnect_delay_timer_del(BDRVNBDState *s)
{
if (s->reconnect_delay_timer) {
@@ -190,20 +191,18 @@ static void reconnect_delay_timer_cb(void *opaque)
{
BDRVNBDState *s = opaque;
- if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
- s->state = NBD_CLIENT_CONNECTING_NOWAIT;
- nbd_co_establish_connection_cancel(s->conn);
- }
-
reconnect_delay_timer_del(s);
+ WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
+ if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+ return;
+ }
+ s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+ }
+ nbd_co_establish_connection_cancel(s->conn);
}
static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
{
- if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
- return;
- }
-
assert(!s->reconnect_delay_timer);
s->reconnect_delay_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
QEMU_CLOCK_REALTIME,
@@ -226,7 +225,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
s->ioc = NULL;
}
- s->state = NBD_CLIENT_QUIT;
+ WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
+ s->state = NBD_CLIENT_QUIT;
+ }
}
static void open_timer_del(BDRVNBDState *s)
@@ -255,16 +256,13 @@ static void open_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
timer_mod(s->open_timer, expire_time_ns);
}
-static bool nbd_client_connecting(BDRVNBDState *s)
+static bool nbd_client_will_reconnect(BDRVNBDState *s)
{
- NBDClientState state = qatomic_load_acquire(&s->state);
- return state == NBD_CLIENT_CONNECTING_WAIT ||
- state == NBD_CLIENT_CONNECTING_NOWAIT;
-}
-
-static bool nbd_client_connecting_wait(BDRVNBDState *s)
-{
- return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
+ /*
+ * Called only after a socket error, so this is not performance sensitive.
+ */
+ QEMU_LOCK_GUARD(&s->requests_lock);
+ return s->state == NBD_CLIENT_CONNECTING_WAIT;
}
/*
@@ -351,15 +349,24 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
qio_channel_attach_aio_context(s->ioc, bdrv_get_aio_context(bs));
/* successfully connected */
- s->state = NBD_CLIENT_CONNECTED;
+ WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
+ s->state = NBD_CLIENT_CONNECTED;
+ }
return 0;
}
+/* Called with s->requests_lock held. */
+static bool nbd_client_connecting(BDRVNBDState *s)
+{
+ return s->state == NBD_CLIENT_CONNECTING_WAIT ||
+ s->state == NBD_CLIENT_CONNECTING_NOWAIT;
+}
+
/* Called with s->requests_lock taken. */
static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
{
- bool blocking = nbd_client_connecting_wait(s);
+ bool blocking = s->state == NBD_CLIENT_CONNECTING_WAIT;
/*
* Now we are sure that nobody is accessing the channel, and no one will
@@ -475,17 +482,17 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
qemu_mutex_lock(&s->requests_lock);
while (s->in_flight == MAX_NBD_REQUESTS ||
- (!nbd_client_connected(s) && s->in_flight > 0)) {
+ (s->state != NBD_CLIENT_CONNECTED && s->in_flight > 0)) {
qemu_co_queue_wait(&s->free_sema, &s->requests_lock);
}
s->in_flight++;
- if (!nbd_client_connected(s)) {
+ if (s->state != NBD_CLIENT_CONNECTED) {
if (nbd_client_connecting(s)) {
nbd_reconnect_attempt(s);
qemu_co_queue_restart_all(&s->free_sema);
}
- if (!nbd_client_connected(s)) {
+ if (s->state != NBD_CLIENT_CONNECTED) {
rc = -EIO;
goto err;
}
@@ -529,7 +536,7 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
if (rc < 0) {
qemu_mutex_lock(&s->requests_lock);
err:
- nbd_channel_error(s, rc);
+ nbd_channel_error_locked(s, rc);
if (i != -1) {
s->requests[i].coroutine = NULL;
}
@@ -1193,7 +1200,7 @@ static int coroutine_fn nbd_co_request(BlockDriverState *bs, NBDRequest *request
error_free(local_err);
local_err = NULL;
}
- } while (ret < 0 && nbd_client_connecting_wait(s));
+ } while (ret < 0 && nbd_client_will_reconnect(s));
return ret ? ret : request_ret;
}
@@ -1252,7 +1259,7 @@ static int coroutine_fn nbd_client_co_preadv(BlockDriverState *bs, int64_t offse
error_free(local_err);
local_err = NULL;
}
- } while (ret < 0 && nbd_client_connecting_wait(s));
+ } while (ret < 0 && nbd_client_will_reconnect(s));
return ret ? ret : request_ret;
}
@@ -1410,7 +1417,7 @@ static int coroutine_fn nbd_client_co_block_status(
error_free(local_err);
local_err = NULL;
}
- } while (ret < 0 && nbd_client_connecting_wait(s));
+ } while (ret < 0 && nbd_client_will_reconnect(s));
if (ret < 0 || request_ret < 0) {
return ret ? ret : request_ret;
@@ -1442,8 +1449,9 @@ static void nbd_yank(void *opaque)
BlockDriverState *bs = opaque;
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
- qatomic_store_release(&s->state, NBD_CLIENT_QUIT);
+ QEMU_LOCK_GUARD(&s->requests_lock);
qio_channel_shutdown(QIO_CHANNEL(s->ioc), QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
+ s->state = NBD_CLIENT_QUIT;
}
static void nbd_client_close(BlockDriverState *bs)
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
` (5 preceding siblings ...)
2022-04-12 19:42 ` [PATCH for-7.1 6/8] nbd: move s->state under requests_lock Paolo Bonzini
@ 2022-04-12 19:42 ` Paolo Bonzini
2022-04-13 21:20 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes Paolo Bonzini
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:42 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
requests[].receiving is set by nbd_receive_replies() under the receive_mutex;
Read it under the same mutex as well. Waking up receivers on errors happens
after each reply finishes processing, in nbd_co_receive_one_chunk().
If there is no currently-active reply, there are two cases:
* either there is no active request at all, in which case no
element of request[] can have .receiving = true
* or nbd_receive_replies() must be running and waiting for receive_mutex;
in that case it will get back to nbd_co_receive_one_chunk() because
the socket has been shutdown, and all waiting coroutines will wake up
in turn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 6d80bd59e2..8954243f50 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -132,6 +132,7 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
s->x_dirty_bitmap = NULL;
}
+/* Called with s->receive_mutex taken. */
static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
{
if (req->receiving) {
@@ -143,12 +144,13 @@ static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
return false;
}
-static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
+static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s)
{
int i;
+ QEMU_LOCK_GUARD(&s->receive_mutex);
for (i = 0; i < MAX_NBD_REQUESTS; i++) {
- if (nbd_recv_coroutine_wake_one(&s->requests[i]) && !all) {
+ if (nbd_recv_coroutine_wake_one(&s->requests[i])) {
return;
}
}
@@ -169,8 +171,6 @@ static void coroutine_fn nbd_channel_error_locked(BDRVNBDState *s, int ret)
} else {
s->state = NBD_CLIENT_QUIT;
}
-
- nbd_recv_coroutines_wake(s, true);
}
static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
@@ -433,11 +433,10 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
qemu_coroutine_yield();
/*
- * We may be woken for 3 reasons:
+ * We may be woken for 2 reasons:
* 1. From this function, executing in parallel coroutine, when our
* handle is received.
- * 2. From nbd_channel_error(), when connection is lost.
- * 3. From nbd_co_receive_one_chunk(), when previous request is
+ * 2. From nbd_co_receive_one_chunk(), when previous request is
* finished and s->reply.handle set to 0.
* Anyway, it's OK to lock the mutex and go to the next iteration.
*/
@@ -931,7 +930,7 @@ static coroutine_fn int nbd_co_receive_one_chunk(
}
s->reply.handle = 0;
- nbd_recv_coroutines_wake(s, false);
+ nbd_recv_coroutines_wake(s);
return ret;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
` (6 preceding siblings ...)
2022-04-12 19:42 ` [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving Paolo Bonzini
@ 2022-04-12 19:42 ` Paolo Bonzini
2022-04-13 21:22 ` Eric Blake
7 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-12 19:42 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/block/nbd.c b/block/nbd.c
index 8954243f50..8297da7e89 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -82,12 +82,18 @@ typedef struct BDRVNBDState {
NBDClientRequest requests[MAX_NBD_REQUESTS];
QEMUTimer *reconnect_delay_timer;
+ /* Protects sending data on the socket. */
CoMutex send_mutex;
+
+ /*
+ * Protects receiving reply headers from the socket, as well as the
+ * fields reply, requests[].receiving and requests[].reply_possible
+ */
CoMutex receive_mutex;
+ NBDReply reply;
QEMUTimer *open_timer;
- NBDReply reply;
BlockDriverState *bs;
/* Connection parameters */
--
2.35.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard
2022-04-12 19:41 ` [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard Paolo Bonzini
@ 2022-04-12 22:05 ` Eric Blake
2022-04-13 7:30 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2022-04-12 22:05 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:41:57PM +0200, Paolo Bonzini wrote:
> The .reply_possible field of s->requests is never set to false. This is
> not a big problem as it is only a safeguard to detect protocol errors,
> but fix it anyway.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 567872ac53..6a5e410e5f 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -454,15 +454,16 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
> nbd_channel_error(s, -EINVAL);
> return -EINVAL;
> }
> - if (s->reply.handle == handle) {
> - /* We are done */
> - return 0;
> - }
> ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
> if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].reply_possible) {
> nbd_channel_error(s, -EINVAL);
> return -EINVAL;
> }
> + s->requests[ind2].reply_possible = nbd_reply_is_structured(&s->reply);
If the reply is simple (not structured), then we expect no further
replies, so this sets things to false. But if the reply is
structured, the answer depends on NBD_REPLY_FLAG_DONE, as in:
s->requests[ind2].reply_possible =
nbd_reply_is_structured(&s->reply) &&
(s->reply.structured.flags & NBD_REPLY_FLAG_DONE);
> + if (s->reply.handle == handle) {
> + /* We are done */
> + return 0;
> + }
> nbd_recv_coroutine_wake_one(&s->requests[ind2]);
> }
> }
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard
2022-04-12 22:05 ` Eric Blake
@ 2022-04-13 7:30 ` Paolo Bonzini
0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-13 7:30 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel
On 4/13/22 00:05, Eric Blake wrote:
>> + s->requests[ind2].reply_possible = nbd_reply_is_structured(&s->reply);
> If the reply is simple (not structured), then we expect no further
> replies, so this sets things to false. But if the reply is
> structured, the answer depends on NBD_REPLY_FLAG_DONE, as in:
>
> s->requests[ind2].reply_possible =
> nbd_reply_is_structured(&s->reply) &&
> (s->reply.structured.flags & NBD_REPLY_FLAG_DONE);
>
Thinking more about it the field is unnecessary, we can just check
.coroutine in the same fashion.
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines
2022-04-12 19:42 ` [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines Paolo Bonzini
@ 2022-04-13 7:35 ` Paolo Bonzini
2022-04-13 15:55 ` Eric Blake
1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-13 7:35 UTC (permalink / raw)
To: qemu-devel; +Cc: eblake
On 4/12/22 21:42, Paolo Bonzini wrote:
> Because s->state is written in the main thread too, for example by
> the reconnect timer callback, it cannot be protected by a CoMutex.
Actually by the yank callback, not the timer (which runs in the "right"
AioContext).
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 2/8] nbd: mark more coroutine_fns
2022-04-12 19:41 ` [PATCH for-7.1 2/8] nbd: mark more coroutine_fns Paolo Bonzini
@ 2022-04-13 12:25 ` Eric Blake
2022-04-13 20:12 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2022-04-13 12:25 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:41:58PM +0200, Paolo Bonzini wrote:
> Several coroutine functions in block/nbd.c are not marked as such. This
> patch adds a few more markers; it is not exhaustive, but it focuses
> especially on:
>
> - places that wake other coroutines, because aio_co_wake() has very
> different semantics inside a coroutine (queuing after yield vs. entering
> immediately);
>
> - functions with _co_ in their names, to avoid confusion
Should we add _co_ in the names of the three other functions thus
marked? As in:
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 64 ++++++++++++++++++++++++++---------------------------
> 1 file changed, 32 insertions(+), 32 deletions(-)
> @@ -133,7 +133,7 @@ static bool nbd_client_connected(BDRVNBDState *s)
> return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
> }
>
> -static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
> +static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
This already has _coroutine_ in the name, would it be better as _co_?
> {
> if (req->receiving) {
> req->receiving = false;
> @@ -144,7 +144,7 @@ static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
> return false;
> }
>
> -static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
> +static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
This already has _coroutines_ in the name, would it be better as _co_?
> {
> int i;
>
> @@ -155,7 +155,7 @@ static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
> }
> }
>
> -static void nbd_channel_error(BDRVNBDState *s, int ret)
> +static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
This has no mention of coroutines, but does call
nbd_recv_coroutines_wake. Should we add _co_ in the name?
But as written, your change makes sense to me for adding the label to
all functions in this patch.
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected
2022-04-12 19:41 ` [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected Paolo Bonzini
@ 2022-04-13 12:48 ` Eric Blake
0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2022-04-13 12:48 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:41:59PM +0200, Paolo Bonzini wrote:
> It is unnecessary to check nbd_client_connected() because every time
> s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down
> and all coroutines are resumed.
>
> The only case where it was actually needed is when the NBD
> server disconnects and there is no reconnect-delay. In that
> case, nbd_receive_replies() does not set s->reply.handle and
> nbd_co_do_receive_one_chunk() cannot continue. For that one case,
> check the return value of nbd_receive_replies().
>
> As to the others:
>
> * nbd_receive_replies() can put the current coroutine to sleep if another
> reply is ongoing; then it will be woken by nbd_channel_error() called
> by the ongoing reply. Or it can try itself to read a reply header and
> fail, thus calling nbd_channel_error() itself.
>
> * nbd_co_send_request() will write the body of the request and fail
>
> * nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk()
> and then nbd_co_do_receive_one_chunk(), which will handle the failure as
> above; or it will just detect a previous call to nbd_iter_channel_error()
> via iter->ret < 0.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 17 ++++-------------
> 1 file changed, 4 insertions(+), 13 deletions(-)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection
2022-04-12 19:42 ` [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection Paolo Bonzini
@ 2022-04-13 15:42 ` Eric Blake
0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2022-04-13 15:42 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:42:00PM +0200, Paolo Bonzini wrote:
> Elevate s->in_flight early so that other incoming requests will wait
> on the CoQueue in nbd_co_send_request; restart them after getting back
> from nbd_reconnect_attempt. This could be after the reconnect timer or
> nbd_cancel_in_flight have cancelled the attempt, so there is no
> need anymore to cancel the requests there.
>
> nbd_co_send_request now handles both stopping and restarting pending
> requests after a successful connection, and there is no need to
> hold send_mutex in nbd_co_do_establish_connection. The current setup
> is confusing because nbd_co_do_establish_connection is called both with
> send_mutex taken and without it. Before the patch it uses free_sema which
> (at least in theory...) is protected by send_mutex, after the patch it
> does not anymore.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/coroutines.h | 4 +--
> block/nbd.c | 66 ++++++++++++++++++++++------------------------
> 2 files changed, 33 insertions(+), 37 deletions(-)
>
> +++ b/block/nbd.c
> @@ -359,25 +354,25 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
> /* called under s->send_mutex */
> static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> {
> - assert(nbd_client_connecting(s));
> - assert(s->in_flight == 0);
> -
> - if (nbd_client_connecting_wait(s) && s->reconnect_delay &&
> - !s->reconnect_delay_timer)
> - {
> - /*
> - * It's first reconnect attempt after switching to
> - * NBD_CLIENT_CONNECTING_WAIT
> - */
> - reconnect_delay_timer_init(s,
> - qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
> - s->reconnect_delay * NANOSECONDS_PER_SECOND);
> - }
> + bool blocking = nbd_client_connecting_wait(s);
>
> /*
> * Now we are sure that nobody is accessing the channel, and no one will
> * try until we set the state to CONNECTED.
> */
> + assert(nbd_client_connecting(s));
> + assert(s->in_flight == 1);
> +
> + if (blocking && !s->reconnect_delay_timer) {
> + /*
> + * It's first reconnect attempt after switching to
While moving this, we could add the missing article: "It's the first"
> + * NBD_CLIENT_CONNECTING_WAIT
> + */
> + g_assert(s->reconnect_delay);
> + reconnect_delay_timer_init(s,
> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
> + s->reconnect_delay * NANOSECONDS_PER_SECOND);
> + }
>
> /* Finalize previous connection if any */
> if (s->ioc) {
> @@ -388,7 +383,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> s->ioc = NULL;
> }
>
> - nbd_co_do_establish_connection(s->bs, NULL);
> + qemu_co_mutex_unlock(&s->send_mutex);
> + nbd_co_do_establish_connection(s->bs, blocking, NULL);
> + qemu_co_mutex_lock(&s->send_mutex);
>
> /*
> * The reconnect attempt is done (maybe successfully, maybe not), so
> @@ -474,21 +471,21 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
> qemu_co_mutex_lock(&s->send_mutex);
>
> while (s->in_flight == MAX_NBD_REQUESTS ||
> - (!nbd_client_connected(s) && s->in_flight > 0))
> - {
> + (!nbd_client_connected(s) && s->in_flight > 0)) {
Mixing in a style change here. Not the end of the world.
The cosmetics are trivial, and the real change of enlarging the scope
of in_flight makes sense to me.
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines
2022-04-12 19:42 ` [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines Paolo Bonzini
2022-04-13 7:35 ` Paolo Bonzini
@ 2022-04-13 15:55 ` Eric Blake
2022-04-13 20:16 ` Paolo Bonzini
1 sibling, 1 reply; 22+ messages in thread
From: Eric Blake @ 2022-04-13 15:55 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:42:01PM +0200, Paolo Bonzini wrote:
> The condition for waiting on the s->free_sema queue depends on
> both s->in_flight and s->state. The latter is currently using
> atomics, but this is quite dubious and probably wrong.
>
> Because s->state is written in the main thread too, for example by
> the reconnect timer callback, it cannot be protected by a CoMutex.
> Introduce a separate lock that can be used by nbd_co_send_request();
> later on this lock will also be used for s->state. There will not
> be any contention on the lock unless there is a reconnect, so this
> is not performance sensitive.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 46 +++++++++++++++++++++++++++-------------------
> 1 file changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 0ff41cb914..c908ea6ae3 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -72,17 +72,22 @@ typedef struct BDRVNBDState {
> QIOChannel *ioc; /* The current I/O channel */
> NBDExportInfo info;
>
> - CoMutex send_mutex;
> + /*
> + * Protects free_sema, in_flight, requests[].coroutine,
> + * reconnect_delay_timer.
> + */
> + QemuMutex requests_lock;
> CoQueue free_sema;
> -
> - CoMutex receive_mutex;
> int in_flight;
> + NBDClientRequest requests[MAX_NBD_REQUESTS];
> + QEMUTimer *reconnect_delay_timer;
> +
> + CoMutex send_mutex;
> + CoMutex receive_mutex;
> NBDClientState state;
>
> - QEMUTimer *reconnect_delay_timer;
> QEMUTimer *open_timer;
>
> - NBDClientRequest requests[MAX_NBD_REQUESTS];
Reordering of the struct makes sense.
> @@ -468,11 +473,10 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
> BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> int rc, i = -1;
>
> - qemu_co_mutex_lock(&s->send_mutex);
> -
> + qemu_mutex_lock(&s->requests_lock);
> while (s->in_flight == MAX_NBD_REQUESTS ||
> (!nbd_client_connected(s) && s->in_flight > 0)) {
> - qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
> + qemu_co_queue_wait(&s->free_sema, &s->requests_lock);
> }
>
> s->in_flight++;
> @@ -493,14 +497,14 @@ static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
> }
> }
>
> - g_assert(qemu_in_coroutine());
Why is this assert dropped? Is it because we've marked the function
with coroutine_fn? If so, should we drop it earlier in the series,
when you added the label?
Otherwise, the patch makes sense to me.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 6/8] nbd: move s->state under requests_lock
2022-04-12 19:42 ` [PATCH for-7.1 6/8] nbd: move s->state under requests_lock Paolo Bonzini
@ 2022-04-13 16:23 ` Eric Blake
2022-04-13 20:21 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Eric Blake @ 2022-04-13 16:23 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:42:02PM +0200, Paolo Bonzini wrote:
> Remove the confusing, and most likely wrong, atomics. The only function
> that used to be somewhat in a hot path was nbd_client_connected(),
> but it is not anymore after the previous patches.
>
> The function nbd_client_connecting_wait() was used mostly to check if
> a request had to be reissued (outside requests_lock), but also
> under requests_lock in nbd_client_connecting_wait(). The two uses have to
"Function A was used mostly..., but also under requests_lock in
function A." Reading the rest of the patch, I think...[1]
> be separated; for the former we rename it to nbd_client_will_reconnect()
> and make it take s->requests_lock; for the latter the access can simply
> be inlined. The new name is clearer, and ensures that a missing
> conversion is caught by the compiler.
I take it your experiments with C++ coroutines helped find this ;)
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 88 +++++++++++++++++++++++++++++------------------------
> 1 file changed, 48 insertions(+), 40 deletions(-)
> @@ -226,7 +225,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> s->ioc = NULL;
> }
>
> - s->state = NBD_CLIENT_QUIT;
> + WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
> + s->state = NBD_CLIENT_QUIT;
> + }
> }
This style for protecting s->state at the end of the function takes 3
lines thanks to WITH_QEMU_LOCK_GUARD...[2]
>
> static void open_timer_del(BDRVNBDState *s)
> @@ -255,16 +256,13 @@ static void open_timer_init(BDRVNBDState *s, uint64_t expire_time_ns)
> timer_mod(s->open_timer, expire_time_ns);
> }
>
> -static bool nbd_client_connecting(BDRVNBDState *s)
> +static bool nbd_client_will_reconnect(BDRVNBDState *s)
This part of the diff got hard to read, since you mixed shuffling
functions with a rename. On a closer read, I see that
nbd_client_connecting() was merely moved later[3], while the new name
nbd_client_will_reconnect()...[4]
> {
> - NBDClientState state = qatomic_load_acquire(&s->state);
> - return state == NBD_CLIENT_CONNECTING_WAIT ||
> - state == NBD_CLIENT_CONNECTING_NOWAIT;
> -}
> -
> -static bool nbd_client_connecting_wait(BDRVNBDState *s)
[4]...is indeed happening to nbd_client_connecting_wait(), as promised
in the commit message. Which means:
[1]...so it looks like the first 'function A' did indeed want to be
nbd_client_connecting_wait() which got renamed (since
nbd_client_connecting() was moved later in the file), while...[1]
> -{
> - return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
> + /*
> + * Called only after a socket error, so this is not performance sensitive.
> + */
> + QEMU_LOCK_GUARD(&s->requests_lock);
> + return s->state == NBD_CLIENT_CONNECTING_WAIT;
> }
[2]...while here, you only needed two lines, using QEMU_LOCK_GUARD.
Both styles work, but it seems like we should be consistent, and I
would favor the shorter style when all that is being guarded is a
single line.
>
> /*
> @@ -351,15 +349,24 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
> qio_channel_attach_aio_context(s->ioc, bdrv_get_aio_context(bs));
>
> /* successfully connected */
> - s->state = NBD_CLIENT_CONNECTED;
> + WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
> + s->state = NBD_CLIENT_CONNECTED;
> + }
>
> return 0;
> }
Another place where the shorter QEMU_LOCK_GUARD() would work.
>
> +/* Called with s->requests_lock held. */
> +static bool nbd_client_connecting(BDRVNBDState *s)
[3]...here's where the moved function threw me off. Perhaps splitting
out a preliminary patch to just move the function before converting it
to be under s->requests_lock, so that the rename of a different
function doesn't cause a hard-to-grok diff, would be wise.
> +{
> + return s->state == NBD_CLIENT_CONNECTING_WAIT ||
> + s->state == NBD_CLIENT_CONNECTING_NOWAIT;
> +}
> +
> /* Called with s->requests_lock taken. */
> static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
> {
> - bool blocking = nbd_client_connecting_wait(s);
> + bool blocking = s->state == NBD_CLIENT_CONNECTING_WAIT;
[1]...and the second instance of 'function A' in the commit message
should have been nbd_reconnect_attempt().
As messy as the diff was, I still think I understood it. With the
necessary correction to the commit message according to [1], I could
be comfortable with
Reviewed-by: Eric Blake <eblake@redhat.com>
although the suggestion in [3] to split out the function motion to a
separate patch may result in the v2 series looking different enough
that you may want to leave off my R-b to ensure I still review things
carefully.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 2/8] nbd: mark more coroutine_fns
2022-04-13 12:25 ` Eric Blake
@ 2022-04-13 20:12 ` Paolo Bonzini
0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-13 20:12 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel
On 4/13/22 14:25, Eric Blake wrote:
>> -static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
>> +static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
> This already has_coroutine_ in the name, would it be better as_co_?
>
>> {
>> if (req->receiving) {
>> req->receiving = false;
>> @@ -144,7 +144,7 @@ static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
>> return false;
>> }
>>
>> -static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
>> +static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
> This already has_coroutines_ in the name, would it be better as_co_?
These mean "wake a coroutine", not "I'm in a coroutine", so I'd say they
are fine as is.
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines
2022-04-13 15:55 ` Eric Blake
@ 2022-04-13 20:16 ` Paolo Bonzini
0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-13 20:16 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel
On 4/13/22 17:55, Eric Blake wrote:
>> - g_assert(qemu_in_coroutine());
> Why is this assert dropped? Is it because we've marked the function
> with coroutine_fn? If so, should we drop it earlier in the series,
> when you added the label?
The label doesn't guarantee much, but in this case it's pretty clear-cut
that we must be in a coroutine, the code just doesn't make sense otherwise.
We're also about to take a CoMutex (and before the patch we had already
taken it) so if anywhere the place for the assertion would be
qemu_co_mutex_lock().
> Otherwise, the patch makes sense to me.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 6/8] nbd: move s->state under requests_lock
2022-04-13 16:23 ` Eric Blake
@ 2022-04-13 20:21 ` Paolo Bonzini
0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2022-04-13 20:21 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel
On 4/13/22 18:23, Eric Blake wrote:
>>
>> The function nbd_client_connecting_wait() was used mostly to check if
>> a request had to be reissued (outside requests_lock), but also
>> under requests_lock in nbd_client_connecting_wait(). The two uses have to
> "Function A was used mostly..., but also under requests_lock in
> function A." Reading the rest of the patch, I think...[1]
>
>> be separated; for the former we rename it to nbd_client_will_reconnect()
>> and make it take s->requests_lock; for the latter the access can simply
>> be inlined. The new name is clearer, and ensures that a missing
>> conversion is caught by the compiler.
>
> I take it your experiments with C++ coroutines helped find this;)
No, they never went that far. :) Rather, these atomics have always
bugged me, and after Emanuele pointed me to the enter_all without lock,
I noticed that they can be fixed with the same hammer.
>> + QEMU_LOCK_GUARD(&s->requests_lock);
>> + return s->state == NBD_CLIENT_CONNECTING_WAIT;
>> }
>
> [2]...while here, you only needed two lines, using QEMU_LOCK_GUARD.
> Both styles work, but it seems like we should be consistent, and I
> would favor the shorter style when all that is being guarded is a
> single line.
>
QEMU_LOCK_GUARD() is a declaration in some sense (well, it is also a
declaration when you expand the macro) and QEMU in general doesn't do
declaration-after-statement.
Also, QEMU_LOCK_GUARD() emphasizes that the whole function is guarded,
while WITH_QEMU_LOCK_GUARD() has the opposite effect on the reader.
> although the suggestion in [3] to split out the function motion to a
> separate patch may result in the v2 series looking different enough
> that you may want to leave off my R-b to ensure I still review things
> carefully.
Will do.
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving
2022-04-12 19:42 ` [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving Paolo Bonzini
@ 2022-04-13 21:20 ` Eric Blake
0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2022-04-13 21:20 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:42:03PM +0200, Paolo Bonzini wrote:
> requests[].receiving is set by nbd_receive_replies() under the receive_mutex;
> Read it under the same mutex as well. Waking up receivers on errors happens
> after each reply finishes processing, in nbd_co_receive_one_chunk().
> If there is no currently-active reply, there are two cases:
>
> * either there is no active request at all, in which case no
> element of request[] can have .receiving = true
>
> * or nbd_receive_replies() must be running and waiting for receive_mutex;
> in that case it will get back to nbd_co_receive_one_chunk() because
> the socket has been shutdown, and all waiting coroutines will wake up
> in turn.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes
2022-04-12 19:42 ` [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes Paolo Bonzini
@ 2022-04-13 21:22 ` Eric Blake
0 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2022-04-13 21:22 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Tue, Apr 12, 2022 at 09:42:04PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/nbd.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/block/nbd.c b/block/nbd.c
> index 8954243f50..8297da7e89 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -82,12 +82,18 @@ typedef struct BDRVNBDState {
> NBDClientRequest requests[MAX_NBD_REQUESTS];
> QEMUTimer *reconnect_delay_timer;
>
> + /* Protects sending data on the socket. */
> CoMutex send_mutex;
> +
> + /*
> + * Protects receiving reply headers from the socket, as well as the
> + * fields reply, requests[].receiving and requests[].reply_possible
> + */
> CoMutex receive_mutex;
> + NBDReply reply;
>
> QEMUTimer *open_timer;
>
> - NBDReply reply;
> BlockDriverState *bs;
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2022-04-13 21:24 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-12 19:41 [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard Paolo Bonzini
2022-04-12 22:05 ` Eric Blake
2022-04-13 7:30 ` Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 2/8] nbd: mark more coroutine_fns Paolo Bonzini
2022-04-13 12:25 ` Eric Blake
2022-04-13 20:12 ` Paolo Bonzini
2022-04-12 19:41 ` [PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected Paolo Bonzini
2022-04-13 12:48 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection Paolo Bonzini
2022-04-13 15:42 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines Paolo Bonzini
2022-04-13 7:35 ` Paolo Bonzini
2022-04-13 15:55 ` Eric Blake
2022-04-13 20:16 ` Paolo Bonzini
2022-04-12 19:42 ` [PATCH for-7.1 6/8] nbd: move s->state under requests_lock Paolo Bonzini
2022-04-13 16:23 ` Eric Blake
2022-04-13 20:21 ` Paolo Bonzini
2022-04-12 19:42 ` [PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving Paolo Bonzini
2022-04-13 21:20 ` Eric Blake
2022-04-12 19:42 ` [PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes Paolo Bonzini
2022-04-13 21:22 ` Eric Blake
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.