All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage
@ 2017-08-14 21:34 Eric Blake
  2017-08-15 14:45 ` Vladimir Sementsov-Ogievskiy
  2017-08-15 14:49 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Blake @ 2017-08-14 21:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Kevin Wolf, Max Reitz, open list:Network Block Dev...

When we switched NBD to use coroutines for qemu 2.9 (in particular,
commit a12a712a), we introduced a regression: if a server sends us
garbage (such as a corrupted magic number), we quit the read loop
but do not stop sending further queued commands, resulting in the
client hanging when it never reads the response to those additional
commands.  In qemu 2.8, we properly detected that the server is no
longer reliable, and cancelled all existing pending commands with
EIO, then tore down the socket so that all further command attempts
get EPIPE.

Restore the proper behavior of quitting (almost) all communication
with a broken server: Once we know we are out of sync or otherwise
can't trust the server, we must assume that any further incoming
data is unreliable and therefore end all pending commands with EIO,
and quit trying to send any further commands.  As an exception, we
still (try to) send NBD_CMD_DISC to let the server know we are going
away (in part, because it is easier to do that than to further
refactor nbd_teardown_connection, and in part because it is the
only command where we do not have to wait for a reply).

Based on a patch by Vladimir Sementsov-Ogievskiy.

A malicious server can be created with the following hack,
followed by setting NBD_SERVER_DEBUG to a non-zero value in the
environment when running qemu-nbd:

| --- a/nbd/server.c
| +++ b/nbd/server.c
| @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
|      stl_be_p(buf + 4, reply->error);
|      stq_be_p(buf + 8, reply->handle);
|
| +    static int debug;
| +    static int count;
| +    if (!count++) {
| +        const char *str = getenv("NBD_SERVER_DEBUG");
| +        if (str) {
| +            debug = atoi(str);
| +        }
| +    }
| +    if (debug && !(count % debug)) {
| +        buf[0] = 0;
| +    }
|      return nbd_write(ioc, buf, sizeof(buf), errp);
|  }

Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---

Supercedes both Vladimir and my earlier attempts:
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02131.html
https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01501.html

 block/nbd-client.h |  1 +
 block/nbd-client.c | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index df80771357..1935ffbcaa 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -29,6 +29,7 @@ typedef struct NBDClientSession {

     Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
     NBDReply reply;
+    bool quit;
 } NBDClientSession;

 NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 25dd28406b..bb17e3da45 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
     int ret;
     Error *local_err = NULL;

-    for (;;) {
+    while (!s->quit) {
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
         if (ret < 0) {
@@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
         qemu_coroutine_yield();
     }

+    if (ret < 0) {
+        s->quit = true;
+    }
     nbd_recv_coroutines_enter_all(s);
     s->read_reply_co = NULL;
 }
@@ -135,6 +138,10 @@ static int nbd_co_send_request(BlockDriverState *bs,
     assert(i < MAX_NBD_REQUESTS);
     request->handle = INDEX_TO_HANDLE(s, i);

+    if (s->quit) {
+        qemu_co_mutex_unlock(&s->send_mutex);
+        return -EIO;
+    }
     if (!s->ioc) {
         qemu_co_mutex_unlock(&s->send_mutex);
         return -EPIPE;
@@ -143,7 +150,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
     if (qiov) {
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
-        if (rc >= 0) {
+        if (rc >= 0 && !s->quit) {
             ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
                           NULL);
             if (ret != request->len) {
@@ -168,8 +175,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
     /* Wait until we're woken up by nbd_read_reply_entry.  */
     qemu_coroutine_yield();
     *reply = s->reply;
-    if (reply->handle != request->handle ||
-        !s->ioc) {
+    if (reply->handle != request->handle || !s->ioc || s->quit) {
         reply->error = EIO;
     } else {
         if (qiov && reply->error == 0) {
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage
  2017-08-14 21:34 [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage Eric Blake
@ 2017-08-15 14:45 ` Vladimir Sementsov-Ogievskiy
  2017-08-15 14:54   ` Eric Blake
  2017-08-15 14:49 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
  1 sibling, 1 reply; 4+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2017-08-15 14:45 UTC (permalink / raw)
  To: qemu-devel

15.08.2017 00:34, Eric Blake wrote:
> When we switched NBD to use coroutines for qemu 2.9 (in particular,
> commit a12a712a), we introduced a regression: if a server sends us
> garbage (such as a corrupted magic number), we quit the read loop
> but do not stop sending further queued commands, resulting in the
> client hanging when it never reads the response to those additional
> commands.  In qemu 2.8, we properly detected that the server is no
> longer reliable, and cancelled all existing pending commands with
> EIO, then tore down the socket so that all further command attempts
> get EPIPE.
>
> Restore the proper behavior of quitting (almost) all communication
> with a broken server: Once we know we are out of sync or otherwise
> can't trust the server, we must assume that any further incoming
> data is unreliable and therefore end all pending commands with EIO,
> and quit trying to send any further commands.  As an exception, we
> still (try to) send NBD_CMD_DISC to let the server know we are going
> away (in part, because it is easier to do that than to further
> refactor nbd_teardown_connection, and in part because it is the
> only command where we do not have to wait for a reply).
>
> Based on a patch by Vladimir Sementsov-Ogievskiy.
>
> A malicious server can be created with the following hack,
> followed by setting NBD_SERVER_DEBUG to a non-zero value in the
> environment when running qemu-nbd:
>
> | --- a/nbd/server.c
> | +++ b/nbd/server.c
> | @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
> |      stl_be_p(buf + 4, reply->error);
> |      stq_be_p(buf + 8, reply->handle);
> |
> | +    static int debug;
> | +    static int count;
> | +    if (!count++) {
> | +        const char *str = getenv("NBD_SERVER_DEBUG");
> | +        if (str) {
> | +            debug = atoi(str);
> | +        }
> | +    }
> | +    if (debug && !(count % debug)) {
> | +        buf[0] = 0;
> | +    }
> |      return nbd_write(ioc, buf, sizeof(buf), errp);
> |  }
>
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>
> Supercedes both Vladimir and my earlier attempts:
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02131.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01501.html
>
>   block/nbd-client.h |  1 +
>   block/nbd-client.c | 14 ++++++++++----
>   2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/block/nbd-client.h b/block/nbd-client.h
> index df80771357..1935ffbcaa 100644
> --- a/block/nbd-client.h
> +++ b/block/nbd-client.h
> @@ -29,6 +29,7 @@ typedef struct NBDClientSession {
>
>       Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
>       NBDReply reply;
> +    bool quit;
>   } NBDClientSession;
>
>   NBDClientSession *nbd_get_client_session(BlockDriverState *bs);
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index 25dd28406b..bb17e3da45 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>       int ret;
>       Error *local_err = NULL;
>
> -    for (;;) {
> +    while (!s->quit) {

looks like quit will never be true here

>           assert(s->reply.handle == 0);
>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>           if (ret < 0) {
> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>           qemu_coroutine_yield();
>       }
>
> +    if (ret < 0) {
> +        s->quit = true;

so, you set quit only here.. if we fail on some write, reading coroutine 
will not
know about it and will continue reading..

> +    }
>       nbd_recv_coroutines_enter_all(s);
>       s->read_reply_co = NULL;
>   }
> @@ -135,6 +138,10 @@ static int nbd_co_send_request(BlockDriverState *bs,
>       assert(i < MAX_NBD_REQUESTS);
>       request->handle = INDEX_TO_HANDLE(s, i);
>
> +    if (s->quit) {
> +        qemu_co_mutex_unlock(&s->send_mutex);
> +        return -EIO;
> +    }
>       if (!s->ioc) {
>           qemu_co_mutex_unlock(&s->send_mutex);
>           return -EPIPE;
> @@ -143,7 +150,7 @@ static int nbd_co_send_request(BlockDriverState *bs,
>       if (qiov) {
>           qio_channel_set_cork(s->ioc, true);
>           rc = nbd_send_request(s->ioc, request);
> -        if (rc >= 0) {
> +        if (rc >= 0 && !s->quit) {
>               ret = nbd_rwv(s->ioc, qiov->iov, qiov->niov, request->len, false,
>                             NULL);
>               if (ret != request->len) {
> @@ -168,8 +175,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>       /* Wait until we're woken up by nbd_read_reply_entry.  */
>       qemu_coroutine_yield();
>       *reply = s->reply;
> -    if (reply->handle != request->handle ||
> -        !s->ioc) {
> +    if (reply->handle != request->handle || !s->ioc || s->quit) {
>           reply->error = EIO;
>       } else {
>           if (qiov && reply->error == 0) {


-- 
Best regards,
Vladimir

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [Qemu-block] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage
  2017-08-14 21:34 [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage Eric Blake
  2017-08-15 14:45 ` Vladimir Sementsov-Ogievskiy
@ 2017-08-15 14:49 ` Stefan Hajnoczi
  1 sibling, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2017-08-15 14:49 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Paolo Bonzini,
	open list:Network Block Dev...,
	Max Reitz

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

On Mon, Aug 14, 2017 at 04:34:26PM -0500, Eric Blake wrote:
> When we switched NBD to use coroutines for qemu 2.9 (in particular,
> commit a12a712a), we introduced a regression: if a server sends us
> garbage (such as a corrupted magic number), we quit the read loop
> but do not stop sending further queued commands, resulting in the
> client hanging when it never reads the response to those additional
> commands.  In qemu 2.8, we properly detected that the server is no
> longer reliable, and cancelled all existing pending commands with
> EIO, then tore down the socket so that all further command attempts
> get EPIPE.
> 
> Restore the proper behavior of quitting (almost) all communication
> with a broken server: Once we know we are out of sync or otherwise
> can't trust the server, we must assume that any further incoming
> data is unreliable and therefore end all pending commands with EIO,
> and quit trying to send any further commands.  As an exception, we
> still (try to) send NBD_CMD_DISC to let the server know we are going
> away (in part, because it is easier to do that than to further
> refactor nbd_teardown_connection, and in part because it is the
> only command where we do not have to wait for a reply).
> 
> Based on a patch by Vladimir Sementsov-Ogievskiy.
> 
> A malicious server can be created with the following hack,
> followed by setting NBD_SERVER_DEBUG to a non-zero value in the
> environment when running qemu-nbd:
> 
> | --- a/nbd/server.c
> | +++ b/nbd/server.c
> | @@ -919,6 +919,17 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
> |      stl_be_p(buf + 4, reply->error);
> |      stq_be_p(buf + 8, reply->handle);
> |
> | +    static int debug;
> | +    static int count;
> | +    if (!count++) {
> | +        const char *str = getenv("NBD_SERVER_DEBUG");
> | +        if (str) {
> | +            debug = atoi(str);
> | +        }
> | +    }
> | +    if (debug && !(count % debug)) {
> | +        buf[0] = 0;
> | +    }
> |      return nbd_write(ioc, buf, sizeof(buf), errp);
> |  }
> 
> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> 
> Supercedes both Vladimir and my earlier attempts:
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg02131.html
> https://lists.gnu.org/archive/html/qemu-devel/2017-08/msg01501.html
> 
>  block/nbd-client.h |  1 +
>  block/nbd-client.c | 14 ++++++++++----
>  2 files changed, 11 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage
  2017-08-15 14:45 ` Vladimir Sementsov-Ogievskiy
@ 2017-08-15 14:54   ` Eric Blake
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Blake @ 2017-08-15 14:54 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2881 bytes --]

On 08/15/2017 09:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2017 00:34, Eric Blake wrote:
>> When we switched NBD to use coroutines for qemu 2.9 (in particular,
>> commit a12a712a), we introduced a regression: if a server sends us
>> garbage (such as a corrupted magic number), we quit the read loop
>> but do not stop sending further queued commands, resulting in the
>> client hanging when it never reads the response to those additional
>> commands.  In qemu 2.8, we properly detected that the server is no
>> longer reliable, and cancelled all existing pending commands with
>> EIO, then tore down the socket so that all further command attempts
>> get EPIPE.
>>
>> Restore the proper behavior of quitting (almost) all communication
>> with a broken server: Once we know we are out of sync or otherwise
>> can't trust the server, we must assume that any further incoming
>> data is unreliable and therefore end all pending commands with EIO,
>> and quit trying to send any further commands.  As an exception, we
>> still (try to) send NBD_CMD_DISC to let the server know we are going
>> away (in part, because it is easier to do that than to further
>> refactor nbd_teardown_connection, and in part because it is the
>> only command where we do not have to wait for a reply).
>>
>> Based on a patch by Vladimir Sementsov-Ogievskiy.
>>

>> +++ b/block/nbd-client.c
>> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>       int ret;
>>       Error *local_err = NULL;
>>
>> -    for (;;) {
>> +    while (!s->quit) {
> 
> looks like quit will never be true here
> 
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>>           if (ret < 0) {
>> @@ -107,6 +107,9 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>           qemu_coroutine_yield();
>>       }
>>
>> +    if (ret < 0) {
>> +        s->quit = true;
> 
> so, you set quit only here.. if we fail on some write, reading coroutine
> will not
> know about it and will continue reading..

Looks like you are correct - your version set the flag in more places
than me, but it looks like you're right that we DO want to set the flag
when writing hits a known failure.

Here's what I plan to squash in:

diff --git i/block/nbd-client.c w/block/nbd-client.c
index bb17e3da45..422ecb4307 100644
--- i/block/nbd-client.c
+++ w/block/nbd-client.c
@@ -161,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs,
     } else {
         rc = nbd_send_request(s->ioc, request);
     }
+    if (rc < 0) {
+        s->quit = true;
+    }
     qemu_co_mutex_unlock(&s->send_mutex);
     return rc;
 }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-08-15 14:54 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-14 21:34 [Qemu-devel] [PATCH for-2.10 v2] nbd-client: Fix regression when server sends garbage Eric Blake
2017-08-15 14:45 ` Vladimir Sementsov-Ogievskiy
2017-08-15 14:54   ` Eric Blake
2017-08-15 14:49 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.