All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] nbd: Try for cleaner TLS shutdown
@ 2020-03-27 16:19 Eric Blake
  2020-03-27 16:19 ` [PATCH 1/3] crypto: Add qcrypto_tls_shutdown() Eric Blake
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Eric Blake @ 2020-03-27 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, qemu-block

I'm currently trying to debug a hang in 'nbdkit nbd' as a client when
TLS is in use, when dealing with three processes:
nbdkit example1[1] <=tls=> nbdkit nbd[2] <=plain=> qemu-nbd --list[3]

In theory, when process [3] exits, the server side of process [2] is
supposed to disconnect as client from process [1].  But we were
hitting sporadic hangs where process [2] was stuck in a poll() while
in gnutls cleanup code, which appears to be because it did not
recognize a clean shutdown from process [3].

I do not yet have a strong reproducible test case (which might include
adding strategic sleep()s in one-off compilations) to prove that the
problem is an unclean TLS shutdown, but at the same time, the gnutls
documentation is clear that without proper use of gnutls_bye(), a
client cannot distinguish between clean end of communications and a
malicious interruption in service.  Shutting down the write side of a
bi-directional socket as soon as possible makes it easier for a client
to begin its own shutdown actions, and can help in avoiding deadlocks
where both sides of a connection are waiting for the other side to
close() the socket first.  If I can come up with a reliable test case,
this series may be a candidate for 5.0; but for now, I'm only
proposing it for 5.1 (we've had more than one release where qemu was
not exploiting the full power of gnutls_bye(), and even if that
triggers poor interaction with other endpoints such as nbdkit, going
one more release with the same behavior isn't making things worse).

Eric Blake (3):
  crypto: Add qcrypto_tls_shutdown()
  io: Support shutdown of TLS channel
  nbd: Use shutdown(SHUT_WR) after last item sent

 qapi/crypto.json            | 15 +++++++++++++++
 include/crypto/tlssession.h | 24 ++++++++++++++++++++++++
 block/nbd.c                 |  1 +
 crypto/tlssession.c         | 27 +++++++++++++++++++++++++++
 io/channel-tls.c            | 27 ++++++++++++++++++++++++++-
 nbd/client.c                |  3 ++-
 nbd/server.c                |  4 ++++
 7 files changed, 99 insertions(+), 2 deletions(-)

-- 
2.26.0.rc2



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] crypto: Add qcrypto_tls_shutdown()
  2020-03-27 16:19 [PATCH 0/3] nbd: Try for cleaner TLS shutdown Eric Blake
@ 2020-03-27 16:19 ` Eric Blake
  2020-03-31  8:30   ` Markus Armbruster
  2020-03-27 16:19 ` [PATCH 2/3] io: Support shutdown of TLS channel Eric Blake
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-27 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, Markus Armbruster, qemu-block

Gnutls documents that applications that want to distinguish between a
clean end-of-communication and a malicious client abruptly tearing the
underlying transport out of under our feet need to use gnutls_bye().
Our channel code is already set up to allow shutdown requests, but we
weren't forwarding those to gnutls.  To make that work, we first need
a new entry point that can isolate the rest of our code from the
gnutls interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qapi/crypto.json            | 15 +++++++++++++++
 include/crypto/tlssession.h | 24 ++++++++++++++++++++++++
 crypto/tlssession.c         | 27 +++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/qapi/crypto.json b/qapi/crypto.json
index b2a4cff683ff..1df0f4502885 100644
--- a/qapi/crypto.json
+++ b/qapi/crypto.json
@@ -119,6 +119,21 @@
   'prefix': 'QCRYPTO_IVGEN_ALG',
   'data': ['plain', 'plain64', 'essiv']}

+##
+# @QCryptoShutdownMode:
+#
+# The supported modes for requesting shutdown of a crypto
+# communication channel.
+#
+# @shut-wr: No more writes will be sent, but the remote end can still send
+#           data to be read.
+# @shut-rdwr: No more reads or writes should occur.
+# Since: 5.1
+##
+{ 'enum': 'QCryptoShutdownMode',
+  'prefix': 'QCRYPTO',
+  'data': ['shut-wr', 'shut-rdwr']}
+
 ##
 # @QCryptoBlockFormat:
 #
diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
index 15b9cef086cc..10c670e3b6a2 100644
--- a/include/crypto/tlssession.h
+++ b/include/crypto/tlssession.h
@@ -321,4 +321,28 @@ int qcrypto_tls_session_get_key_size(QCryptoTLSSession *sess,
  */
 char *qcrypto_tls_session_get_peer_name(QCryptoTLSSession *sess);

+/**
+ * qcrypto_tls_shutdown:
+ * @sess: the TLS session object
+ * @how: the desired shutdown mode
+ *
+ * Prepare to terminate the session.  If @how is QCRYPTO_SHUT_WR, this
+ * side will no longer write data, but should still process reads
+ * until EOF; if @how is QCRYPTO_SHUT_RDWR, then the entire session
+ * should shut down.  Use of this function is optional, since closing
+ * the session implies QCRYPTO_SHUT_RDWR.  However, using this
+ * function prior to terminating the underlying transport layer makes
+ * it possible for the remote endpoint to distinguish between a
+ * malicious party prematurely terminating the the connection and
+ * normal termination.
+ *
+ * This function should only be used after a successful
+ * qcrypto_tls_session_handshake().
+ *
+ * Returns: 0 for success, or -EAGAIN if more underlying I/O is
+ * required to finish proper session shutdown.
+ */
+int qcrypto_tls_session_shutdown(QCryptoTLSSession *sess,
+                                 QCryptoShutdownMode how);
+
 #endif /* QCRYPTO_TLSSESSION_H */
diff --git a/crypto/tlssession.c b/crypto/tlssession.c
index 33203e8ca711..903301189069 100644
--- a/crypto/tlssession.c
+++ b/crypto/tlssession.c
@@ -521,6 +521,33 @@ qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *session)
 }


+int qcrypto_tls_session_shutdown(QCryptoTLSSession *session,
+                                 QCryptoShutdownMode how)
+{
+    gnutls_close_request_t mode;
+    int ret;
+
+    assert(session->handshakeComplete);
+    switch (how) {
+    case QCRYPTO_SHUT_WR:
+        mode = GNUTLS_SHUT_WR;
+        break;
+    case QCRYPTO_SHUT_RDWR:
+        mode = GNUTLS_SHUT_RDWR;
+        break;
+    default:
+        abort();
+    }
+
+    ret = gnutls_bye(session->handle, mode);
+    if (ret == GNUTLS_E_INTERRUPTED ||
+        ret == GNUTLS_E_AGAIN) {
+        return -EAGAIN;
+    }
+    return 0;
+}
+
+
 int
 qcrypto_tls_session_get_key_size(QCryptoTLSSession *session,
                                  Error **errp)
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] io: Support shutdown of TLS channel
  2020-03-27 16:19 [PATCH 0/3] nbd: Try for cleaner TLS shutdown Eric Blake
  2020-03-27 16:19 ` [PATCH 1/3] crypto: Add qcrypto_tls_shutdown() Eric Blake
@ 2020-03-27 16:19 ` Eric Blake
  2020-03-27 16:40   ` Daniel P. Berrangé
  2020-03-27 16:19 ` [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent Eric Blake
  2020-03-27 18:44 ` [PATCH 0/3] nbd: Try for cleaner TLS shutdown no-reply
  3 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-27 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: berrange, qemu-block

Gnutls documents that while many apps simply yank out the underlying
transport at the end of communication in the name of efficiency, this
is indistinguishable from a malicious actor terminating the connection
prematurely.  Since our channel I/O code already supports the notion of
a graceful shutdown request, it is time to plumb that through to the
TLS layer, and wait for TLS to give the all clear before then
terminating traffic on the underlying channel.

Note that channel-tls now always advertises shutdown support,
regardless of whether the underlying channel also has that support.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 io/channel-tls.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/io/channel-tls.c b/io/channel-tls.c
index 7ec8ceff2f01..f90905823e1d 100644
--- a/io/channel-tls.c
+++ b/io/channel-tls.c
@@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
                                     Error **errp)
 {
     QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
+    int ret = 0;

     tioc->shutdown |= how;

-    return qio_channel_shutdown(tioc->master, how, errp);
+    do {
+        switch (how) {
+        case QIO_CHANNEL_SHUTDOWN_READ:
+            /* No TLS counterpart */
+            break;
+        case QIO_CHANNEL_SHUTDOWN_WRITE:
+            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
+            break;
+        case QIO_CHANNEL_SHUTDOWN_BOTH:
+            ret = qcrypto_tls_session_shutdown(tioc->session,
+                                               QCRYPTO_SHUT_RDWR);
+            break;
+        default:
+            abort();
+        }
+    } while (ret == -EAGAIN);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "Cannot shut down TLS channel");
+        return -1;
+    }
+
+    if (qio_channel_has_feature(tioc->master, QIO_CHANNEL_FEATURE_SHUTDOWN)) {
+        return qio_channel_shutdown(tioc->master, how, errp);
+    }
+    return 0;
 }

 static int qio_channel_tls_close(QIOChannel *ioc,
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent
  2020-03-27 16:19 [PATCH 0/3] nbd: Try for cleaner TLS shutdown Eric Blake
  2020-03-27 16:19 ` [PATCH 1/3] crypto: Add qcrypto_tls_shutdown() Eric Blake
  2020-03-27 16:19 ` [PATCH 2/3] io: Support shutdown of TLS channel Eric Blake
@ 2020-03-27 16:19 ` Eric Blake
  2020-03-27 16:35   ` Daniel P. Berrangé
  2020-03-27 18:44 ` [PATCH 0/3] nbd: Try for cleaner TLS shutdown no-reply
  3 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-27 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, berrange, qemu-block, Max Reitz

Although the remote end should always be tolerant of a socket being
arbitrarily closed, there are situations where it is a lot easier if
the remote end can be guaranteed to read EOF even before the socket
has closed.  In particular, when using gnutls, if we fail to inform
the remote end about an impending teardown, the remote end cannot
distinguish between our closing the socket as intended vs. a malicious
intermediary interrupting things, and may result in spurious error
messages.  Or, we can end up with a deadlock where both ends are stuck
on a read() from the other end but neither gets an EOF.  Thus, after
any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has
finished replying (where appropriate) to such a request, it is worth
informing the channel that we will not be transmitting anything else.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd.c  | 1 +
 nbd/client.c | 3 ++-
 nbd/server.c | 4 ++++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f6499..2906484390f9 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1402,6 +1402,7 @@ static void nbd_client_close(BlockDriverState *bs)

     if (s->ioc) {
         nbd_send_request(s->ioc, &request);
+        qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
     }

     nbd_teardown_connection(bs);
diff --git a/nbd/client.c b/nbd/client.c
index ba173108baab..1b8b3a9ae3bd 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright (C) 2016-2019 Red Hat, Inc.
+ *  Copyright (C) 2016-2020 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Client Side
@@ -103,6 +103,7 @@ static void nbd_send_opt_abort(QIOChannel *ioc)
      * even care if the request makes it to the server, let alone
      * waiting around for whether the server replies. */
     nbd_send_option_request(ioc, NBD_OPT_ABORT, 0, NULL, NULL);
+    qio_channel_shutdown(ioc, QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
 }


diff --git a/nbd/server.c b/nbd/server.c
index 02b1ed080145..e21a1f662cc2 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1168,6 +1168,8 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                                    "Option 0x%" PRIx32
                                    " not permitted before TLS", option);
                 if (option == NBD_OPT_ABORT) {
+                    qio_channel_shutdown(client->ioc,
+                                         QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
                     return 1;
                 }
                 break;
@@ -1187,6 +1189,8 @@ static int nbd_negotiate_options(NBDClient *client, Error **errp)
                  * disconnecting, but that we must also tolerate
                  * guests that don't wait for our reply. */
                 nbd_negotiate_send_rep(client, NBD_REP_ACK, NULL);
+                qio_channel_shutdown(client->ioc,
+                                     QIO_CHANNEL_SHUTDOWN_WRITE, NULL);
                 return 1;

             case NBD_OPT_EXPORT_NAME:
-- 
2.26.0.rc2



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent
  2020-03-27 16:19 ` [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent Eric Blake
@ 2020-03-27 16:35   ` Daniel P. Berrangé
  2020-03-27 17:42     ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-03-27 16:35 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
> Although the remote end should always be tolerant of a socket being
> arbitrarily closed, there are situations where it is a lot easier if
> the remote end can be guaranteed to read EOF even before the socket
> has closed.  In particular, when using gnutls, if we fail to inform
> the remote end about an impending teardown, the remote end cannot
> distinguish between our closing the socket as intended vs. a malicious
> intermediary interrupting things, and may result in spurious error
> messages.

Does this actually matter in the NBD case ?

It has an explicit NBD command for requesting shutdown, and once
that's processed, it is fine to just close the socket abruptly - I
don't see a benefit to a TLS shutdown sequence on top.
AFAIK, the TLS level clean shutdown is only required if the
application protocol does not have any way to determine an
unexpected shutdown itself.

This is relevant for HTTP where the connection data stream may not
have a well defined end condition.

In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
the disconnect. After processing that message, an EOF is acceptable
regardless of whether ,
before processing that message, any EOF is a unexpected.

>           Or, we can end up with a deadlock where both ends are stuck
> on a read() from the other end but neither gets an EOF.

If the socket has been closed abruptly why would it get stuck in
read() - it should see EOF surely ?

>                                                          Thus, after
> any time a client sends NBD_OPT_ABORT or NBD_CMD_DISC, or a server has
> finished replying (where appropriate) to such a request, it is worth
> informing the channel that we will not be transmitting anything else.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] io: Support shutdown of TLS channel
  2020-03-27 16:19 ` [PATCH 2/3] io: Support shutdown of TLS channel Eric Blake
@ 2020-03-27 16:40   ` Daniel P. Berrangé
  2020-03-27 17:29     ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-03-27 16:40 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block

On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> Gnutls documents that while many apps simply yank out the underlying
> transport at the end of communication in the name of efficiency, this
> is indistinguishable from a malicious actor terminating the connection
> prematurely.  Since our channel I/O code already supports the notion of
> a graceful shutdown request, it is time to plumb that through to the
> TLS layer, and wait for TLS to give the all clear before then
> terminating traffic on the underlying channel.
> 
> Note that channel-tls now always advertises shutdown support,
> regardless of whether the underlying channel also has that support.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  io/channel-tls.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/io/channel-tls.c b/io/channel-tls.c
> index 7ec8ceff2f01..f90905823e1d 100644
> --- a/io/channel-tls.c
> +++ b/io/channel-tls.c
> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
>                                      Error **errp)
>  {
>      QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> +    int ret = 0;
> 
>      tioc->shutdown |= how;
> 
> -    return qio_channel_shutdown(tioc->master, how, errp);
> +    do {
> +        switch (how) {
> +        case QIO_CHANNEL_SHUTDOWN_READ:
> +            /* No TLS counterpart */
> +            break;
> +        case QIO_CHANNEL_SHUTDOWN_WRITE:
> +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
> +            break;
> +        case QIO_CHANNEL_SHUTDOWN_BOTH:
> +            ret = qcrypto_tls_session_shutdown(tioc->session,
> +                                               QCRYPTO_SHUT_RDWR);
> +            break;
> +        default:
> +            abort();
> +        }
> +    } while (ret == -EAGAIN);

I don't think it is acceptable to do this loop here. The gnutls_bye()
function triggers several I/O operations which could block. Looping
like this means we busy-wait, blocking this thread for as long as I/O
is blocking on the socket.

If we must call gnutls_bye(), then it needs to be done in a way that
can integrate with the main loop so it poll()'s / unblocks the current
coroutine/thread.  This makes the whole thing significantly more
complex to deal with, especially if the shutdown is being done in
cleanup paths which ordinarily are expected to execute without
blocking on I/O.  This is the big reason why i never made any attempt
to use gnutls_bye().


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] io: Support shutdown of TLS channel
  2020-03-27 16:40   ` Daniel P. Berrangé
@ 2020-03-27 17:29     ` Eric Blake
  2020-03-27 17:43       ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-27 17:29 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, qemu-block

On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
>> Gnutls documents that while many apps simply yank out the underlying
>> transport at the end of communication in the name of efficiency, this
>> is indistinguishable from a malicious actor terminating the connection
>> prematurely.  Since our channel I/O code already supports the notion of
>> a graceful shutdown request, it is time to plumb that through to the
>> TLS layer, and wait for TLS to give the all clear before then
>> terminating traffic on the underlying channel.
>>
>> Note that channel-tls now always advertises shutdown support,
>> regardless of whether the underlying channel also has that support.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>   io/channel-tls.c | 27 ++++++++++++++++++++++++++-
>>   1 file changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/io/channel-tls.c b/io/channel-tls.c
>> index 7ec8ceff2f01..f90905823e1d 100644
>> --- a/io/channel-tls.c
>> +++ b/io/channel-tls.c
>> @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
>>                                       Error **errp)
>>   {
>>       QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
>> +    int ret = 0;
>>
>>       tioc->shutdown |= how;
>>
>> -    return qio_channel_shutdown(tioc->master, how, errp);
>> +    do {
>> +        switch (how) {
>> +        case QIO_CHANNEL_SHUTDOWN_READ:
>> +            /* No TLS counterpart */
>> +            break;
>> +        case QIO_CHANNEL_SHUTDOWN_WRITE:
>> +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
>> +            break;
>> +        case QIO_CHANNEL_SHUTDOWN_BOTH:
>> +            ret = qcrypto_tls_session_shutdown(tioc->session,
>> +                                               QCRYPTO_SHUT_RDWR);
>> +            break;
>> +        default:
>> +            abort();
>> +        }
>> +    } while (ret == -EAGAIN);
> 
> I don't think it is acceptable to do this loop here. The gnutls_bye()
> function triggers several I/O operations which could block. Looping
> like this means we busy-wait, blocking this thread for as long as I/O
> is blocking on the socket.

Hmm, good point.  Should we give qio_channel_tls_shutdown a bool 
parameter that says whether it should wait (good for use when we are 
being run in a coroutine and can deal with the additional I/O) or just 
fail with -EAGAIN (which the caller can ignore if it is not worried)?

> 
> If we must call gnutls_bye(), then it needs to be done in a way that
> can integrate with the main loop so it poll()'s / unblocks the current
> coroutine/thread.  This makes the whole thing significantly more
> complex to deal with, especially if the shutdown is being done in
> cleanup paths which ordinarily are expected to execute without
> blocking on I/O.  This is the big reason why i never made any attempt
> to use gnutls_bye().

We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which 
is indeed a cleanup path where not blocking is worthwhile) even without 
this patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) 
in the normal data path, where we are already using coroutines to manage 
callbacks, can benefit the remote endpoint, giving them a chance to see 
clean shutdown instead of abrupt termination.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent
  2020-03-27 16:35   ` Daniel P. Berrangé
@ 2020-03-27 17:42     ` Eric Blake
  2020-03-27 17:47       ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-27 17:42 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On 3/27/20 11:35 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
>> Although the remote end should always be tolerant of a socket being
>> arbitrarily closed, there are situations where it is a lot easier if
>> the remote end can be guaranteed to read EOF even before the socket
>> has closed.  In particular, when using gnutls, if we fail to inform
>> the remote end about an impending teardown, the remote end cannot
>> distinguish between our closing the socket as intended vs. a malicious
>> intermediary interrupting things, and may result in spurious error
>> messages.
> 
> Does this actually matter in the NBD case ?
> 
> It has an explicit NBD command for requesting shutdown, and once
> that's processed, it is fine to just close the socket abruptly - I
> don't see a benefit to a TLS shutdown sequence on top.

You're right that the NBD protocol has ways for the client to advertise 
it will be shutting down, AND documents that the server must be robust 
to clients that just abruptly disconnect after that point.  But we don't 
have control over all such servers, and there may very well be a server 
that logs an error on abrupt closure, where it would be silent if we did 
a proper gnutls_bye.  Which is more important: maximum speed in 
disconnecting after we expressed intent, or maximum attempt at catering 
to all sorts of remote implementations that might not be as tolerant as 
qemu is of an abrupt termination?

> AFAIK, the TLS level clean shutdown is only required if the
> application protocol does not have any way to determine an
> unexpected shutdown itself.

'man gnutls_bye' states:

        Note that not all implementations will properly terminate a TLS 
connec‐
        tion.   Some  of  them, usually for performance reasons, will 
terminate
        only the  underlying  transport  layer,  and  thus  not 
distinguishing
        between  a  malicious  party prematurely terminating the 
connection and
        normal termination.

You're right that because the protocol has an explicit message, we can 
reliably distinguish any early termination prior to 
NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it 
matters is if we have a premature termination after we asked for clean 
shutdown, at which point a malicious termination didn't lose any data. 
So on that front, I guess you are right that not using gnutls_bye isn't 
going to have much impact.

> 
> This is relevant for HTTP where the connection data stream may not
> have a well defined end condition.
> 
> In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
> the disconnect. After processing that message, an EOF is acceptable
> regardless of whether ,
> before processing that message, any EOF is a unexpected.
> 
>>            Or, we can end up with a deadlock where both ends are stuck
>> on a read() from the other end but neither gets an EOF.
> 
> If the socket has been closed abruptly why would it get stuck in
> read() - it should see EOF surely ?

That's what I'm trying to figure out: the nbdkit testsuite definitely 
hung even though 'qemu-nbd --list' exited, but I haven't yet figured out 
whether the bug lies in nbdkit proper or in libnbd, nor whether a 
cleaner tls shutdown would have prevented the hang in a more reliable 
manner. https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] io: Support shutdown of TLS channel
  2020-03-27 17:29     ` Eric Blake
@ 2020-03-27 17:43       ` Daniel P. Berrangé
  2020-03-27 18:46         ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-03-27 17:43 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block

On Fri, Mar 27, 2020 at 12:29:39PM -0500, Eric Blake wrote:
> On 3/27/20 11:40 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:35AM -0500, Eric Blake wrote:
> > > Gnutls documents that while many apps simply yank out the underlying
> > > transport at the end of communication in the name of efficiency, this
> > > is indistinguishable from a malicious actor terminating the connection
> > > prematurely.  Since our channel I/O code already supports the notion of
> > > a graceful shutdown request, it is time to plumb that through to the
> > > TLS layer, and wait for TLS to give the all clear before then
> > > terminating traffic on the underlying channel.
> > > 
> > > Note that channel-tls now always advertises shutdown support,
> > > regardless of whether the underlying channel also has that support.
> > > 
> > > Signed-off-by: Eric Blake <eblake@redhat.com>
> > > ---
> > >   io/channel-tls.c | 27 ++++++++++++++++++++++++++-
> > >   1 file changed, 26 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/io/channel-tls.c b/io/channel-tls.c
> > > index 7ec8ceff2f01..f90905823e1d 100644
> > > --- a/io/channel-tls.c
> > > +++ b/io/channel-tls.c
> > > @@ -360,10 +360,35 @@ static int qio_channel_tls_shutdown(QIOChannel *ioc,
> > >                                       Error **errp)
> > >   {
> > >       QIOChannelTLS *tioc = QIO_CHANNEL_TLS(ioc);
> > > +    int ret = 0;
> > > 
> > >       tioc->shutdown |= how;
> > > 
> > > -    return qio_channel_shutdown(tioc->master, how, errp);
> > > +    do {
> > > +        switch (how) {
> > > +        case QIO_CHANNEL_SHUTDOWN_READ:
> > > +            /* No TLS counterpart */
> > > +            break;
> > > +        case QIO_CHANNEL_SHUTDOWN_WRITE:
> > > +            ret = qcrypto_tls_session_shutdown(tioc->session, QCRYPTO_SHUT_WR);
> > > +            break;
> > > +        case QIO_CHANNEL_SHUTDOWN_BOTH:
> > > +            ret = qcrypto_tls_session_shutdown(tioc->session,
> > > +                                               QCRYPTO_SHUT_RDWR);
> > > +            break;
> > > +        default:
> > > +            abort();
> > > +        }
> > > +    } while (ret == -EAGAIN);
> > 
> > I don't think it is acceptable to do this loop here. The gnutls_bye()
> > function triggers several I/O operations which could block. Looping
> > like this means we busy-wait, blocking this thread for as long as I/O
> > is blocking on the socket.
> 
> Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
> that says whether it should wait (good for use when we are being run in a
> coroutine and can deal with the additional I/O) or just fail with -EAGAIN
> (which the caller can ignore if it is not worried)?

A bool would not be sufficient, because the caller would need to know
which direction to wait for I/O on.

Looking at the code it does a flush of outstanding data, then sends
some bytes, and then receives some bytes. The write will probably
work most of the time, but the receive is almost always going to
return -EAGAIN. So I don't think that failing with EGAIN is going
to be much of a benefit.

> > If we must call gnutls_bye(), then it needs to be done in a way that
> > can integrate with the main loop so it poll()'s / unblocks the current
> > coroutine/thread.  This makes the whole thing significantly more
> > complex to deal with, especially if the shutdown is being done in
> > cleanup paths which ordinarily are expected to execute without
> > blocking on I/O.  This is the big reason why i never made any attempt
> > to use gnutls_bye().
> 
> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is

Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
in QEMU.

> indeed a cleanup path where not blocking is worthwhile) even without this
> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
> normal data path, where we are already using coroutines to manage callbacks,
> can benefit the remote endpoint, giving them a chance to see clean shutdown
> instead of abrupt termination.

I'm not convinced the clean shutdown at the TLS level does anything important
given that we already have done a clean shutdown at the NBD level.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent
  2020-03-27 17:42     ` Eric Blake
@ 2020-03-27 17:47       ` Daniel P. Berrangé
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-03-27 17:47 UTC (permalink / raw)
  To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, qemu-block, Max Reitz

On Fri, Mar 27, 2020 at 12:42:21PM -0500, Eric Blake wrote:
> On 3/27/20 11:35 AM, Daniel P. Berrangé wrote:
> > On Fri, Mar 27, 2020 at 11:19:36AM -0500, Eric Blake wrote:
> > > Although the remote end should always be tolerant of a socket being
> > > arbitrarily closed, there are situations where it is a lot easier if
> > > the remote end can be guaranteed to read EOF even before the socket
> > > has closed.  In particular, when using gnutls, if we fail to inform
> > > the remote end about an impending teardown, the remote end cannot
> > > distinguish between our closing the socket as intended vs. a malicious
> > > intermediary interrupting things, and may result in spurious error
> > > messages.
> > 
> > Does this actually matter in the NBD case ?
> > 
> > It has an explicit NBD command for requesting shutdown, and once
> > that's processed, it is fine to just close the socket abruptly - I
> > don't see a benefit to a TLS shutdown sequence on top.
> 
> You're right that the NBD protocol has ways for the client to advertise it
> will be shutting down, AND documents that the server must be robust to
> clients that just abruptly disconnect after that point.  But we don't have
> control over all such servers, and there may very well be a server that logs
> an error on abrupt closure, where it would be silent if we did a proper
> gnutls_bye.  Which is more important: maximum speed in disconnecting after
> we expressed intent, or maximum attempt at catering to all sorts of remote
> implementations that might not be as tolerant as qemu is of an abrupt
> termination?

It is the cost / benefit tradeoff here that matters. Correctly using
gnutls_bye(), in contexts which aren't expected to block is non-trivial
bringing notable extra code complexity. It isn't an obvious win to me
for something that just changes an error message for a scenario that
can already be cleanly handled at the application protocol level.

> 
> > AFAIK, the TLS level clean shutdown is only required if the
> > application protocol does not have any way to determine an
> > unexpected shutdown itself.
> 
> 'man gnutls_bye' states:
> 
>        Note that not all implementations will properly terminate a TLS
> connec‐
>        tion.   Some  of  them, usually for performance reasons, will
> terminate
>        only the  underlying  transport  layer,  and  thus  not
> distinguishing
>        between  a  malicious  party prematurely terminating the connection
> and
>        normal termination.
> 
> You're right that because the protocol has an explicit message, we can
> reliably distinguish any early termination prior to
> NBD_OPT_ABORT/NBD_CMD_DISC as being malicious, so the only case where it
> matters is if we have a premature termination after we asked for clean
> shutdown, at which point a malicious termination didn't lose any data. So on
> that front, I guess you are right that not using gnutls_bye isn't going to
> have much impact.
> 
> > 
> > This is relevant for HTTP where the connection data stream may not
> > have a well defined end condition.
> > 
> > In the NBD case though, we have an explicit NBD_CMD_DISC to trigger
> > the disconnect. After processing that message, an EOF is acceptable
> > regardless of whether ,
> > before processing that message, any EOF is a unexpected.
> > 
> > >            Or, we can end up with a deadlock where both ends are stuck
> > > on a read() from the other end but neither gets an EOF.
> > 
> > If the socket has been closed abruptly why would it get stuck in
> > read() - it should see EOF surely ?
> 
> That's what I'm trying to figure out: the nbdkit testsuite definitely hung
> even though 'qemu-nbd --list' exited, but I haven't yet figured out whether
> the bug lies in nbdkit proper or in libnbd, nor whether a cleaner tls
> shutdown would have prevented the hang in a more reliable manner.
> https://www.redhat.com/archives/libguestfs/2020-March/msg00191.html


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] nbd: Try for cleaner TLS shutdown
  2020-03-27 16:19 [PATCH 0/3] nbd: Try for cleaner TLS shutdown Eric Blake
                   ` (2 preceding siblings ...)
  2020-03-27 16:19 ` [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent Eric Blake
@ 2020-03-27 18:44 ` no-reply
  3 siblings, 0 replies; 15+ messages in thread
From: no-reply @ 2020-03-27 18:44 UTC (permalink / raw)
  To: eblake; +Cc: berrange, qemu-devel, qemu-block

Patchew URL: https://patchew.org/QEMU/20200327161936.2225989-1-eblake@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [scsi/qemu-pr-helper] Error 1
make: *** Waiting for unfinished jobs....
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-nbd] Error 1
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-io] Error 1
io/channel-tls.o: In function `qio_channel_tls_shutdown':
/tmp/qemu-test/src/io/channel-tls.c:373: undefined reference to `qcrypto_tls_session_shutdown'
/tmp/qemu-test/src/io/channel-tls.c:376: undefined reference to `qcrypto_tls_session_shutdown'
collect2: error: ld returned 1 exit status
make: *** [qemu-storage-daemon] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=89d889bd9dd84a6592e526b967f6a8cc', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-xtkcxxmy/src/docker-src.2020-03-27-14.42.04.29238:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=89d889bd9dd84a6592e526b967f6a8cc
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-xtkcxxmy/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    2m13.487s
user    0m8.419s


The full log is available at
http://patchew.org/logs/20200327161936.2225989-1-eblake@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] io: Support shutdown of TLS channel
  2020-03-27 17:43       ` Daniel P. Berrangé
@ 2020-03-27 18:46         ` Eric Blake
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Blake @ 2020-03-27 18:46 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, qemu-block

On 3/27/20 12:43 PM, Daniel P. Berrangé wrote:

>>> I don't think it is acceptable to do this loop here. The gnutls_bye()
>>> function triggers several I/O operations which could block. Looping
>>> like this means we busy-wait, blocking this thread for as long as I/O
>>> is blocking on the socket.
>>
>> Hmm, good point.  Should we give qio_channel_tls_shutdown a bool parameter
>> that says whether it should wait (good for use when we are being run in a
>> coroutine and can deal with the additional I/O) or just fail with -EAGAIN
>> (which the caller can ignore if it is not worried)?
> 
> A bool would not be sufficient, because the caller would need to know
> which direction to wait for I/O on.
> 
> Looking at the code it does a flush of outstanding data, then sends
> some bytes, and then receives some bytes. The write will probably
> work most of the time, but the receive is almost always going to
> return -EAGAIN. So I don't think that failing with EGAIN is going
> to be much of a benefit.

Here, I'm guessing you're talking about gnutls lib/record.c.  But note: 
for GNUTLS_SHUT_WR, it only does _gnutls_io_write_flush and 
gnutls_alert_send; the use of _gnutls_recv_int is reserved for 
GNUTLS_SHUT_RDWR.  When informing the other end that you are not going 
to write any more, you don't have to wait for a reply.

> 
>>> If we must call gnutls_bye(), then it needs to be done in a way that
>>> can integrate with the main loop so it poll()'s / unblocks the current
>>> coroutine/thread.  This makes the whole thing significantly more
>>> complex to deal with, especially if the shutdown is being done in
>>> cleanup paths which ordinarily are expected to execute without
>>> blocking on I/O.  This is the big reason why i never made any attempt
>>> to use gnutls_bye().
>>
>> We _are_ using gnutls_bye(GNUTLS_SHUT_RDWR) on the close() path (which is
> 
> Are you sure ?  AFAIK there is no existing usage of gnutls_bye() at all
> in QEMU.

Oh, I'm confusing that with nbdkit, which does use 
gnutls_bye(GNUTLS_SHUT_RDWR) but does not wait for a response (at least, 
not yet).

> 
>> indeed a cleanup path where not blocking is worthwhile) even without this
>> patch, but the question is whether using gnutls_bye(GNUTLS_SHUT_WR) in the
>> normal data path, where we are already using coroutines to manage callbacks,
>> can benefit the remote endpoint, giving them a chance to see clean shutdown
>> instead of abrupt termination.
> 
> I'm not convinced the clean shutdown at the TLS level does anything important
> given that we already have done a clean shutdown at the NBD level.

Okay, then for now, I'll still try and see if I can fix the 
libnbd/nbdkit hang without relying on qemu sending a clean 
gnutls_bye(GNUTLS_SHUT_WR).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] crypto: Add qcrypto_tls_shutdown()
  2020-03-27 16:19 ` [PATCH 1/3] crypto: Add qcrypto_tls_shutdown() Eric Blake
@ 2020-03-31  8:30   ` Markus Armbruster
  2020-03-31 15:17     ` Eric Blake
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Armbruster @ 2020-03-31  8:30 UTC (permalink / raw)
  To: Eric Blake; +Cc: berrange, qemu-devel, qemu-block

Eric Blake <eblake@redhat.com> writes:

> Gnutls documents that applications that want to distinguish between a
> clean end-of-communication and a malicious client abruptly tearing the
> underlying transport out of under our feet need to use gnutls_bye().
> Our channel code is already set up to allow shutdown requests, but we
> weren't forwarding those to gnutls.  To make that work, we first need
> a new entry point that can isolate the rest of our code from the
> gnutls interface.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  qapi/crypto.json            | 15 +++++++++++++++
>  include/crypto/tlssession.h | 24 ++++++++++++++++++++++++
>  crypto/tlssession.c         | 27 +++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+)
>
> diff --git a/qapi/crypto.json b/qapi/crypto.json
> index b2a4cff683ff..1df0f4502885 100644
> --- a/qapi/crypto.json
> +++ b/qapi/crypto.json
> @@ -119,6 +119,21 @@
>    'prefix': 'QCRYPTO_IVGEN_ALG',
>    'data': ['plain', 'plain64', 'essiv']}
>
> +##
> +# @QCryptoShutdownMode:
> +#
> +# The supported modes for requesting shutdown of a crypto
> +# communication channel.
> +#
> +# @shut-wr: No more writes will be sent, but the remote end can still send
> +#           data to be read.
> +# @shut-rdwr: No more reads or writes should occur.
> +# Since: 5.1
> +##
> +{ 'enum': 'QCryptoShutdownMode',
> +  'prefix': 'QCRYPTO',
> +  'data': ['shut-wr', 'shut-rdwr']}
> +
>  ##
>  # @QCryptoBlockFormat:
>  #
> diff --git a/include/crypto/tlssession.h b/include/crypto/tlssession.h
> index 15b9cef086cc..10c670e3b6a2 100644
> --- a/include/crypto/tlssession.h
> +++ b/include/crypto/tlssession.h
> @@ -321,4 +321,28 @@ int qcrypto_tls_session_get_key_size(QCryptoTLSSession *sess,
>   */
>  char *qcrypto_tls_session_get_peer_name(QCryptoTLSSession *sess);
>
> +/**
> + * qcrypto_tls_shutdown:
> + * @sess: the TLS session object
> + * @how: the desired shutdown mode
> + *
> + * Prepare to terminate the session.  If @how is QCRYPTO_SHUT_WR, this
> + * side will no longer write data, but should still process reads
> + * until EOF; if @how is QCRYPTO_SHUT_RDWR, then the entire session
> + * should shut down.  Use of this function is optional, since closing
> + * the session implies QCRYPTO_SHUT_RDWR.  However, using this
> + * function prior to terminating the underlying transport layer makes
> + * it possible for the remote endpoint to distinguish between a
> + * malicious party prematurely terminating the the connection and
> + * normal termination.
> + *
> + * This function should only be used after a successful
> + * qcrypto_tls_session_handshake().
> + *
> + * Returns: 0 for success, or -EAGAIN if more underlying I/O is
> + * required to finish proper session shutdown.
> + */
> +int qcrypto_tls_session_shutdown(QCryptoTLSSession *sess,
> +                                 QCryptoShutdownMode how);
> +
>  #endif /* QCRYPTO_TLSSESSION_H */
> diff --git a/crypto/tlssession.c b/crypto/tlssession.c
> index 33203e8ca711..903301189069 100644
> --- a/crypto/tlssession.c
> +++ b/crypto/tlssession.c
> @@ -521,6 +521,33 @@ qcrypto_tls_session_get_handshake_status(QCryptoTLSSession *session)
>  }
>
>
> +int qcrypto_tls_session_shutdown(QCryptoTLSSession *session,
> +                                 QCryptoShutdownMode how)
> +{
> +    gnutls_close_request_t mode;
> +    int ret;
> +
> +    assert(session->handshakeComplete);

Suggest a blank line here.

> +    switch (how) {
> +    case QCRYPTO_SHUT_WR:
> +        mode = GNUTLS_SHUT_WR;
> +        break;
> +    case QCRYPTO_SHUT_RDWR:
> +        mode = GNUTLS_SHUT_RDWR;
> +        break;
> +    default:
> +        abort();
> +    }
> +
> +    ret = gnutls_bye(session->handle, mode);
> +    if (ret == GNUTLS_E_INTERRUPTED ||
> +        ret == GNUTLS_E_AGAIN) {
> +        return -EAGAIN;
> +    }
> +    return 0;
> +}
> +
> +
>  int
>  qcrypto_tls_session_get_key_size(QCryptoTLSSession *session,
>                                   Error **errp)

This is a thin wrapper around gnutls_bye().  I understand this is an
abstraction layer backed by GnuTLS.  Not sure abstracting from just one
concrete thing is a good idea, but that's way out of scope here.

In scope: why do you need QCryptoShutdownMode be a QAPI type?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] crypto: Add qcrypto_tls_shutdown()
  2020-03-31  8:30   ` Markus Armbruster
@ 2020-03-31 15:17     ` Eric Blake
  2020-03-31 15:33       ` Daniel P. Berrangé
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Blake @ 2020-03-31 15:17 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: berrange, qemu-devel, qemu-block

On 3/31/20 3:30 AM, Markus Armbruster wrote:
> Eric Blake <eblake@redhat.com> writes:
> 
>> Gnutls documents that applications that want to distinguish between a
>> clean end-of-communication and a malicious client abruptly tearing the
>> underlying transport out of under our feet need to use gnutls_bye().
>> Our channel code is already set up to allow shutdown requests, but we
>> weren't forwarding those to gnutls.  To make that work, we first need
>> a new entry point that can isolate the rest of our code from the
>> gnutls interface.
>>

>> +int qcrypto_tls_session_shutdown(QCryptoTLSSession *session,
>> +                                 QCryptoShutdownMode how)

> 
> This is a thin wrapper around gnutls_bye().  I understand this is an
> abstraction layer backed by GnuTLS.  Not sure abstracting from just one
> concrete thing is a good idea, but that's way out of scope here.

If we ever add an alternative TLS implementation to gnutls, then the 
abstraction is useful.  But I'm not sure how likely that is, so maybe 
Dan has more insight why he chose this design originally.

> 
> In scope: why do you need QCryptoShutdownMode be a QAPI type?

I don't, other than the fact that other TLS parameters were also QAPI 
types (such as QCryptoTLSCredsEndpoint).

But that may be moot, as Dan argued that this series adds more 
complexity than it is worth (I originally wrote it while trying to debug 
an nbdkit bug; but in the meantime, I have fixed the nbdkit bug without 
any change to qemu behavior).  So at this point, I will probably not be 
posting a v2 of this series.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3] crypto: Add qcrypto_tls_shutdown()
  2020-03-31 15:17     ` Eric Blake
@ 2020-03-31 15:33       ` Daniel P. Berrangé
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel P. Berrangé @ 2020-03-31 15:33 UTC (permalink / raw)
  To: Eric Blake; +Cc: Markus Armbruster, qemu-block, qemu-devel

On Tue, Mar 31, 2020 at 10:17:49AM -0500, Eric Blake wrote:
> On 3/31/20 3:30 AM, Markus Armbruster wrote:
> > Eric Blake <eblake@redhat.com> writes:
> > 
> > > Gnutls documents that applications that want to distinguish between a
> > > clean end-of-communication and a malicious client abruptly tearing the
> > > underlying transport out of under our feet need to use gnutls_bye().
> > > Our channel code is already set up to allow shutdown requests, but we
> > > weren't forwarding those to gnutls.  To make that work, we first need
> > > a new entry point that can isolate the rest of our code from the
> > > gnutls interface.
> > > 
> 
> > > +int qcrypto_tls_session_shutdown(QCryptoTLSSession *session,
> > > +                                 QCryptoShutdownMode how)
> 
> > 
> > This is a thin wrapper around gnutls_bye().  I understand this is an
> > abstraction layer backed by GnuTLS.  Not sure abstracting from just one
> > concrete thing is a good idea, but that's way out of scope here.
> 
> If we ever add an alternative TLS implementation to gnutls, then the
> abstraction is useful.  But I'm not sure how likely that is, so maybe Dan
> has more insight why he chose this design originally.

The abstraction serves several purposes.

First, it means that we don't need #ifdefs wrt GNUTLS in every piece of
QEMU code that involves TLS. They are isolated in the crypto/ code only.

Related to that, it means that anything that touches GNUTLS APIs directly
gets funnelled via the crypto maintainer for review.

It is easy to mis-use many of the GNUTLS APIs, and so the abstraction
serves to apply/enforce a more desirable usage policy on the rest of
the QEMU code, making it harder to screw up TLS.

Much of this is based on learning from libvirt code where the usage of
GNUTLS was not nearly so well encapsulated and increased burden.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-03-31 15:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 16:19 [PATCH 0/3] nbd: Try for cleaner TLS shutdown Eric Blake
2020-03-27 16:19 ` [PATCH 1/3] crypto: Add qcrypto_tls_shutdown() Eric Blake
2020-03-31  8:30   ` Markus Armbruster
2020-03-31 15:17     ` Eric Blake
2020-03-31 15:33       ` Daniel P. Berrangé
2020-03-27 16:19 ` [PATCH 2/3] io: Support shutdown of TLS channel Eric Blake
2020-03-27 16:40   ` Daniel P. Berrangé
2020-03-27 17:29     ` Eric Blake
2020-03-27 17:43       ` Daniel P. Berrangé
2020-03-27 18:46         ` Eric Blake
2020-03-27 16:19 ` [PATCH 3/3] nbd: Use shutdown(SHUT_WR) after last item sent Eric Blake
2020-03-27 16:35   ` Daniel P. Berrangé
2020-03-27 17:42     ` Eric Blake
2020-03-27 17:47       ` Daniel P. Berrangé
2020-03-27 18:44 ` [PATCH 0/3] nbd: Try for cleaner TLS shutdown no-reply

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.