All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition
@ 2019-02-22 11:59 Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

This fixes a race condition in which the tcp_chr_read() ioc handler
can close a connection that is being written to from another thread.

This is essentially v1 rebased on top of the current master, after
Daniel and Marc-André's chardev series have been merged.

Note: vhost-user-test still fails if QTEST_VHOST_USER_FIXME is set.

Berto

RFC: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01510.html

v1: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg01834.html
- Fixes memory leaks and adds a qemu_idle_add() function

v2:
- Rebased on top of the current master (fc3dbb90f2eb069801bfb4cfe9cbc)
- Patches 1 and 2: Remove the changes in char-pty.c, they're not
                   needed after the rebase.
- Patch 3: Fix conflicts after the rebase.

git backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[down] 'main-loop: Fix GSource leak in qio_task_thread_worker()'
002/3:[0027] [FC] 'main-loop: Add qemu_idle_add()'
003/3:[0004] [FC] 'char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()'

Alberto Garcia (3):
  main-loop: Fix GSource leak in qio_task_thread_worker()
  main-loop: Add qemu_idle_add()
  char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()

 chardev/char-socket.c    | 19 +++++++++++++++++--
 include/qemu/main-loop.h | 12 ++++++++++++
 io/task.c                |  9 +++------
 util/main-loop.c         |  9 +++++++++
 4 files changed, 41 insertions(+), 8 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:08   ` Daniel P. Berrangé
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

After g_source_attach() the GMainContext holds a reference to the
GSource, so the caller does not need to keep it.

qio_task_thread_worker() is not releasing its reference so the GSource
is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 io/task.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io/task.c b/io/task.c
index 64c4c7126a..1ae7b86488 100644
--- a/io/task.c
+++ b/io/task.c
@@ -136,6 +136,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
                           qio_task_thread_result, task, NULL);
     g_source_attach(task->thread->completion,
                     task->thread->context);
+    g_source_unref(task->thread->completion);
     trace_qio_task_thread_source_attach(task, task->thread->completion);
 
     qemu_cond_signal(&task->thread_cond);
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:10   ` Daniel P. Berrangé
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
  2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

This works like g_idle_add() but allows specifying a different
GMainContext. It also returns the GSource directly instead of its ID
number for convenience.

qio_task_thread_worker() is modified to make use of this new function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 include/qemu/main-loop.h | 12 ++++++++++++
 io/task.c                | 10 +++-------
 util/main-loop.c         |  9 +++++++++
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index f6ba78ea73..f966c015d0 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -295,6 +295,18 @@ void qemu_mutex_lock_iothread_impl(const char *file, int line);
  */
 void qemu_mutex_unlock_iothread(void);
 
+/**
+ * qemu_idle_add: Add an idle function to a GMainContext
+ *
+ * This function is similar to GLib's g_idle_add() but it allows
+ * specifying a GMainContext instead of using the global one.
+ *
+ * The returned GSource is owned by the GMainContext and is deleted
+ * automatically after @func returns false. It can also be deleted by
+ * removing it from the GMainContext using g_source_destroy().
+ */
+GSource *qemu_idle_add(GSourceFunc func, gpointer data, GMainContext *ctx);
+
 /* internal interfaces */
 
 void qemu_fd_register(int fd);
diff --git a/io/task.c b/io/task.c
index 1ae7b86488..16754e22ba 100644
--- a/io/task.c
+++ b/io/task.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "io/task.h"
 #include "qapi/error.h"
 #include "qemu/thread.h"
@@ -130,13 +131,8 @@ static gpointer qio_task_thread_worker(gpointer opaque)
     trace_qio_task_thread_exit(task);
 
     qemu_mutex_lock(&task->thread_lock);
-
-    task->thread->completion = g_idle_source_new();
-    g_source_set_callback(task->thread->completion,
-                          qio_task_thread_result, task, NULL);
-    g_source_attach(task->thread->completion,
-                    task->thread->context);
-    g_source_unref(task->thread->completion);
+    task->thread->completion = qemu_idle_add(qio_task_thread_result, task,
+                                             task->thread->context);
     trace_qio_task_thread_source_attach(task, task->thread->completion);
 
     qemu_cond_signal(&task->thread_cond);
diff --git a/util/main-loop.c b/util/main-loop.c
index d4a521caeb..e23eda68cc 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -171,6 +171,15 @@ int qemu_init_main_loop(Error **errp)
     return 0;
 }
 
+GSource *qemu_idle_add(GSourceFunc func, gpointer data, GMainContext *ctx)
+{
+    GSource *idle = g_idle_source_new();
+    g_source_set_callback(idle, func, data, NULL);
+    g_source_attach(idle, ctx);
+    g_source_unref(idle);
+    return idle;
+}
+
 static int max_priority;
 
 #ifndef _WIN32
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
@ 2019-02-22 11:59 ` Alberto Garcia
  2019-02-22 12:16   ` Daniel P. Berrangé
  2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini
  3 siblings, 1 reply; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 11:59 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alberto Garcia, Paolo Bonzini, Daniel P . Berrangé,
	Marc-André Lureau

There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This patch adds a lock to protect tcp_chr_disconnect() and
socket_reconnect_timeout()

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 chardev/char-socket.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 4fcdd8aedd..67c2eeac6d 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -172,7 +172,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
+                qemu_mutex_unlock(&chr->chr_write_lock);
                 tcp_chr_disconnect(chr);
+                qemu_mutex_lock(&chr->chr_write_lock);
                 return len;
             } /* else let the read handler finish it properly */
         }
@@ -464,6 +466,13 @@ static void update_disconnected_filename(SocketChardev *s)
     }
 }
 
+static gboolean tcp_chr_be_event_closed(gpointer opaque)
+{
+    Chardev *chr = opaque;
+    qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+    return FALSE;
+}
+
 /* NB may be called even if tcp_chr_connect has not been
  * reached, due to TLS or telnet initialization failure,
  * so can *not* assume s->state == TCP_CHARDEV_STATE_CONNECTED
@@ -471,7 +480,10 @@ static void update_disconnected_filename(SocketChardev *s)
 static void tcp_chr_disconnect(Chardev *chr)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
-    bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
+    bool emit_close;
+
+    qemu_mutex_lock(&chr->chr_write_lock);
+    emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED;
 
     tcp_chr_free_connection(chr);
 
@@ -481,11 +493,12 @@ static void tcp_chr_disconnect(Chardev *chr)
     }
     update_disconnected_filename(s);
     if (emit_close) {
-        qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+        qemu_idle_add(tcp_chr_be_event_closed, chr, chr->gcontext);
     }
     if (s->reconnect_time) {
         qemu_chr_socket_restart_timer(chr);
     }
+    qemu_mutex_unlock(&chr->chr_write_lock);
 }
 
 static gboolean tcp_chr_read(QIOChannel *chan, GIOCondition cond, void *opaque)
@@ -1128,8 +1141,10 @@ static gboolean socket_reconnect_timeout(gpointer opaque)
     Chardev *chr = CHARDEV(opaque);
     SocketChardev *s = SOCKET_CHARDEV(opaque);
 
+    qemu_mutex_lock(&chr->chr_write_lock);
     g_source_unref(s->reconnect_timer);
     s->reconnect_timer = NULL;
+    qemu_mutex_unlock(&chr->chr_write_lock);
 
     if (chr->be_open) {
         return false;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
@ 2019-02-22 12:08   ` Daniel P. Berrangé
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:08 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

In $SUBJECT  s/main-loop/io/

On Fri, Feb 22, 2019 at 01:59:10PM +0200, Alberto Garcia wrote:
> After g_source_attach() the GMainContext holds a reference to the
> GSource, so the caller does not need to keep it.
> 
> qio_task_thread_worker() is not releasing its reference so the GSource
> is being leaked since a17536c594bfed94d05667b419f747b692f5fc7f.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  io/task.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/io/task.c b/io/task.c
> index 64c4c7126a..1ae7b86488 100644
> --- a/io/task.c
> +++ b/io/task.c
> @@ -136,6 +136,7 @@ static gpointer qio_task_thread_worker(gpointer opaque)
>                            qio_task_thread_result, task, NULL);
>      g_source_attach(task->thread->completion,
>                      task->thread->context);
> +    g_source_unref(task->thread->completion);
>      trace_qio_task_thread_source_attach(task, task->thread->completion);
>  
>      qemu_cond_signal(&task->thread_cond);

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
 
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
@ 2019-02-22 12:10   ` Daniel P. Berrangé
  0 siblings, 0 replies; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:10 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri, Feb 22, 2019 at 01:59:11PM +0200, Alberto Garcia wrote:
> This works like g_idle_add() but allows specifying a different
> GMainContext. It also returns the GSource directly instead of its ID
> number for convenience.
> 
> qio_task_thread_worker() is modified to make use of this new function.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  include/qemu/main-loop.h | 12 ++++++++++++
>  io/task.c                | 10 +++-------
>  util/main-loop.c         |  9 +++++++++
>  3 files changed, 24 insertions(+), 7 deletions(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
@ 2019-02-22 12:16   ` Daniel P. Berrangé
  2019-02-22 12:32     ` Alberto Garcia
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel P. Berrangé @ 2019-02-22 12:16 UTC (permalink / raw)
  To: Alberto Garcia; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri, Feb 22, 2019 at 01:59:12PM +0200, Alberto Garcia wrote:
> There's a race condition in which the tcp_chr_read() ioc handler can
> close a connection that is being written to from another thread.
> 
> Running iotest 136 in a loop triggers this problem and crashes QEMU.
> 
>  (gdb) bt
>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665
> 
> This patch adds a lock to protect tcp_chr_disconnect() and
> socket_reconnect_timeout()

Can you think of any way to test this in the unit tests ?  I can understand
if its too difficult but just curious if there's any viable option ?

> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index 4fcdd8aedd..67c2eeac6d 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -172,7 +172,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
> +                qemu_mutex_unlock(&chr->chr_write_lock);
>                  tcp_chr_disconnect(chr);
> +                qemu_mutex_lock(&chr->chr_write_lock);

The idea of unlock & relocking here, just so tcp_chr_disconnect can
also acquire locks, doesn't make me too happy. This is really an
anti-pattern to me.

I think we should rename the existing method to be
tcp_chr_disconnect_locked() and document that it must only
be called with the write lock held, and call that from
here.

Then introduce a new wrapper for all the other callers to
carry on using which does the locking

 static void tcp_chr_disconnect(Chardev *chr)
 {
    qemu_mutex_lock(&chr->chr_write_lock);
    tcp_chr_disconnect_locked(chr);
    qemu_mutex_unlock(&chr->chr_write_lock);
 }
  
Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout()
  2019-02-22 12:16   ` Daniel P. Berrangé
@ 2019-02-22 12:32     ` Alberto Garcia
  0 siblings, 0 replies; 9+ messages in thread
From: Alberto Garcia @ 2019-02-22 12:32 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel, Paolo Bonzini, Marc-André Lureau

On Fri 22 Feb 2019 01:16:57 PM CET, Daniel P. Berrangé wrote:
> On Fri, Feb 22, 2019 at 01:59:12PM +0200, Alberto Garcia wrote:
>> There's a race condition in which the tcp_chr_read() ioc handler can
>> close a connection that is being written to from another thread.
>> 
>> Running iotest 136 in a loop triggers this problem and crashes QEMU.
>> 
>>  (gdb) bt
>>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665
>> 
>> This patch adds a lock to protect tcp_chr_disconnect() and
>> socket_reconnect_timeout()
>
> Can you think of any way to test this in the unit tests ?  I can
> understand if its too difficult but just curious if there's any viable
> option ?

I haven't thought about it. If anyone has a suggestion of how to
reproduce it we can add a new test later.

>>          if (ret < 0 && errno != EAGAIN) {
>>              if (tcp_chr_read_poll(chr) <= 0) {
>> +                qemu_mutex_unlock(&chr->chr_write_lock);
>>                  tcp_chr_disconnect(chr);
>> +                qemu_mutex_lock(&chr->chr_write_lock);
>
> The idea of unlock & relocking here, just so tcp_chr_disconnect can
> also acquire locks, doesn't make me too happy. This is really an
> anti-pattern to me.

I'm not too happy about it either. I'll send a new version with
tcp_chr_disconnect_locked() as you suggest.

Berto

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition
  2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
                   ` (2 preceding siblings ...)
  2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
@ 2019-02-22 12:36 ` Paolo Bonzini
  3 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2019-02-22 12:36 UTC (permalink / raw)
  To: Alberto Garcia, qemu-devel
  Cc: Daniel P . Berrangé, Marc-André Lureau

On 22/02/19 12:59, Alberto Garcia wrote:
> This fixes a race condition in which the tcp_chr_read() ioc handler
> can close a connection that is being written to from another thread.
> 
> This is essentially v1 rebased on top of the current master, after
> Daniel and Marc-André's chardev series have been merged.
> 
> Note: vhost-user-test still fails if QTEST_VHOST_USER_FIXME is set.

Thanks.  I'll look into it next (this week most of the time I had set
aside was eaten debugging other chardev issues... it's a rathole :)).

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-02-22 12:37 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-22 11:59 [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Alberto Garcia
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 1/3] main-loop: Fix GSource leak in qio_task_thread_worker() Alberto Garcia
2019-02-22 12:08   ` Daniel P. Berrangé
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 2/3] main-loop: Add qemu_idle_add() Alberto Garcia
2019-02-22 12:10   ` Daniel P. Berrangé
2019-02-22 11:59 ` [Qemu-devel] [PATCH v2 3/3] char-socket: Lock tcp_chr_disconnect() and socket_reconnect_timeout() Alberto Garcia
2019-02-22 12:16   ` Daniel P. Berrangé
2019-02-22 12:32     ` Alberto Garcia
2019-02-22 12:36 ` [Qemu-devel] [PATCH v2 0/3] char-socket: Fix race condition Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.