From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Eduardo Habkost" <ehabkost@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Max Reitz" <mreitz@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	qemu-block@nongnu.org, "Kevin Wolf" <kwolf@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	"Lukáš Doktor" <ldoktor@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Eric Blake" <eblake@redhat.com>
Subject: [Qemu-devel] [PULL 1/9] iothread: fix iothread hang when stop too soon
Date: Mon, 11 Feb 2019 13:50:32 +0800
Message-ID: <20190211055040.13528-2-stefanha@redhat.com>
In-Reply-To: <20190211055040.13528-1-stefanha@redhat.com>

From: Peter Xu <peterx@redhat.com>

Lukas reported a hard-to-reproduce QMP iothread hang on s390: QEMU
might hang at pthread_join() of the QMP monitor iothread before
quitting:

  Thread 1
  #0  0x000003ffad10932c in pthread_join
  #1  0x0000000109e95750 in qemu_thread_join
      at /home/thuth/devel/qemu/util/qemu-thread-posix.c:570
  #2  0x0000000109c95a1c in iothread_stop
  #3  0x0000000109bb0874 in monitor_cleanup
  #4  0x0000000109b55042 in main

While the iothread is still in the main loop:

  Thread 4
  #0  0x000003ffad0010e4 in ??
  #1  0x000003ffad553958 in g_main_context_iterate.isra.19
  #2  0x000003ffad553d90 in g_main_loop_run
  #3  0x0000000109c9585a in iothread_run
      at /home/thuth/devel/qemu/iothread.c:74
  #4  0x0000000109e94752 in qemu_thread_start
      at /home/thuth/devel/qemu/util/qemu-thread-posix.c:502
  #5  0x000003ffad10825a in start_thread
  #6  0x000003ffad00dcf2 in thread_start

IMHO this is caused by a race between the main thread and the iothread
when the thread is stopped in the following sequence:

    main thread                       iothread
    ===========                       ==============
                                      aio_poll()
    iothread_get_g_main_context
      set iothread->worker_context
    iothread_stop
      schedule iothread_stop_bh
                                        execute iothread_stop_bh [1]
                                          set iothread->running=false
                                          (main_loop==NULL, so quitting
                                           the main loop is skipped.
                                           Note: main_loop is NULL, but
                                           worker_context is not!)
                                      atomic_read(&iothread->worker_context) [2]
                                        create main_loop object
                                        g_main_loop_run() [3]
    pthread_join() [4]

We can see that when iothread_stop_bh() executes at [1], main_loop may
still be NULL because it is not created until the first check of
worker_context at [2].  The iothread then hangs in the main loop [3],
which in turn blocks the main thread in pthread_join() [4].

The simple solution is to check the "running" variable again before
checking worker_context.
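
The ordering above can be replayed in a minimal single-threaded sketch.
This is not QEMU code: struct fake_iothread, fake_aio_poll() and
would_hang() are hypothetical names used only for illustration, and the
model assumes the stop bottom half is already pending when the first
poll runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IOThread: only the fields involved in the race. */
struct fake_iothread {
    bool running;         /* cleared by the stop bottom half */
    void *worker_context; /* set by the main thread before it asks to stop */
    void *main_loop;      /* created lazily by the run loop, NULL until then */
    bool stop_scheduled;  /* a stop bottom half is pending */
};

/* Models aio_poll() dispatching the pending stop bottom half at [1]. */
static void fake_aio_poll(struct fake_iothread *t)
{
    if (t->stop_scheduled) {
        t->stop_scheduled = false;
        t->running = false;
        /* main_loop is still NULL, so there is no main loop to quit yet. */
        assert(t->main_loop == NULL);
    }
}

/*
 * Replays the sequence above and returns true if the thread would enter
 * the main loop after the stop request (the hang at [3]).
 */
bool would_hang(bool recheck_running)
{
    int dummy_context; /* only its address is used, as a non-NULL context */
    struct fake_iothread t = {
        .running = true,
        .worker_context = &dummy_context,
        .main_loop = NULL,
        .stop_scheduled = true,
    };

    while (t.running) {
        fake_aio_poll(&t);

        bool enter = t.worker_context != NULL;
        if (recheck_running) {
            /* The fix: look at "running" again after the poll. */
            enter = enter && t.running;
        }
        if (enter) {
            /* Real code would create main_loop and block in g_main_loop_run(). */
            return true;
        }
    }
    return false;
}
```

In this model would_hang(false) corresponds to the old check and
would_hang(true) to the re-check of "running" after the poll.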

CC: Thomas Huth <thuth@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Lukáš Doktor <ldoktor@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20190129051432.22023-1-peterx@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 iothread.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/iothread.c b/iothread.c
index 2fb1cdf55d..e615b7ae52 100644
--- a/iothread.c
+++ b/iothread.c
@@ -63,7 +63,11 @@ static void *iothread_run(void *opaque)
     while (iothread->running) {
         aio_poll(iothread->ctx, true);
 
-        if (atomic_read(&iothread->worker_context)) {
+        /*
+         * We must check the running state again in case it was
+         * changed in previous aio_poll()
+         */
+        if (iothread->running && atomic_read(&iothread->worker_context)) {
             GMainLoop *loop;
 
             g_main_context_push_thread_default(iothread->worker_context);
-- 
2.20.1
