All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Bonzini <pbonzini@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Richard W.M. Jones" <rjones@redhat.com>
Subject: [Qemu-devel] [PULL 6/6] main-loop: Acquire main_context lock around os_host_main_loop_wait.
Date: Mon,  3 Apr 2017 21:44:09 +0200	[thread overview]
Message-ID: <20170403194409.21276-7-pbonzini@redhat.com> (raw)
In-Reply-To: <20170403194409.21276-1-pbonzini@redhat.com>

From: "Richard W.M. Jones" <rjones@redhat.com>

When running virt-rescue the serial console hangs from time to time.
Virt-rescue runs an ordinary Linux kernel "appliance", but there is
only a single idle process running inside, so the qemu main loop is
largely idle.  With virt-rescue >= 1.37 you may be able to observe the
hang by doing:

  $ virt-rescue -e ^] --scratch
  ><rescue> while true; do ls -l /usr/bin; done

The hang in virt-rescue can be resolved by pressing a key on the
serial console.

Possibly with the same root cause, we also observed hangs during very
early boot of regular Linux VMs with a serial console.  Those hangs
are extremely rare, but you may be able to observe them by running
this command on baremetal for a sufficiently long time:

  $ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done

(Check in /tmp/log that the failure was caused by a hang during early
boot, and not some other reason)

During investigation of this bug, Paolo Bonzini wrote:

> glib is expecting QEMU to use g_main_context_acquire around accesses to
> GMainContext.  However QEMU is not doing that, instead it is taking its
> own mutex.  So we should add g_main_context_acquire and
> g_main_context_release in the two implementations of
> os_host_main_loop_wait; these should undo the effect of Frediano's
> glib patch.

This patch exactly implements Paolo's suggestion in that paragraph.

This fixes the serial console hang in my testing, across 3 different
physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
of automated testing.  I wasn't able to reproduce the early boot hangs
(but as noted above, these are extremely rare in any case).

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20170331205133.23906-1-rjones@redhat.com>
[Paolo: this is actually a glib bug: recent glib versions are also
expecting g_main_context_acquire around g_poll---but that is not
documented and probably not even intended].
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 util/main-loop.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/util/main-loop.c b/util/main-loop.c
index 4534c89..19cad6b 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -218,9 +218,12 @@ static void glib_pollfds_poll(void)
 
 static int os_host_main_loop_wait(int64_t timeout)
 {
+    GMainContext *context = g_main_context_default();
     int ret;
     static int spin_counter;
 
+    g_main_context_acquire(context);
+
     glib_pollfds_fill(&timeout);
 
     /* If the I/O thread is very busy or we are incorrectly busy waiting in
@@ -256,6 +259,9 @@ static int os_host_main_loop_wait(int64_t timeout)
     }
 
     glib_pollfds_poll();
+
+    g_main_context_release(context);
+
     return ret;
 }
 #else
@@ -412,12 +418,15 @@ static int os_host_main_loop_wait(int64_t timeout)
     fd_set rfds, wfds, xfds;
     int nfds;
 
+    g_main_context_acquire(context);
+
     /* XXX: need to suppress polling by better using win32 events */
     ret = 0;
     for (pe = first_polling_entry; pe != NULL; pe = pe->next) {
         ret |= pe->func(pe->opaque);
     }
     if (ret != 0) {
+        g_main_context_release(context);
         return ret;
     }
 
@@ -472,6 +481,8 @@ static int os_host_main_loop_wait(int64_t timeout)
         g_main_context_dispatch(context);
     }
 
+    g_main_context_release(context);
+
     return select_ret || g_poll_ret;
 }
 #endif
-- 
1.8.3.1

  parent reply	other threads:[~2017-04-03 19:44 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-03 19:44 [Qemu-devel] [PULL 0/6] Misc patches for QEMU 2.9.0-rc3 Paolo Bonzini
2017-04-03 19:44 ` [Qemu-devel] [PULL 1/6] iscsi: drop unused IscsiAIOCB.qiov field Paolo Bonzini
2017-04-03 19:44 ` [Qemu-devel] [PULL 2/6] target-i386: fix "info lapic" segfault on isapc Paolo Bonzini
2017-04-03 19:44 ` [Qemu-devel] [PULL 3/6] ipmi: Fix macro issues Paolo Bonzini
2017-04-03 19:44 ` [Qemu-devel] [PULL 4/6] nbd: fix memory leak on socket_connect failed Paolo Bonzini
2017-04-03 19:44 ` [Qemu-devel] [PULL 5/6] exec: revert MemoryRegionCache Paolo Bonzini
2017-04-03 19:44 ` Paolo Bonzini [this message]
2017-04-04 11:41 ` [Qemu-devel] [PULL 0/6] Misc patches for QEMU 2.9.0-rc3 Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170403194409.21276-7-pbonzini@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.