On Tue, Nov 14, 2017 at 02:09:39PM +0800, Peter Xu wrote:
> On Mon, Nov 13, 2017 at 04:52:11PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Nov 06, 2017 at 05:46:17PM +0800, Peter Xu wrote:
> > > This is not a problem if we are only having one single loop thread like
> > > before. However, after per-monitor thread is introduced, this is not
> > > true any more, and the race can happen.
> > >
> > > The race can be triggered with "make check -j8" sometimes:
> >
> > Please mention a specific test case that fails.
>
> It was any of the check-qtest-$(TARGET)s that failed. I'll mention
> that in next post.
>
> > >
> > > qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:
> > > io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.
> > >
> > > This patch keeps the reference for the watch object when creating in
> > > io_add_watch_poll(), so that the object will never be released in the
> > > context main loop, especially when the context loop is running in
> > > another standalone thread. Meanwhile, when we want to remove the watch
> > > object, we always first detach the watch object from its owner context,
> > > then we continue with the cleanup.
> > >
> > > Without this patch, calling io_remove_watch_poll() in main loop thread
> > > is not thread-safe, since the other per-monitor thread may be modifying
> > > the watch object at the same time.
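To make the ownership argument concrete, here is a minimal single-threaded sketch. FakeSource and the fake_source_*() helpers are hypothetical stand-ins, not glib API; they only mimic the reference-count effects of g_source_attach(), g_source_destroy() and g_source_unref(). The sketch suggests why the old code could not touch iwp after g_source_destroy(&iwp->parent): once the creator's reference is dropped at attach time, detaching may free the object, whereas keeping that reference lets the remover detach first and finalize later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object modelling GSource ownership. NOT the
 * glib API: it only mimics the refcount effects of g_source_attach(),
 * g_source_destroy() and g_source_unref(). */
typedef struct {
    int refcount;
    int attached;
    int *finalized;          /* caller-owned flag, set when freed */
} FakeSource;

static FakeSource *fake_source_new(int *finalized)
{
    FakeSource *s = malloc(sizeof(*s));
    s->refcount = 1;         /* the creator's reference */
    s->attached = 0;
    s->finalized = finalized;
    *finalized = 0;
    return s;
}

static void fake_source_unref(FakeSource *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        *s->finalized = 1;   /* "finalize" runs in the caller's thread */
        free(s);
    }
}

/* The context takes its own reference when the source is attached. */
static void fake_source_attach(FakeSource *s)
{
    s->refcount++;
    s->attached = 1;
}

/* Detaching drops the context's reference. */
static void fake_source_destroy(FakeSource *s)
{
    if (s->attached) {
        s->attached = 0;
        fake_source_unref(s);
    }
}

/* Old io_add_watch_poll(): the creator drops its reference right after
 * attaching. Returns 1 if the object survived the detach. */
static int old_alive_after_destroy(void)
{
    int finalized;
    FakeSource *s = fake_source_new(&finalized);
    fake_source_attach(s);
    fake_source_unref(s);    /* the context holds the only reference */
    fake_source_destroy(s);  /* last reference dropped: object is gone */
    return !finalized;
}

/* Patched: the creator keeps its reference until explicit removal. */
static int new_alive_after_destroy(int *finalized_at_end)
{
    int finalized;
    FakeSource *s = fake_source_new(&finalized);
    fake_source_attach(s);   /* refs: creator + context */
    fake_source_destroy(s);  /* detached; creator's reference remains */
    int alive = !finalized;
    /* ...child sources could be torn down here... */
    fake_source_unref(s);    /* finalized in the remover's thread */
    *finalized_at_end = finalized;
    return alive;
}
```

In this model old_alive_after_destroy() returns 0, while new_alive_after_destroy() returns 1 and reports finalization only after the explicit unref.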
> > >
> > > Reviewed-by: Marc-André Lureau
> > > Signed-off-by: Peter Xu
> > > ---
> > >  chardev/char-io.c | 16 ++++++++++++++--
> > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/chardev/char-io.c b/chardev/char-io.c
> > > index f81052481a..50b5bac704 100644
> > > --- a/chardev/char-io.c
> > > +++ b/chardev/char-io.c
> > > @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr,
> > >      g_free(name);
> > >
> > >      g_source_attach(&iwp->parent, context);
> > > -    g_source_unref(&iwp->parent);
> > >      return (GSource *)iwp;
> > >  }
> > >
> > > @@ -131,12 +130,25 @@ static void io_remove_watch_poll(GSource *source)
> > >      IOWatchPoll *iwp;
> > >
> > >      iwp = io_watch_poll_from_source(source);
> > > +
> > > +    /*
> > > +     * Here the order of destruction really matters. We need to first
> > > +     * detach the IOWatchPoll object from the context (which may still
> > > +     * be running in another loop thread), only after that could we
> > > +     * continue to operate on iwp->src, or there may be race condition
> > > +     * between current thread and the context loop thread.
> > > +     *
> > > +     * Let's blame the glib bug mentioned in commit 2b316774f6
> > > +     * ("qemu-char: do not operate on sources from finalize
> > > +     * callbacks") for this extra complexity.
> >
> > I don't understand how this bug is to blame. Isn't the problem here a
> > race condition between two QEMU threads?
>
> Yes, it is.
>
> The problem is, we won't have the race condition if glib does not have
> that bug mentioned. Then the thread running GMainContext will have
> full control of iwp->src destruction, and destruction of it would be
> fairly straightforward (unref iwp->src in IOWatchPoll destructor).
> Now IIUC we are doing this in a hacky way, say, we destroy iwp->src
> explicitly from main thread before quitting (see [1] below, the whole
> if clause).
>
> >
> > Why are two threads accessing the watch at the same time?
>
> Here is how I understand:
>
> Firstly we need to tackle with that bug, by an explicit destruction of
> iwp->src below; meanwhile when we are destroying it, the GMainContext
> can still be running somewhere (it's not happening in current series
> since I stopped iothread earlier than this point, however it can still
> happen if in the future we don't do that), then we possibly want this
> patch.
>
> Again, without this patch, current series should work; however I do
> hope this patch can be in, in case someday we want to provide complete
> thread safety for Chardevs (now it is not really thread-safe).

You said qtests fail with "Assertion `iwp->src == NULL' failed" but then
you said "without this patch, current series should work". How do you
reproduce the failure if it doesn't occur?

It looks like remove_fd_in_watch() -> io_remove_watch_poll() callers
fall into two categories: called from within the event loop and called
when a chardev is destroyed.

Do the thread-safety issues occur when the chardev is destroyed by the
QEMU main loop thread? Or did I miss cases where remove_fd_in_watch()
is called from other threads?

>
> >
> > > +     */
> > > +    g_source_destroy(&iwp->parent);
> > >      if (iwp->src) {
> > >          g_source_destroy(iwp->src);
> > >          g_source_unref(iwp->src);
> > >          iwp->src = NULL;
> > >      }
>
> [1]
>
> > >
> > > -    g_source_destroy(&iwp->parent);
> > > +    g_source_unref(&iwp->parent);
> > >  }
> > >
> > >  void remove_fd_in_watch(Chardev *chr)
> > > --
> > > 2.13.5
> > >
>
> Thanks,
>
> --
> Peter Xu
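For discussion, here is one interleaving that would be consistent with the reported assertion, sketched as a deterministic single-threaded model. Everything below is hypothetical, not QEMU or glib code. context_iterate() stands in for one loop-thread iteration and, on the assumption that it behaves roughly like io_watch_poll_prepare() for a readable chardev, recreates the child source whenever it finds src == NULL. The race argument simulates the loop thread running at the worst possible moment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model, not QEMU or glib code. */
typedef struct {
    int *src;              /* stand-in for iwp->src, the child source */
} Watch;

typedef struct {
    Watch *attached;       /* watch attached to the context, or NULL */
} Context;

/* One loop-thread iteration. Assumption: like a prepare callback for a
 * readable chardev, it recreates the child source if it is missing. */
static void context_iterate(Context *ctx)
{
    Watch *w = ctx->attached;
    if (w != NULL && w->src == NULL) {
        w->src = malloc(sizeof(*w->src));
    }
}

/* Stand-in for the finalize callback: returns 1 if src was already
 * NULL (the assertion would hold), 0 otherwise; frees any leftover. */
static int watch_finalize_ok(Watch *w)
{
    int ok = (w->src == NULL);
    free(w->src);
    w->src = NULL;
    return ok;
}

/* Old order: tear down the child while still attached, detach last.
 * If race is non-zero, the loop thread runs in between. */
static int remove_old_order(Context *ctx, Watch *w, int race)
{
    assert(ctx->attached == w);
    free(w->src);
    w->src = NULL;
    if (race) {
        context_iterate(ctx);   /* still attached: src gets recreated */
    }
    ctx->attached = NULL;       /* detach */
    return watch_finalize_ok(w);
}

/* Patched order: detach first, then tear down the child, then finalize. */
static int remove_new_order(Context *ctx, Watch *w, int race)
{
    assert(ctx->attached == w);
    ctx->attached = NULL;       /* detach */
    if (race) {
        context_iterate(ctx);   /* watch no longer visible to the loop */
    }
    free(w->src);
    w->src = NULL;
    return watch_finalize_ok(w);
}
```

With race set, the old order finalizes with a live child source (the model's analogue of the failed iwp->src == NULL assertion), while the patched order does not, because a detached watch is no longer visited by the loop. Without the race, both orders finalize cleanly, which would match the observation that the current series works without this patch.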