All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] A glib warning encountered with internal iothread
@ 2017-09-26  5:44 Peter Xu
  2017-09-26  8:43 ` Stefan Hajnoczi
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2017-09-26  5:44 UTC (permalink / raw)
  To: QEMU Devel Mailing List; +Cc: Paolo Bonzini, Stefan Hajnoczi, Fam Zheng

Hi,

Generally speaking this is a question about glib, but I would like to
see how the list sees this before throwing this question to glib
community in case I made severe mistake.

I encountered one glib warning when start to use internal iothread:

(qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed

This will be triggered as long as we create one private iothread and
enables gcontext of it (by calling iothread_get_g_main_context() at
least once on the private iothread).

The warning is generated when quitting QEMU in following path (please
ignore unknown function names, they only appear in a private tree, but
the logic is the same):

#0  0x00007ff0c2bb4180 in g_source_remove_poll () at /lib64/libglib-2.0.so.0
#1  0x00005634b0fe8bc9 in aio_set_fd_handler (ctx=0x5634b365ca90, fd=22, is_external=false, io_read=0x0, io_write=0x0, io_poll=0x0, opaque=0x5634b365cb3c) at /root/git/qemu/util/aio-posix.c:226
#2  0x00005634b0fe8ebc in aio_set_event_notifier (ctx=0x5634b365ca90, notifier=0x5634b365cb3c, is_external=false, io_read=0x0, io_poll=0x0) at /root/git/qemu/util/aio-posix.c:299
#3  0x00005634b0fe5174 in aio_ctx_finalize (source=0x5634b365ca90) at /root/git/qemu/util/async.c:296
#4  0x00007ff0c2bb39a6 in g_source_unref_internal () at /lib64/libglib-2.0.so.0
#5  0x00005634b0fe5757 in aio_context_unref (ctx=0x5634b365ca90) at /root/git/qemu/util/async.c:484
#6  0x00005634b0c39642 in iothread_instance_finalize (obj=0x5634b36cfa40) at /root/git/qemu/iothread.c:127
#7  0x00005634b0ede36e in object_deinit (obj=0x5634b36cfa40, type=0x5634b3543c60) at /root/git/qemu/qom/object.c:453
#8  0x00005634b0ede3e0 in object_finalize (data=0x5634b36cfa40) at /root/git/qemu/qom/object.c:467
#9  0x00005634b0edf361 in object_unref (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:902
#10 0x00005634b0ee0607 in object_finalize_child_property (obj=0x5634b369d3b0, name=0x5634b36eb900 "monitor_io_thr", opaque=0x5634b36cfa40) at /root/git/qemu/qom/object.c:1407
#11 0x00005634b0ede25d in object_property_del_child (obj=0x5634b369d3b0, child=0x5634b36cfa40, errp=0x0) at /root/git/qemu/qom/object.c:427
#12 0x00005634b0ede33d in object_unparent (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:446
#13 0x00005634b0c39f44 in iothread_destroy (iothread=0x5634b36cfa40) at /root/git/qemu/iothread.c:383
#14 0x00005634b0ae920f in monitor_io_thread_destroy () at /root/git/qemu/monitor.c:4416
#15 0x00005634b0ae9302 in monitor_cleanup () at /root/git/qemu/monitor.c:4439
#16 0x00005634b0c496cb in main (argc=17, argv=0x7ffcc5b88918, envp=0x7ffcc5b889a8) at /root/git/qemu/vl.c:4889

It's on the path during destruction of AioContext, where we called
g_source_remove_poll() in the destructor.  When reach here, AioContext
has been detached from the gcontext already, so the assertion is
triggered.

I'll copy some code for easier reference from glib:

void
g_source_remove_poll (GSource *source,
		      GPollFD *fd)
{
  GMainContext *context;
  
  g_return_if_fail (source != NULL);
  g_return_if_fail (fd != NULL);
  g_return_if_fail (!SOURCE_DESTROYED (source)); <---- this assertion fails
  
  context = source->context;

  if (context)
    LOCK_CONTEXT (context);
  
  source->poll_fds = g_slist_remove (source->poll_fds, fd);

  if (context)
    {
      if (!SOURCE_BLOCKED (source))
	g_main_context_remove_poll_unlocked (context, fd);
      UNLOCK_CONTEXT (context);
    }
}

I'm translating this assertion failure into: "when removing one
GSource from the polled GSource array, the GSource should still be
attached to its GMainContext".  But does this really matter?  Why
can't we remove one GSource from the poll array even if the source is
detached already?  After all if we see following code in
g_source_remove_poll() we have already taken special care when
source->context == NULL happens.

Any thoughts?  TIA.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-26  5:44 [Qemu-devel] A glib warning encountered with internal iothread Peter Xu
@ 2017-09-26  8:43 ` Stefan Hajnoczi
  2017-09-26  9:11   ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2017-09-26  8:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: QEMU Devel Mailing List, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi

On Tue, Sep 26, 2017 at 01:44:39PM +0800, Peter Xu wrote:
> Hi,
> 
> Generally speaking this is a question about glib, but I would like to
> see how the list sees this before throwing this question to glib
> community in case I made severe mistake.
> 
> I encountered one glib warning when start to use internal iothread:
> 
> (qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed
> 
> This will be triggered as long as we create one private iothread and
> enables gcontext of it (by calling iothread_get_g_main_context() at
> least once on the private iothread).
> 
> The warning is generated when quitting QEMU in following path (please
> ignore unknown function names, they only appear in a private tree, but
> the logic is the same):
> 
> #0  0x00007ff0c2bb4180 in g_source_remove_poll () at /lib64/libglib-2.0.so.0
> #1  0x00005634b0fe8bc9 in aio_set_fd_handler (ctx=0x5634b365ca90, fd=22, is_external=false, io_read=0x0, io_write=0x0, io_poll=0x0, opaque=0x5634b365cb3c) at /root/git/qemu/util/aio-posix.c:226
> #2  0x00005634b0fe8ebc in aio_set_event_notifier (ctx=0x5634b365ca90, notifier=0x5634b365cb3c, is_external=false, io_read=0x0, io_poll=0x0) at /root/git/qemu/util/aio-posix.c:299
> #3  0x00005634b0fe5174 in aio_ctx_finalize (source=0x5634b365ca90) at /root/git/qemu/util/async.c:296
> #4  0x00007ff0c2bb39a6 in g_source_unref_internal () at /lib64/libglib-2.0.so.0
> #5  0x00005634b0fe5757 in aio_context_unref (ctx=0x5634b365ca90) at /root/git/qemu/util/async.c:484
> #6  0x00005634b0c39642 in iothread_instance_finalize (obj=0x5634b36cfa40) at /root/git/qemu/iothread.c:127
> #7  0x00005634b0ede36e in object_deinit (obj=0x5634b36cfa40, type=0x5634b3543c60) at /root/git/qemu/qom/object.c:453
> #8  0x00005634b0ede3e0 in object_finalize (data=0x5634b36cfa40) at /root/git/qemu/qom/object.c:467
> #9  0x00005634b0edf361 in object_unref (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:902
> #10 0x00005634b0ee0607 in object_finalize_child_property (obj=0x5634b369d3b0, name=0x5634b36eb900 "monitor_io_thr", opaque=0x5634b36cfa40) at /root/git/qemu/qom/object.c:1407
> #11 0x00005634b0ede25d in object_property_del_child (obj=0x5634b369d3b0, child=0x5634b36cfa40, errp=0x0) at /root/git/qemu/qom/object.c:427
> #12 0x00005634b0ede33d in object_unparent (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:446
> #13 0x00005634b0c39f44 in iothread_destroy (iothread=0x5634b36cfa40) at /root/git/qemu/iothread.c:383
> #14 0x00005634b0ae920f in monitor_io_thread_destroy () at /root/git/qemu/monitor.c:4416
> #15 0x00005634b0ae9302 in monitor_cleanup () at /root/git/qemu/monitor.c:4439
> #16 0x00005634b0c496cb in main (argc=17, argv=0x7ffcc5b88918, envp=0x7ffcc5b889a8) at /root/git/qemu/vl.c:4889
> 
> It's on the path during destruction of AioContext, where we called
> g_source_remove_poll() in the destructor.  When reach here, AioContext
> has been detached from the gcontext already, so the assertion is
> triggered.
> 
> I'll copy some code for easier reference from glib:
> 
> void
> g_source_remove_poll (GSource *source,
> 		      GPollFD *fd)
> {
>   GMainContext *context;
>   
>   g_return_if_fail (source != NULL);
>   g_return_if_fail (fd != NULL);
>   g_return_if_fail (!SOURCE_DESTROYED (source)); <---- this assertion fails
>   
>   context = source->context;
> 
>   if (context)
>     LOCK_CONTEXT (context);
>   
>   source->poll_fds = g_slist_remove (source->poll_fds, fd);
> 
>   if (context)
>     {
>       if (!SOURCE_BLOCKED (source))
> 	g_main_context_remove_poll_unlocked (context, fd);
>       UNLOCK_CONTEXT (context);
>     }
> }
> 
> I'm translating this assertion failure into: "when removing one
> GSource from the polled GSource array, the GSource should still be
> attached to its GMainContext".  But does this really matter?  Why
> can't we remove one GSource from the poll array even if the source is
> detached already?  After all if we see following code in
> g_source_remove_poll() we have already taken special care when
> source->context == NULL happens.
> 
> Any thoughts?  TIA.

It seems glib chose this cleanup order:
GPollFD < GSource < GMainLoop < GMainContext

Even if glib could be modified to allow what you want, QEMU needs to
work with old glib versions.

I suggest modifying iothread.c to implement glib's required cleanup
order in the finalize function:

aio_context_unref();  <-- the GSource
g_main_loop_unref();
g_main_context_unref();

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-26  8:43 ` Stefan Hajnoczi
@ 2017-09-26  9:11   ` Peter Xu
  2017-09-26 11:13     ` Fam Zheng
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Xu @ 2017-09-26  9:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: QEMU Devel Mailing List, Paolo Bonzini, Fam Zheng, Stefan Hajnoczi

On Tue, Sep 26, 2017 at 09:43:37AM +0100, Stefan Hajnoczi wrote:
> On Tue, Sep 26, 2017 at 01:44:39PM +0800, Peter Xu wrote:
> > Hi,
> > 
> > Generally speaking this is a question about glib, but I would like to
> > see how the list sees this before throwing this question to glib
> > community in case I made severe mistake.
> > 
> > I encountered one glib warning when start to use internal iothread:
> > 
> > (qemu-system-x86_64:19925): GLib-CRITICAL **: g_source_remove_poll: assertion '!SOURCE_DESTROYED (source)' failed
> > 
> > This will be triggered as long as we create one private iothread and
> > enables gcontext of it (by calling iothread_get_g_main_context() at
> > least once on the private iothread).
> > 
> > The warning is generated when quitting QEMU in following path (please
> > ignore unknown function names, they only appear in a private tree, but
> > the logic is the same):
> > 
> > #0  0x00007ff0c2bb4180 in g_source_remove_poll () at /lib64/libglib-2.0.so.0
> > #1  0x00005634b0fe8bc9 in aio_set_fd_handler (ctx=0x5634b365ca90, fd=22, is_external=false, io_read=0x0, io_write=0x0, io_poll=0x0, opaque=0x5634b365cb3c) at /root/git/qemu/util/aio-posix.c:226
> > #2  0x00005634b0fe8ebc in aio_set_event_notifier (ctx=0x5634b365ca90, notifier=0x5634b365cb3c, is_external=false, io_read=0x0, io_poll=0x0) at /root/git/qemu/util/aio-posix.c:299
> > #3  0x00005634b0fe5174 in aio_ctx_finalize (source=0x5634b365ca90) at /root/git/qemu/util/async.c:296
> > #4  0x00007ff0c2bb39a6 in g_source_unref_internal () at /lib64/libglib-2.0.so.0
> > #5  0x00005634b0fe5757 in aio_context_unref (ctx=0x5634b365ca90) at /root/git/qemu/util/async.c:484
> > #6  0x00005634b0c39642 in iothread_instance_finalize (obj=0x5634b36cfa40) at /root/git/qemu/iothread.c:127
> > #7  0x00005634b0ede36e in object_deinit (obj=0x5634b36cfa40, type=0x5634b3543c60) at /root/git/qemu/qom/object.c:453
> > #8  0x00005634b0ede3e0 in object_finalize (data=0x5634b36cfa40) at /root/git/qemu/qom/object.c:467
> > #9  0x00005634b0edf361 in object_unref (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:902
> > #10 0x00005634b0ee0607 in object_finalize_child_property (obj=0x5634b369d3b0, name=0x5634b36eb900 "monitor_io_thr", opaque=0x5634b36cfa40) at /root/git/qemu/qom/object.c:1407
> > #11 0x00005634b0ede25d in object_property_del_child (obj=0x5634b369d3b0, child=0x5634b36cfa40, errp=0x0) at /root/git/qemu/qom/object.c:427
> > #12 0x00005634b0ede33d in object_unparent (obj=0x5634b36cfa40) at /root/git/qemu/qom/object.c:446
> > #13 0x00005634b0c39f44 in iothread_destroy (iothread=0x5634b36cfa40) at /root/git/qemu/iothread.c:383
> > #14 0x00005634b0ae920f in monitor_io_thread_destroy () at /root/git/qemu/monitor.c:4416
> > #15 0x00005634b0ae9302 in monitor_cleanup () at /root/git/qemu/monitor.c:4439
> > #16 0x00005634b0c496cb in main (argc=17, argv=0x7ffcc5b88918, envp=0x7ffcc5b889a8) at /root/git/qemu/vl.c:4889
> > 
> > It's on the path during destruction of AioContext, where we called
> > g_source_remove_poll() in the destructor.  When reach here, AioContext
> > has been detached from the gcontext already, so the assertion is
> > triggered.
> > 
> > I'll copy some code for easier reference from glib:
> > 
> > void
> > g_source_remove_poll (GSource *source,
> > 		      GPollFD *fd)
> > {
> >   GMainContext *context;
> >   
> >   g_return_if_fail (source != NULL);
> >   g_return_if_fail (fd != NULL);
> >   g_return_if_fail (!SOURCE_DESTROYED (source)); <---- this assertion fails
> >   
> >   context = source->context;
> > 
> >   if (context)
> >     LOCK_CONTEXT (context);
> >   
> >   source->poll_fds = g_slist_remove (source->poll_fds, fd);
> > 
> >   if (context)
> >     {
> >       if (!SOURCE_BLOCKED (source))
> > 	g_main_context_remove_poll_unlocked (context, fd);
> >       UNLOCK_CONTEXT (context);
> >     }
> > }
> > 
> > I'm translating this assertion failure into: "when removing one
> > GSource from the polled GSource array, the GSource should still be
> > attached to its GMainContext".  But does this really matter?  Why
> > can't we remove one GSource from the poll array even if the source is
> > detached already?  After all if we see following code in
> > g_source_remove_poll() we have already taken special care when
> > source->context == NULL happens.
> > 
> > Any thoughts?  TIA.
> 
> It seems glib chose this cleanup order:
> GPollFD < GSource < GMainLoop < GMainContext
> 
> Even if glib could be modified to allow what you want, QEMU needs to
> work with old glib versions.
> 
> I suggest modifying iothread.c to implement glib's required cleanup
> order in the finalize function:
> 
> aio_context_unref();  <-- the GSource
> g_main_loop_unref();
> g_main_context_unref();

Thanks for the idea.  However this may still not work in this case.

IIUC your suggestion is this:

-------------------
diff --git a/iothread.c b/iothread.c                                     
index 4fcd60bd55..73df280c60 100644 
--- a/iothread.c                    
+++ b/iothread.c                    
@@ -115,16 +115,16 @@ static void iothread_instance_finalize(Object *obj)
     IOThread *iothread = IOTHREAD(obj);                                 
                                    
     iothread_stop(iothread);       
-    if (iothread->worker_context) {                                     
-        g_main_context_unref(iothread->worker_context);                 
-        iothread->worker_context = NULL;                                
-    }                              
     qemu_cond_destroy(&iothread->init_done_cond);                       
     qemu_mutex_destroy(&iothread->init_done_lock);                      
     if (!iothread->ctx) {          
         return;                    
     }                              
     aio_context_unref(iothread->ctx);                                   
+    if (iothread->worker_context) {
+        g_main_context_unref(iothread->worker_context);                 
+        iothread->worker_context = NULL;                                
+    }                              
 }                                  
                                    
 static void iothread_complete(UserCreatable *obj, Error **errp)         
-------------------

And after above change we'll still get assert at context unref:

#0  0x00007f0ad9a63c90 in g_source_remove_poll () at /lib64/libglib-2.0.so.0
#1  0x0000555b99cf981e in aio_set_fd_handler (ctx=0x555b9bbc22a0, fd=22, is_external=false, io_read=0x0, io_write=0x0, io_poll=0x0, opaque=0x555b9bbc234c) at /root/git/qemu/util/aio-posix.c:226
#2  0x0000555b99cf9b11 in aio_set_event_notifier (ctx=0x555b9bbc22a0, notifier=0x555b9bbc234c, is_external=false, io_read=0x0, io_poll=0x0) at /root/git/qemu/util/aio-posix.c:299
#3  0x0000555b99cf5e18 in aio_ctx_finalize (source=0x555b9bbc22a0) at /root/git/qemu/util/async.c:296
#4  0x00007f0ad9a632ff in g_source_unref_internal () at /lib64/libglib-2.0.so.0
#5  0x00007f0ad9a635ce in g_source_iter_next () at /lib64/libglib-2.0.so.0
#6  0x00007f0ad9a646bd in g_main_context_unref () at /lib64/libglib-2.0.so.0
#7  0x0000555b99945973 in iothread_instance_finalize (obj=0x555b9bbaaaa0) at /root/git/qemu/iothread.c:125
#8  0x0000555b99bee58a in object_deinit (obj=0x555b9bbaaaa0, type=0x555b9ba48230) at /root/git/qemu/qom/object.c:453
#9  0x0000555b99bee5fc in object_finalize (data=0x555b9bbaaaa0) at /root/git/qemu/qom/object.c:467
#10 0x0000555b99bef583 in object_unref (obj=0x555b9bbaaaa0) at /root/git/qemu/qom/object.c:902
#11 0x0000555b99bf0829 in object_finalize_child_property (obj=0x555b9bc1eca0, name=0x555b9bbdce60 "monitor_io_thr", opaque=0x555b9bbaaaa0) at /root/git/qemu/qom/object.c:1407
#12 0x0000555b99bee479 in object_property_del_child (obj=0x555b9bc1eca0, child=0x555b9bbaaaa0, errp=0x0) at /root/git/qemu/qom/object.c:427
#13 0x0000555b99bee559 in object_unparent (obj=0x555b9bbaaaa0) at /root/git/qemu/qom/object.c:446
#14 0x0000555b99946284 in iothread_destroy (iothread=0x555b9bbaaaa0) at /root/git/qemu/iothread.c:383
#15 0x0000555b997f51f7 in monitor_io_thread_destroy () at /root/git/qemu/monitor.c:4416
#16 0x0000555b997f52ea in monitor_cleanup () at /root/git/qemu/monitor.c:4439
#17 0x0000555b999559f8 in main (argc=17, argv=0x7ffc3b196ea8, envp=0x7ffc3b196f38) at /root/git/qemu/vl.c:4889

The thing is, if we change the order, aio_context_unref() won't really
destroy the object (the context is taking the last reference), but
it'll be postponed to context unref, then we'll face the same issue.
The funny point is we have this in g_source_destroy_internal():

static void
g_source_destroy_internal (GSource      *source,
			   GMainContext *context,
			   gboolean      have_lock)
{
  ...
  if (!SOURCE_DESTROYED (source))
    {
      ...
      source->flags &= ~G_HOOK_FLAG_ACTIVE;       <------- flag unset here
      ...
      if (old_cb_funcs)
	{
	  UNLOCK_CONTEXT (context);
	  old_cb_funcs->unref (old_cb_data);      <------- destructor called
	  LOCK_CONTEXT (context);
	}
      ...
    }
  ...
}

If glib unset the flag after calling the destructor, it'll be fine.
But it's not doing it that way.  And with above implementation of
glib, I don't see a good way to solve this problem via ordering of
glib calls... :(

Thanks,

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-26  9:11   ` Peter Xu
@ 2017-09-26 11:13     ` Fam Zheng
  2017-09-27  4:09       ` Peter Xu
  2017-09-27 12:17       ` Stefan Hajnoczi
  0 siblings, 2 replies; 9+ messages in thread
From: Fam Zheng @ 2017-09-26 11:13 UTC (permalink / raw)
  To: Peter Xu
  Cc: Stefan Hajnoczi, QEMU Devel Mailing List, Paolo Bonzini, Stefan Hajnoczi

On Tue, 09/26 17:11, Peter Xu wrote:
> If glib unset the flag after calling the destructor, it'll be fine.
> But it's not doing it that way.  And with above implementation of
> glib, I don't see a good way to solve this problem via ordering of
> glib calls... :(

Does this work?

---

diff --git a/include/block/aio.h b/include/block/aio.h
index e9aeeaec94..3237d2be7c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -147,6 +147,9 @@ struct AioContext {
     int epollfd;
     bool epoll_enabled;
     bool epoll_available;
+
+    /* Protected by BQL */
+    int refcnt;
 };
 
 /**
diff --git a/util/async.c b/util/async.c
index 355af73ee7..c5464611e3 100644
--- a/util/async.c
+++ b/util/async.c
@@ -293,7 +293,6 @@ aio_ctx_finalize(GSource     *source)
     }
     qemu_lockcnt_unlock(&ctx->list_lock);
 
-    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
     event_notifier_cleanup(&ctx->notifier);
     qemu_rec_mutex_destroy(&ctx->lock);
     qemu_lockcnt_destroy(&ctx->list_lock);
@@ -429,6 +428,8 @@ AioContext *aio_context_new(Error **errp)
     ctx->poll_grow = 0;
     ctx->poll_shrink = 0;
 
+    ctx->refcnt = 1;
+
     return ctx;
 fail:
     g_source_destroy(&ctx->source);
@@ -476,11 +477,17 @@ void aio_co_enter(AioContext *ctx, struct Coroutine *co)
 
 void aio_context_ref(AioContext *ctx)
 {
+    assert(ctx->refcnt > 0);
+    ctx->refcnt++;
     g_source_ref(&ctx->source);
 }
 
 void aio_context_unref(AioContext *ctx)
 {
+    assert(ctx->refcnt > 0);
+    if (--ctx->refcnt == 0) {
+        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
+    }
     g_source_unref(&ctx->source);
 }
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-26 11:13     ` Fam Zheng
@ 2017-09-27  4:09       ` Peter Xu
  2017-09-27 12:17       ` Stefan Hajnoczi
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Xu @ 2017-09-27  4:09 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Stefan Hajnoczi, QEMU Devel Mailing List, Paolo Bonzini, Stefan Hajnoczi

On Tue, Sep 26, 2017 at 07:13:43PM +0800, Fam Zheng wrote:
> On Tue, 09/26 17:11, Peter Xu wrote:
> > If glib unset the flag after calling the destructor, it'll be fine.
> > But it's not doing it that way.  And with above implementation of
> > glib, I don't see a good way to solve this problem via ordering of
> > glib calls... :(
> 
> Does this work?
> 
> ---
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index e9aeeaec94..3237d2be7c 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -147,6 +147,9 @@ struct AioContext {
>      int epollfd;
>      bool epoll_enabled;
>      bool epoll_available;
> +
> +    /* Protected by BQL */

Would atomic ops work here?

It can be used possibly in any destructor of AioContext.  And, IIUC we
are trying to get rid of BQL when possible (at least, I think the
whole bunch of thing I did in past weeks is a fight with BQL somehow).

> +    int refcnt;
>  };
>  
>  /**
> diff --git a/util/async.c b/util/async.c
> index 355af73ee7..c5464611e3 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -293,7 +293,6 @@ aio_ctx_finalize(GSource     *source)
>      }
>      qemu_lockcnt_unlock(&ctx->list_lock);
>  
> -    aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
>      event_notifier_cleanup(&ctx->notifier);
>      qemu_rec_mutex_destroy(&ctx->lock);
>      qemu_lockcnt_destroy(&ctx->list_lock);
> @@ -429,6 +428,8 @@ AioContext *aio_context_new(Error **errp)
>      ctx->poll_grow = 0;
>      ctx->poll_shrink = 0;
>  
> +    ctx->refcnt = 1;
> +
>      return ctx;
>  fail:
>      g_source_destroy(&ctx->source);
> @@ -476,11 +477,17 @@ void aio_co_enter(AioContext *ctx, struct Coroutine *co)
>  
>  void aio_context_ref(AioContext *ctx)
>  {
> +    assert(ctx->refcnt > 0);
> +    ctx->refcnt++;
>      g_source_ref(&ctx->source);
>  }
>  
>  void aio_context_unref(AioContext *ctx)
>  {
> +    assert(ctx->refcnt > 0);
> +    if (--ctx->refcnt == 0) {
> +        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
> +    }
>      g_source_unref(&ctx->source);
>  }
>  

Thanks for the patch, this one with Stefan's suggestion does solve the
problem.  I am just not sure whether this is good - then we will have
two refcounts for one AioContext object.

Paolo, Stefan, could you help ack on this?  If you think it's ok, I
can repost the iothread series, fixing patch 4 as Stefan suggested,
and add Fam's patch (though the atomic usage to be discussed).

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-26 11:13     ` Fam Zheng
  2017-09-27  4:09       ` Peter Xu
@ 2017-09-27 12:17       ` Stefan Hajnoczi
  2017-09-27 12:36         ` Fam Zheng
  1 sibling, 1 reply; 9+ messages in thread
From: Stefan Hajnoczi @ 2017-09-27 12:17 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Peter Xu, Stefan Hajnoczi, QEMU Devel Mailing List, Paolo Bonzini

On Tue, Sep 26, 2017 at 07:13:43PM +0800, Fam Zheng wrote:
> On Tue, 09/26 17:11, Peter Xu wrote:
>  void aio_context_unref(AioContext *ctx)
>  {
> +    assert(ctx->refcnt > 0);
> +    if (--ctx->refcnt == 0) {
> +        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
> +    }

This isn't a general solution because Linux AIO also has a file
descriptor that is removed in aio_ctx_finalize().

Here is a different approach.  Does it work for you?

BTW I'm not a glib expert so maybe we're abusing the API and missing the
obvious way to structure our code :).

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 2d51239ec6..5946ac09f0 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -223,7 +223,14 @@ void aio_set_fd_handler(AioContext *ctx,
             return;
         }

-        g_source_remove_poll(&ctx->source, &node->pfd);
+        /* If the GSource is in the process of being destroyed then
+         * g_source_remove_poll() causes an assertion failure.  Skip
+         * removal in that case, because glib cleans up its state during
+         * destruction anyway.
+         */
+        if (!g_source_is_destroyed(&ctx->source)) {
+            g_source_remove_poll(&ctx->source, &node->pfd);
+        }

         /* If the lock is held, just mark the node as deleted */
         if (qemu_lockcnt_count(&ctx->list_lock)) {

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-27 12:17       ` Stefan Hajnoczi
@ 2017-09-27 12:36         ` Fam Zheng
  2017-09-27 13:14           ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: Fam Zheng @ 2017-09-27 12:36 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Xu, Stefan Hajnoczi, QEMU Devel Mailing List, Paolo Bonzini

On Wed, 09/27 13:17, Stefan Hajnoczi wrote:
> On Tue, Sep 26, 2017 at 07:13:43PM +0800, Fam Zheng wrote:
> > On Tue, 09/26 17:11, Peter Xu wrote:
> >  void aio_context_unref(AioContext *ctx)
> >  {
> > +    assert(ctx->refcnt > 0);
> > +    if (--ctx->refcnt == 0) {
> > +        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
> > +    }
> 
> This isn't a general solution because Linux AIO also has a file
> descriptor that is removed in aio_ctx_finalize().

Right. Another option is to move everything in aio_context_finalize() into the
"if (--ctx->refcnt == 0) { ... }" block, before calling g_source_unref().

Fam

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-27 12:36         ` Fam Zheng
@ 2017-09-27 13:14           ` Paolo Bonzini
  2017-09-28  2:43             ` Peter Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2017-09-27 13:14 UTC (permalink / raw)
  To: Fam Zheng, Stefan Hajnoczi
  Cc: Peter Xu, Stefan Hajnoczi, QEMU Devel Mailing List

On 27/09/2017 14:36, Fam Zheng wrote:
> On Wed, 09/27 13:17, Stefan Hajnoczi wrote:
>> On Tue, Sep 26, 2017 at 07:13:43PM +0800, Fam Zheng wrote:
>>> On Tue, 09/26 17:11, Peter Xu wrote:
>>>  void aio_context_unref(AioContext *ctx)
>>>  {
>>> +    assert(ctx->refcnt > 0);
>>> +    if (--ctx->refcnt == 0) {
>>> +        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
>>> +    }
>>
>> This isn't a general solution because Linux AIO also has a file
>> descriptor that is removed in aio_ctx_finalize().
> 
> Right. Another option is to move everything in aio_context_finalize() into the
> "if (--ctx->refcnt == 0) { ... }" block, before calling g_source_unref().

I think I prefer Stefan's solution.  If the GSource has been ref'ed
independent of the AioContext object, it may be a problem to free it early.

Paolo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Qemu-devel] A glib warning encountered with internal iothread
  2017-09-27 13:14           ` Paolo Bonzini
@ 2017-09-28  2:43             ` Peter Xu
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Xu @ 2017-09-28  2:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Fam Zheng, Stefan Hajnoczi, Stefan Hajnoczi, QEMU Devel Mailing List

On Wed, Sep 27, 2017 at 03:14:40PM +0200, Paolo Bonzini wrote:
> On 27/09/2017 14:36, Fam Zheng wrote:
> > On Wed, 09/27 13:17, Stefan Hajnoczi wrote:
> >> On Tue, Sep 26, 2017 at 07:13:43PM +0800, Fam Zheng wrote:
> >>> On Tue, 09/26 17:11, Peter Xu wrote:
> >>>  void aio_context_unref(AioContext *ctx)
> >>>  {
> >>> +    assert(ctx->refcnt > 0);
> >>> +    if (--ctx->refcnt == 0) {
> >>> +        aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL);
> >>> +    }
> >>
> >> This isn't a general solution because Linux AIO also has a file
> >> descriptor that is removed in aio_ctx_finalize().
> > 
> > Right. Another option is to move everything in aio_context_finalize() into the
> > "if (--ctx->refcnt == 0) { ... }" block, before calling g_source_unref().
> 
> I think I prefer Stefan's solution.  If the GSource has been ref'ed
> independent of the AioContext object, it may be a problem to free it early.

Stefan's patch works for me.  I'll put Stefan's patch into iothread
series and repost soon.

Thanks Stefan (and all)!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-09-28  2:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-26  5:44 [Qemu-devel] A glib warning encountered with internal iothread Peter Xu
2017-09-26  8:43 ` Stefan Hajnoczi
2017-09-26  9:11   ` Peter Xu
2017-09-26 11:13     ` Fam Zheng
2017-09-27  4:09       ` Peter Xu
2017-09-27 12:17       ` Stefan Hajnoczi
2017-09-27 12:36         ` Fam Zheng
2017-09-27 13:14           ` Paolo Bonzini
2017-09-28  2:43             ` Peter Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.